Closed WaxCylinderRevival closed 6 years ago
@joewiz, please read above for the start of a proposal for adding default @frus:doc-dateTime-min and @frus:doc-dateTime-max values for non-document sections of FRUS.
Corrections, suggestions, and questions welcome.
I will list other div[@type="section"]/@xml:id values in the comments below for discussion.
Sampling of additional div[@type="section"]/@xml:id values found:
div type="section"
xml:id="abouttheseries"
xml:id="AboutTheSeries"
xml:id="actionsstatement"
xml:id="appendix"
xml:id="appendix-1"
... xml:id="appendix-12"
xml:id="circulars"
xml:id="citations"
xml:id="correspondence-1"
xml:id="correspondence-2"
xml:id="intro"
xml:id="introductory"
xml:id="photographs-toc"
xml:id="photos"
xml:id="sources"
xml:id="translation-of-the-memorandum"
xml:id="united-states"
This is great! Thanks for putting it together. Just a couple of thoughts:
You proposed using div/@ana="#date_undated-temporarily-inferred-from-volume-rules", but do you think "temporarily-" is the right term here? It implies that we'll be doing a future review of the dates applied to these non-document divs. Is that what you intend?
In the search interface, how should we indicate the date judgements when these appear as results? Something like: "Date range [or "Inferred date range"?]: {date-min}-{date-max}" and "Date methodology": "Inferred using volume rules."
I hadn't thought of using @xml:id as a clue for determining which date rules to apply, but this is a good idea if we can come up with some general rules. We have some frequently used @xml:id values for common section headings, but by convention we allow variation for sections that aren't common and/or need unique IDs. Here is a query that produces a comprehensive list, for our reference (must... resist... urge... to normalize...):
xquery version "3.1";

declare namespace tei="http://www.tei-c.org/ns/1.0";

(: List every distinct @xml:id used on div type="section", sorted :)
array {
    collection("/db/apps/frus/volumes")//tei:div[@type eq "section"]/@xml:id
    => distinct-values()
    => sort()
}
The results:
[
"AbouttheSeries",
"Contents",
"Index",
"Notes",
"Preface",
"Published",
"Shorttitles",
"Summary",
"Unpublished",
"Volumes",
"about",
"about-this-preview-edition",
"aboutseries",
"abouttheseries",
"abtseries",
"acknowledge",
"actionssatement",
"actionsstatement",
"actionstatement",
"address-of-the-president",
"annual",
"app1map1",
"app1map2",
"app1map3",
"app1map4",
"app2map1",
"app2map2",
"app2map3",
"app2map4",
"appendix",
"appendix-1",
"appendix-10",
"appendix-11",
"appendix-12",
"appendix-2",
"appendix-3",
"appendix-4",
"appendix-5",
"appendix-6",
"appendix-7",
"appendix-8",
"appendix-9",
"appendix1",
"appendix2",
"appendix_a",
"appendix_b",
"charts",
"circulars",
"citations",
"correspondence-1",
"correspondence-2",
"covert",
"delegation",
"delegations",
"directory",
"documents",
"editorial",
"errata",
"front-matter",
"guide",
"historian",
"illustrations",
"index",
"index-persons",
"index-subjects",
"intro",
"intro1",
"intro2",
"intro3",
"intro4",
"intro5",
"intro6",
"intro7",
"intro8",
"introduction",
"introductory",
"list-of-illustrations",
"map",
"map-panama",
"maps",
"message-of-the-president",
"message-of-the-president-1",
"message-of-the-president-2",
"messages-of-the-president",
"messages-of-the-president-1",
"messages-of-the-president-2",
"note",
"notes",
"papers",
"papers-countries",
"papers-topics",
"persons",
"persons-mentioned",
"photographs",
"photographs-toc",
"photos",
"preface",
"prefatory-note",
"pressrelease",
"san-Francisco-earthquake",
"sec-10thPlenary-Oct2",
"sec-11thMeeting-Oct2",
"sec-12thPlenary-Oct2",
"sec-13thPlenary-Oct2",
"sec-14thMeeting-Oct3",
"sec-1stMeeting-July11",
"sec-1stMeeting-July12",
"sec-1stMeeting-Oct20",
"sec-1stMeeting-Sept28",
"sec-1stPlenary-Dec4",
"sec-1stPlenary-Sept28",
"sec-1stRestTripartite-Dec4",
"sec-1stTripartite-Dec4",
"sec-1stTripartite-July10",
"sec-2ndMeeting-July14",
"sec-2ndMeeting-Oct1",
"sec-2ndMeeting-Oct21",
"sec-2ndMeeting2-Oct21",
"sec-2ndPlenary-Dec5",
"sec-2ndPlenary-Sept28",
"sec-2ndRestrictedTripartite-Dec7",
"sec-2ndTripartite-Dec5",
"sec-2ndTripartite-July11",
"sec-3rdMeeting-Oct1",
"sec-3rdMeeting-Oct22",
"sec-3rdPlenary-Dec6",
"sec-3rdPlenary-Sept29",
"sec-3rdTripartite-Dec6",
"sec-3rdTripartite-July13",
"sec-4thMeeting-Oct2",
"sec-4thPlenary-Dec7",
"sec-4thPlenary-Sept29",
"sec-4thTripartite-Dec6",
"sec-4thTripartite-July13",
"sec-5thMeeting-Oct3",
"sec-5thPlenary-Sept30",
"sec-5thTripartite-Dec7",
"sec-5thTripartite-July14",
"sec-5thTripartite2-Dec7",
"sec-6thPlenary-Dec7",
"sec-6thPlenary-Sept30",
"sec-7thPlenary-Oct1",
"sec-8thPlenary-Oct1",
"sec-9thPlenary-Oct2",
"sec-BermudaConf-Dec4-8",
"sec-DE-Sept29",
"sec-DEM-Oct3",
"sec-DEMMeeting-Oct23",
"sec-DEMeeting-Dec6-7",
"sec-DEMeeting-Oct20",
"sec-DEMeeting-Oct21",
"sec-DEMeeting-Oct23",
"sec-DM-Oct3",
"sec-DMFMeeting-Sept29",
"sec-DMMeeting-Oct20",
"sec-DMMeeting-Oct22",
"sec-DTMeeting-Sept30",
"sec-EBMeeting-Dec7",
"sec-ECDinnerMeeting-Dec5",
"sec-ECMeeting-Dec4",
"sec-ECMeeting-Dec5",
"sec-ELMeeting-Dec5",
"sec-FMeeting-Oct21",
"sec-Feb-14-mtg3",
"sec-Feb13",
"sec-Feb13-mtg1",
"sec-Feb14",
"sec-Feb14-mtg1",
"sec-Feb14-mtg2",
"sec-Feb15",
"sec-Feb16",
"sec-Feb16-mtg1",
"sec-Feb17",
"sec-Feb17-mtg1",
"sec-Feb17-mtg2",
"sec-Feb18",
"sec-Feb18-mtg1",
"sec-Feb18-mtg2",
"sec-Feb18-mtg3",
"sec-Feb18-mtg4",
"sec-Feb19",
"sec-Feb19-mtg1",
"sec-Feb19-mtg2",
"sec-Feb19-mtg3",
"sec-Feb20",
"sec-Feb20-mtg1",
"sec-Feb20-mtg2",
"sec-Feb20-mtg3",
"sec-Feb21",
"sec-Feb21-mtg1",
"sec-Feb21-mtg2",
"sec-Feb21-mtg3",
"sec-Feb21-mtg4",
"sec-Feb21-mtg5",
"sec-Feb21-mtg6",
"sec-Feb21-mtg7",
"sec-Feb21-mtg8",
"sec-Feb22",
"sec-Feb22-mtg1",
"sec-Feb22-mtg2",
"sec-Feb22-mtg3",
"sec-Feb22-mtg4",
"sec-Feb23",
"sec-Feb23-mtg1",
"sec-Feb23-mtg2",
"sec-Feb23-mtg3",
"sec-Feb23-mtg4",
"sec-Feb24",
"sec-Feb24-mtg1",
"sec-Feb25",
"sec-Feb25-mtg1",
"sec-Feb25-mtg2",
"sec-Feb26",
"sec-Feb26-mtg1",
"sec-Feb26-mtg2",
"sec-Feb26-mtg3",
"sec-MLMeeting-Dec4",
"sec-MeetingAssociatedStates-July13",
"sec-NAMeeting-Oct22",
"sec-SigningCeremonies-Oct23",
"sec-SigningCeremony-Oct3",
"sec-TripartiteFM-Dec4",
"sec-TripartiteMeeting-July11",
"sec-TripartiteWorkingGp-Dec5",
"section",
"shorttitles",
"source",
"sources",
"subjects",
"subseriesvols",
"summary",
"symbols",
"terms",
"toc",
"toc-countries",
"toc-papers",
"toc-topics",
"topical",
"translation-of-the-memorandum",
"treaties",
"united-states",
"volumes",
"volumesummary"
]
I wasn't sure if we needed to indicate that these dates didn't have strict human review, but I'm open to changing the div/@ana to "#date_undated-inferred-from-volume-rules", if you think it best.
I do think "Inferred Date Range: {date-min}-{date-max}" and "Date methodology": "Inferred using volume rules." could potentially work across documents and sections, etc. (as single dates have an inferred date range, for example).
Ha, I had run a similar query for distinct values and was trying to resist normalization (but the "abouttheseries" and "AbouttheSeries" taunts, @joewiz). I do think, though, there is a case to be made for adding a @subtype to div[attribute::type eq "section"], grouping into logical categories based on the current @xml:id (subtype="appendix", "event", etc.), and then using the subtypes to determine the date rules to apply.
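The grouping idea could be sketched roughly as follows. This is only an illustration: `local:subtype-for-id` is a hypothetical helper, the regular expressions are guesses based on the @xml:id list above, and the subtype names other than "appendix" and "event" are assumptions, not agreed rules.

```xquery
xquery version "3.1";

(: Hypothetical helper: maps a section's @xml:id to a candidate @subtype.
   The patterns and most subtype names are illustrative assumptions. :)
declare function local:subtype-for-id($id as xs:string) as xs:string {
    (: lower-case first so "AbouttheSeries" and "abouttheseries" group together :)
    let $key := lower-case($id)
    return
        if (matches($key, "^(about|abt)(the)?series$")) then "about-frus-series"
        else if (matches($key, "^appendix")) then "appendix"
        else if (matches($key, "^sec-")) then "event"
        else if (matches($key, "^index")) then "index"
        else if (matches($key, "^(toc|contents)")) then "table-of-contents"
        (: anything unmatched is left for human review :)
        else "undetermined"
};

for $id in ("AbouttheSeries", "appendix-12", "sec-Feb13-mtg1", "toc-topics", "covert")
return $id || " => " || local:subtype-for-id($id)
```

Lower-casing before matching sidesteps the case variants without renaming any existing @xml:id, and the "undetermined" fallback keeps unusual sections visible rather than silently guessing.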
See commits under this pull request: https://github.com/HistoryAtState/frus/pull/197
div/@subtype | Frequency (as of 2018-05-08) |
---|---|
about-frus-series | 27 |
acknowledgements | 2 |
additional-volumes | 37 |
appendix | 14 |
chapter-introduction | 8 |
editorial-note | 8301 |
editorial-policies | 4 |
errata | 38 |
errata_document-numbering-error | 2 |
event | 38 |
graphic-materials | 10 |
historical-document | 276062 |
index | 1130 |
maps | 1 |
notes | 45 |
preface | 378 |
press-release | 44 |
referral | 979 |
related-materials | 2 |
section | 2 |
sources | 234 |
subsection | 85 |
table-of-contents | 444 |
undetermined | 5 |
volume-summary | 20 |
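A frequency table of this shape could be regenerated with a query along these lines. This is a sketch, not necessarily how the table above was produced; it assumes the same collection path as the earlier query and counts only div elements that carry a @subtype.

```xquery
xquery version "3.1";

declare namespace tei="http://www.tei-c.org/ns/1.0";

(: Count div elements per @subtype value, one table row per subtype.
   The collection path is copied from the query earlier in this thread. :)
for $div in collection("/db/apps/frus/volumes")//tei:div[@subtype]
group by $subtype := string($div/@subtype)
order by $subtype
return $subtype || " | " || count($div) || " |"
```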
To deliver more useful search results, we will add @frus:doc-dateTime-min and @frus:doc-dateTime-max to divs previously considered non-datable.
These datable divs include:
div type="chapter"
div type="compilation"
div type="section"
xml:id="errata"
xml:id="index"
xml:id="papers"
xml:id="persons"
xml:id="preface"
xml:id="sources"
xml:id="terms"
xml:id="toc-topics"
div type="subchapter"
div type="toc"
Proposed Steps:
1. Update volume coverage dates
   1.a. [x] Run .xq query to determine document dates outside of current volume dates in bibliography
   1.b. [x] Update coverage values in bibliography
   1.c. [x] Add @type="publication-date" to teiHeader/publicationStmt/date
   1.d. Long-term TODO: Incorporate bibliography information into teiHeader or other appropriate place for volume
2. Add volume coverage dates to appropriate divs
   2.a. Transform volume coverage/@from to div/@frus:doc-dateTime-min and coverage/@to to div/@frus:doc-dateTime-max and add div/@ana="#date_undated-temporarily-inferred-from-volume-rules" for the following:
      Front Matter
         div type="section" xml:id="errata"
         div type="compilation"
         div type="section"
            xml:id="errata"
            xml:id="index"
            xml:id="papers"
            xml:id="persons"
            xml:id="preface"
            xml:id="sources"
            xml:id="terms"
            xml:id="toc-topics"
         div type="toc"
      Back Matter
         ...
   2.b. Transform the earliest @frus:doc-dateTime-min of descendant div[@subtype="historical-document"] and the latest @frus:doc-dateTime-max of descendant div[@subtype="historical-document"] to the div's own @frus:doc-dateTime-min and @frus:doc-dateTime-max, and add div/@ana="#date_undated-temporarily-inferred-from-volume-rules" for the following:
      Body
         div type="chapter"
         div type="subchapter"
   2.c. Identify document dates and transform to @frus:doc-dateTime-min and @frus:doc-dateTime-max and add div/@ana="#date_undated-temporarily-inferred-from-volume-rules" for the following:
      div type="section"
         xml:id="address-of-the-president"
         xml:id="message-of-the-president"
         xml:id="pressrelease"
   2.d. @frus:doc-dateTime-min and @frus:doc-dateTime-max may be updated with dates more reflective of the content of the div
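Step 2.b could look something like the sketch below. It is not the project's actual transform. The frus namespace URI, the helper name `local:inferred-date-attributes`, and the sample chapter are assumptions, and the code assumes every date attribute holds a full xs:dateTime value.

```xquery
xquery version "3.1";

declare namespace tei="http://www.tei-c.org/ns/1.0";
(: Assumed namespace URI for the frus prefix; check it against the volume files. :)
declare namespace frus="http://history.state.gov/frus/ns/1.0";

(: Hypothetical helper: derives date attributes for a chapter or subchapter
   from its descendant historical documents. Returns nothing if none are dated. :)
declare function local:inferred-date-attributes($div as element(tei:div)) as attribute()* {
    let $docs := $div//tei:div[@subtype eq "historical-document"]
    let $min := min($docs/@frus:doc-dateTime-min ! xs:dateTime(.))
    let $max := max($docs/@frus:doc-dateTime-max ! xs:dateTime(.))
    where exists($min) and exists($max)
    return (
        attribute frus:doc-dateTime-min { $min },
        attribute frus:doc-dateTime-max { $max },
        attribute ana { "#date_undated-temporarily-inferred-from-volume-rules" }
    )
};

(: Made-up sample chapter with two dated documents :)
let $chapter :=
    <div xmlns="http://www.tei-c.org/ns/1.0" type="chapter" xml:id="ch1">
        <div type="document" subtype="historical-document"
             frus:doc-dateTime-min="1953-12-04T00:00:00-05:00"
             frus:doc-dateTime-max="1953-12-04T23:59:59-05:00"/>
        <div type="document" subtype="historical-document"
             frus:doc-dateTime-min="1953-12-07T00:00:00-05:00"
             frus:doc-dateTime-max="1953-12-08T23:59:59-05:00"/>
    </div>
(: Rebuild the chapter with the inferred attributes added :)
return
    element tei:div {
        $chapter/@*,
        local:inferred-date-attributes($chapter),
        $chapter/node()
    }
```

Casting to xs:dateTime before calling min() and max() compares the values as instants rather than as strings, which matters when documents carry different time zone offsets. A div that already has these attributes would need them filtered out of `$chapter/@*` first to avoid a duplicate-attribute error.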