Closed WaxCylinderRevival closed 7 years ago
@WaxCylinderRevival sorry for missing several of these! Looks like they're all in main divs, not frus:attachments, which may be a pattern you encounter going forward. Could I check that in the later Q3 volumes or would you prefer me to stay out of that batch while you're working on it?
@vak2ve, no worries! I'll take care of the Q3 volumes.
If you wouldn't mind checking the Q4 batch, that would be great!
date/@ to supplied date and frus:doc-dateTime-min | frus:doc-dateTime-max:
xmlns:xi="http://www.w3.org/2001/XInclude"
<classDecl>
<xi:include href="../shared/frus-dates.xml" xpointer="frus-dates"/>
</classDecl>
date:
<date notBefore="1920-03-20T00:00:00+08:00" notAfter="1920-03-25T12:30:00-05:00" ana="#date_undated-inferred-from-document-content"><hi rend="italic">undated</hi></date>
date:
<date notBefore="1920-04-17T18:00:00-04:00" notAfter="1920-04-20T23:59:59-04:00" ana="#date_undated-inferred-from-document-content-and-sibling-dates"><hi rend="italic">undated</hi></date>
date:
<date notBefore="1920-05-24T17:00:00-04:00" notAfter="1920-05-27T20:00:00-04:00" ana="#date_undated-inferred-from-document-content-and-sibling-dates"><hi rend="italic">undated</hi></date>
date:
<date notBefore="1920-07-29T14:00:00-05:45" notAfter="1920-07-30T00:00:00-04:00" when="1920-07-29" ana="#date_undated-inferred-from-document-content"><hi rend="italic">undated</hi></date>
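Inferred ranges like the ones above are easy to mistype, so a quick consistency check can help. The following is a minimal Python sketch, not part of the project's tooling: the function name `check_date_range` and the `MAX_SPAN_DAYS` threshold are assumptions made for illustration, and the snippet expects a bare `date` element without namespace declarations.

```python
from datetime import datetime
from typing import List
import xml.etree.ElementTree as ET

# Hypothetical threshold: spans longer than this are flagged for a human look.
MAX_SPAN_DAYS = 31


def check_date_range(date_xml: str) -> List[str]:
    """Return a list of problems found in one <date> element's range attributes."""
    el = ET.fromstring(date_xml)
    not_before = el.get("notBefore")
    not_after = el.get("notAfter")
    if not_before is None or not_after is None:
        return ["missing notBefore or notAfter"]

    # fromisoformat accepts offsets such as +08:00, so the two values
    # are compared as absolute instants rather than as strings.
    start = datetime.fromisoformat(not_before)
    end = datetime.fromisoformat(not_after)

    problems = []
    if start > end:
        problems.append("notBefore is later than notAfter")
    if (end - start).days > MAX_SPAN_DAYS:
        problems.append(f"span exceeds {MAX_SPAN_DAYS} days")
    return problems


sample = (
    '<date notBefore="1920-03-20T00:00:00+08:00" '
    'notAfter="1920-03-25T12:30:00-05:00">'
    '<hi rend="italic">undated</hi></date>'
)
print(check_date_range(sample))
```

The threshold is only a heuristic; a legitimately long inferred range would be flagged too and simply needs a human decision.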
p to closer/dateline:
<closer>
<dateline>
<placeName>
<hi rend="smallcaps">Washington</hi>
</placeName>, <date when="1920-01-12">
<hi rend="italic">January 12, 1920</hi>
</date>.</dateline>
</closer>
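For anyone scripting this conversion rather than editing by hand, the mixed content (the comma after the place name and the period after the date) is the fiddly part. Here is a minimal Python sketch of building the same structure with the standard library. The helper name `build_closer` is hypothetical, and the TEI namespace is left out for brevity, so this is an illustration rather than the workflow used in this project.

```python
import xml.etree.ElementTree as ET


def build_closer(place: str, date_text: str, when: str) -> ET.Element:
    """Build closer/dateline with a placeName and a date carrying @when."""
    closer = ET.Element("closer")
    dateline = ET.SubElement(closer, "dateline")

    place_name = ET.SubElement(dateline, "placeName")
    hi_place = ET.SubElement(place_name, "hi", rend="smallcaps")
    hi_place.text = place
    # Text that follows an element is stored in that element's .tail
    place_name.tail = ", "

    date = ET.SubElement(dateline, "date", when=when)
    hi_date = ET.SubElement(date, "hi", rend="italic")
    hi_date.text = date_text
    date.tail = "."
    return closer


closer = build_closer("Washington", "January 12, 1920", "1920-01-12")
print(ET.tostring(closer, encoding="unicode"))
```

The serialized result is on a single line; the element order and punctuation match the example above, only the indentation differs.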
On it! Is there a strategy or XPath you use to catch the ones I missed or do you go doc by doc?
@vak2ve, I use an XQuery script to identify potential candidates and then evaluate the flagged docs. I might be able to borrow some of the regex to give you an XPath that might help. Let me see...
To find dates in postscript of historical documents without date:
//div[attribute::type='document'][not(attribute::subtype='editorial-note')][not(descendant::date)]//postscript[matches(.,
'\d{1,2}(st|nd|rd|th)?\s+(January|February|March|April|May|June|July|August|September|October|November|December),*\s+\d{4}|((January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2}(st|nd|rd|th)?,\s+\d{4})')]
[N.B. Edited to add qualifiers above.]
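The date pattern can also be tried outside oXygen before running it across a volume. Below is a minimal sketch using Python's `re` module. The names `DATE_PATTERN` and `has_date_candidate` are made up for this example, and Python's regex dialect differs slightly from the one behind XPath `matches()`. In regex syntax a bracketed class such as `[(st)]` matches single characters rather than the sequence "st", so this sketch writes the ordinal suffix as an explicit optional group.

```python
import re

MONTHS = (
    "(?:January|February|March|April|May|June|July|"
    "August|September|October|November|December)"
)
# Optional ordinal suffix written as an alternation, not a character class.
ORDINAL = "(?:st|nd|rd|th)?"

# Day-first form, e.g. "12th January, 1920" or "12 January 1920"
DAY_FIRST = r"\d{1,2}" + ORDINAL + r"\s+" + MONTHS + r",*\s+\d{4}"
# Month-first form, e.g. "January 12, 1920" (comma required)
MONTH_FIRST = MONTHS + r"\s+\d{1,2}" + ORDINAL + r",\s+\d{4}"

DATE_PATTERN = re.compile(DAY_FIRST + "|" + MONTH_FIRST)


def has_date_candidate(text: str) -> bool:
    """True if the text contains something that looks like a full date."""
    # search() is unanchored, like XPath matches()
    return DATE_PATTERN.search(text) is not None


for sample in ("Washington, January 12, 1920.", "12th January 1920", "January 1920"):
    print(sample, has_date_candidate(sample))
```

A pattern like this only finds candidates; each hit still needs to be evaluated by eye, as described above.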
@vak2ve
To find dates in last paragraphs of historical documents without date:
//div[attribute::type='document'][not(attribute::subtype='editorial-note')][not(descendant::date)]//p[last()][matches(.,
'\d{1,2}(st|nd|rd|th)?\s+(January|February|March|April|May|June|July|August|September|October|November|December),*\s+\d{4}|((January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2}(st|nd|rd|th)?,\s+\d{4})')]
[N.B. Edited to add not to descendant::date]
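For batch checks outside oXygen, the same selection logic can be approximated with Python's standard library, which has no XPath 2.0 `matches()`. This is a rough sketch under assumptions: the files are taken to be in the TEI namespace `http://www.tei-c.org/ns/1.0`, the function name `last_paragraph_candidates` is hypothetical, and the pattern in the usage lines is a simplified stand-in for the full date regex. Note that `//p[last()]` selects every `p` that is the last `p` child of its parent, not only the final paragraph of the document, and the loop mirrors that.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # assumed namespace of the volume files
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def last_paragraph_candidates(root, pattern):
    """Return (div id, text) for last-child paragraphs that match pattern,
    in document divs that are not editorial notes and contain no date."""
    hits = []
    seen = set()
    for div in root.iter(TEI + "div"):
        if div.get("type") != "document":
            continue
        if div.get("subtype") == "editorial-note":
            continue
        if div.find(".//" + TEI + "date") is not None:
            continue
        # Every element in the div may be a parent of p elements.
        for parent in div.iter():
            paragraphs = parent.findall(TEI + "p")
            if not paragraphs:
                continue
            last_p = paragraphs[-1]
            if id(last_p) in seen:  # avoid duplicates from nested divs
                continue
            text = " ".join("".join(last_p.itertext()).split())
            if pattern.search(text):
                seen.add(id(last_p))
                hits.append((div.get(XML_ID), text))
    return hits


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="document" xml:id="d1"><p>Body text.</p><p>Washington, January 12, 1920.</p></div>
<div type="document" xml:id="d2"><p><date when="1920-01-13">January 13, 1920</date></p><p>Paris, January 13, 1920.</p></div>
<div type="document" subtype="editorial-note" xml:id="d3"><p>Note of January 14, 1920.</p></div>
</body></text></TEI>"""

# Simplified stand-in pattern, for illustration only.
SIMPLE_PATTERN = re.compile(r"(?:January|February)\s+\d{1,2},\s+\d{4}")
print(last_paragraph_candidates(ET.fromstring(SAMPLE), SIMPLE_PATTERN))
```

In the sample, only `d1` is reported: `d2` already contains a `date` and `d3` is an editorial note.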
@vak2ve
@vak2ve, apologies for the many edits, but I think these two XPaths are now qualified enough to be helpful when used via the XPath/XQuery Builder in oXygen XML Editor.
@WaxCylinderRevival thank you so much--that second one in particular will be really helpful. Postscripts and frus:attachments are usually pretty straightforward but it's those last paragraphs of random docs that slip by me!
@vak2ve, if you'd like to experiment with these regexes in frus:attachment, you may wish to try:
To find date candidates in last paragraphs of attachments without date:
//div[attribute::type='document'][not(attribute::subtype='editorial-note')]//*[local-name()='attachment'][not(descendant::date)]//p[last()][matches(.,
'\d{1,2}(st|nd|rd|th)?\s+(January|February|March|April|May|June|July|August|September|October|November|December),*\s+\d{4}|((January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2}(st|nd|rd|th)?,\s+\d{4})')]
To find date candidates in postscripts of attachments without date:
//div[attribute::type='document'][not(attribute::subtype='editorial-note')]//*[local-name()='attachment'][not(descendant::date)]//postscript[matches(.,
'\d{1,2}(st|nd|rd|th)?\s+(January|February|March|April|May|June|July|August|September|October|November|December),*\s+\d{4}|((January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2}(st|nd|rd|th)?,\s+\d{4})')]
@vak2ve, I added a page for useful XPaths on the Wiki: https://github.com/WaxCylinderRevival/frus-dates-project/wiki/Useful-XPaths
This is a fantastic resource! Thank you so much for putting it together--I'll put these XPaths into practice immediately, and double-check the other Q4 volumes with them too.
Feel free to add to the page, if you have XPath tools you use!
p and add dateline to new closer: @frus:doc-dateTime-min and @frus:doc-dateTime-max
p to dateline in existing closer: @frus:doc-dateTime-min and @frus:doc-dateTime-max
p and add dateline to new closer: @frus:doc-dateTime-min and @frus:doc-dateTime-max
p and add dateline to new closer: @frus:doc-dateTime-min and @frus:doc-dateTime-max
dateline: @frus:doc-dateTime-min and @frus:doc-dateTime-max
dateline: @frus:doc-dateTime-min and @frus:doc-dateTime-max
p and add dateline to new closer: @frus:doc-dateTime-min and @frus:doc-dateTime-max
p and add dateline to existing closer: @frus:doc-dateTime-min and @frus:doc-dateTime-max
p and add dateline to new closer: @frus:doc-dateTime-min and @frus:doc-dateTime-max