Closed mdlincoln closed 4 years ago
@mdlincoln Found it--the Introduction to 1831 is something we withheld from the collation process because it's not present in the other editions. So I'd tucked it away in an include file in my XML and didn't expand it here. I did expand it for the HTML that the team annotated, and we do need to account for it in the Variorum Viewer, so it's good we caught this now!
I suppose the 1831 introduction is one giant location of variance--it's simply there in 1831 and not there at any point earlier.
@mdlincoln Okay! The 1831 file now properly includes its introduction with this commit https://github.com/FrankensteinVariorum/fv-data/commit/1fbb76def2b5f30bb052339ffa383f83dfb38122 . Let me know if you run into any other snags!
@ebeshero ok, will check it now
@ebeshero Ah, I'm sorry to return with even more problems! A filter I'd put on how I was mapping the annotations was masking some problems that I should have caught before asking you to regenerate.
I'm now seeing that the annotations team has annotations that start all the way back on the title page, before the introduction. For example: https://hyp.is/p2gtHJRUEemef2cqr_cA0Q/ebeshero.github.io/Pittsburgh_Frankenstein/Frankenstein_1831.html (reminder: you'll need to be logged into the FV annotation account to see this)
Can you make sure all of that front material also makes it into the XML with IDs?
Ah! I had forgotten I’d given them everything including title pages in the HTML—sorry for rushing through this. I can output the title pages too.
Elisa
Apparently I output ALL the includes for the HTML... Let me get to this in about an hour! I'm in a meeting.
Elisa
-- Elisa Beshero-Bondar, PhD | Director, Center for the Digital Text | Associate Professor of English, University of Pittsburgh at Greensburg | Humanities Division, 150 Finoli Drive, Greensburg, PA 15601 USA | E-mail: ebb8@pitt.edu | Development site: http://newtfire.org
@mdlincoln Sorry about the delay! With this commit, I believe I've included the entire front matter and back matter (title pages, etc.) that the Annotations team worked with. https://github.com/FrankensteinVariorum/fv-data/commit/c65d00104efe86671458126036085908a0e2540e Now, let's see if this works!
OK, I think we're pretty close. My spot checks through the 1831 XML mapping JSON suggest that `p` and `head` elements are lining up all right, but please take a close look at examples such as:
```json
"target": {
  "source": "https://frankensteinvariorum.github.io/fv-collation/Frankenstein_1831.html",
  "type": "Text",
  "selector": [
    {
      "type": "TextQuoteSelector",
      "prefix": " \n \n \n \n by ",
      "exact": "the author of THE LAST MAN, PERKIN WARBECK &c. &c.",
      "suffix": "\n \n \n \n revi"
    },
    {
      "type": "RangeSelector",
      "startSelector": {
        "type": "XPathSelector",
        "value": "//*[@xml:id='frontmatter1_head6']"
      },
      "endSelector": {
        "type": "XPathSelector",
        "value": "//*[@xml:id='frontmatter1_head6']"
      }
    }
  ]
},
"diagnostic": {
  "note": "not for open annotation consumption",
  "html": {
    "start": 6,
    "end": "/h3[6]"
  },
  "xml_text_content": "<head xmlns=\"http://www.tei-c.org/ns/1.0\" xmlns:xi=\"http://www.w3.org/2001/XInclude\" xml:id=\"frontmatter1_head6\">\n <hi rend=\"smallcaps\" xml:id=\"frontmatter1_head6_hi1\">BY THE AUTHOR OF</hi> THE LAST MAN, PERKIN WARBECK &c. &c.</head>\n \n "
}
```
The `TextQuoteSelector` is what Hypothesis gave us; the `RangeSelector` is the new `xml:id` I've tried to map. The `diagnostic` object shows the original HTML locators and the content of the selected XML element, just so it's easier to check how the matching went.
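A minimal Python sketch of how a target like the one above could be assembled: the Hypothesis `TextQuoteSelector` is passed through unchanged, and a `RangeSelector` is built from two `xml:id`s wrapped in `XPathSelector`s. The names `xpath_for_id` and `build_target` are hypothetical, not taken from the actual migration scripts, and the sketch assumes the start and end `xml:id`s have already been found.

```python
import json


def xpath_for_id(xml_id: str) -> dict:
    """Wrap an xml:id in an XPathSelector of the form used in this thread."""
    return {"type": "XPathSelector", "value": f"//*[@xml:id='{xml_id}']"}


def build_target(source: str, quote: dict, start_id: str, end_id: str) -> dict:
    """Combine a Hypothesis quote (prefix/exact/suffix) with a RangeSelector
    spanning the two given xml:ids (hypothetical helper)."""
    return {
        "source": source,
        "type": "Text",
        "selector": [
            {"type": "TextQuoteSelector", **quote},
            {
                "type": "RangeSelector",
                "startSelector": xpath_for_id(start_id),
                "endSelector": xpath_for_id(end_id),
            },
        ],
    }


if __name__ == "__main__":
    target = build_target(
        "https://frankensteinvariorum.github.io/fv-collation/Frankenstein_1831.html",
        {
            "prefix": " \n \n \n \n by ",
            "exact": "the author of THE LAST MAN, PERKIN WARBECK &c. &c.",
            "suffix": "\n \n \n \n revi",
        },
        "frontmatter1_head6",
        "frontmatter1_head6",
    )
    print(json.dumps(target, indent=2))
```

Keeping both selectors lets a client fall back on the quoted text if the XPath no longer resolves.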
The 1818 `p` elements are fine, but for some reason there's an unexpected discontinuity in the `h3` numbering. I wonder if it's possible you changed the underlying HTML after they made an annotation? Anyway, it's just a handful of mismatched ones.
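One way to automate spot checks like these, sketched in Python under the assumption that `diagnostic.xml_text_content` holds the serialized XML element: strip the tags, normalize case and whitespace, and test whether the Hypothesis `exact` quote occurs inside. The helper names are hypothetical. The regex tag-stripping is only good enough for a quick check, and an annotation spanning several elements would be flagged even when its range is correct, since only one element's content is compared.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so that differences such as
    smallcaps 'BY THE AUTHOR OF' vs. 'by the author of' don't count."""
    return re.sub(r"\s+", " ", text).strip().lower()


def strip_tags(xml_fragment: str) -> str:
    """Crude tag stripper for spot checks; not a substitute for an XML parser."""
    return re.sub(r"<[^>]+>", " ", xml_fragment)


def quote_matches(exact: str, xml_text_content: str) -> bool:
    """True if the Hypothesis quote appears in the mapped element's text."""
    return normalize(exact) in normalize(strip_tags(xml_text_content))


if __name__ == "__main__":
    xml = (
        '<head xml:id="frontmatter1_head6">\n <hi rend="smallcaps" '
        'xml:id="frontmatter1_head6_hi1">BY THE AUTHOR OF</hi> '
        "THE LAST MAN, PERKIN WARBECK &c. &c.</head>\n \n "
    )
    print(quote_matches("the author of THE LAST MAN, PERKIN WARBECK &c. &c.", xml))  # True
```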
@raffazizzi much of the rest of the annotation JSON is mock data, so we can continue to update the template until it looks good. Do you think this is in a good enough state now for you to work on displaying the annotations in the React app? Also, do you still think you will have bandwidth to work on it before the end of November, as discussed in the call? Let us know if that has changed.
@raffazizzi @ebeshero checking in: can you please confirm whether we're in good shape to get an annotations component set up on the React site?
@mdlincoln as I mentioned at the meeting yesterday, thank you so much for this work! It's looking good, but I've just noticed an issue with the XPath references beyond the first chunk. The first chunk, unlike the others, has a simple structure, with `@xml:id`s like `preface1_p1`.
The XPaths targeting chunk 2, for example, look like this one: `//*[@xml:id='novel1_letter1_p1']`, but the actual `@xml:id`s take into account the full parent chain, so they look like this: `novel1_letter1_div1_p1`, or even like this: `novel1_letter1_div1_ab1_hi1`.
Is this something that would be easy to fix?
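The full-chain ID scheme described above can be sketched in Python with `xml.etree.ElementTree`. This is not the project's XSLT: it assumes each step of an ID is the element's `type` attribute when present (a guess based on steps like `novel1` and `letter1`), otherwise its local name, numbered among same-labelled siblings starting from 1. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark-notation name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def step_label(elem: ET.Element) -> str:
    """Assumed rule: use @type when present (e.g. <div type="letter">),
    otherwise the element's local name (e.g. 'p', 'ab', 'hi')."""
    return elem.get("type") or local_name(elem.tag)


def assign_chain_ids(elem: ET.Element, prefix: str = "") -> None:
    """Recursively give every descendant of `elem` an xml:id built from its
    full ancestor chain, numbering each step among same-labelled siblings."""
    counts = {}
    for child in elem:
        label = step_label(child)
        counts[label] = counts.get(label, 0) + 1
        child_id = f"{prefix}{label}{counts[label]}"
        child.set(XML_ID, child_id)
        assign_chain_ids(child, child_id + "_")


if __name__ == "__main__":
    doc = ET.fromstring(
        '<body><div type="novel"><div type="letter"><div>'
        "<p>text</p><ab><hi>text</hi></ab>"
        "</div></div></div></body>"
    )
    assign_chain_ids(doc)
    for e in doc.iter():
        if e.get(XML_ID):
            print(e.get(XML_ID))  # the last line printed is novel1_letter1_div1_ab1_hi1
```

Dropping any level of the chain in a scheme like this yields shorter IDs such as `novel1_letter1_p1`, which no longer match the fuller IDs.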
@raffazizzi I'll take a look today and report back
@raffazizzi so the XML ids that @ebeshero generated for me in https://github.com/FrankensteinVariorum/fv-data/tree/master/hypothesis/migration/xml-ids don't include the full chain like `novel1_letter1_div1_p1`. I think this is a mismatch between the IDs in the versions Elisa made for me and the versions you're using in the viewer.
@ebeshero if you can make sure the IDs are consistent between the full files I'm looking at and the ones Raff is working from, this should be an easy fix. Thoughts?
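A quick way to confirm that the two sets of files agree, sketched in Python: collect every `xml:id` from a migration file and from the corresponding chunk file, then report the differences in each direction. It assumes each file parses as a standalone XML document; `collect_ids` and `compare_ids` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound, so ElementTree reports xml:id under this name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_ids(xml_text: str) -> set:
    """Return every xml:id value found in a serialized XML document."""
    root = ET.fromstring(xml_text)
    return {e.get(XML_ID) for e in root.iter() if e.get(XML_ID) is not None}


def compare_ids(migration_xml: str, chunk_xml: str) -> tuple:
    """Return (ids only in the migration file, ids only in the chunk file)."""
    a, b = collect_ids(migration_xml), collect_ids(chunk_xml)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    only_migration, only_chunk = compare_ids(
        '<TEI><p xml:id="novel1_letter1_p1"/></TEI>',
        '<TEI><p xml:id="novel1_letter1_div1_p1"/></TEI>',
    )
    print(only_migration, only_chunk)
```

Two empty lists mean the IDs line up; anything else points to the elements whose IDs were generated differently.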
@ebeshero for reference these are the files the application loads: https://github.com/FrankensteinVariorum/fv-data/tree/master/variorum-chunks
Also, would we like to try and map the tunneling annotations onto MS and Thomas? I'll need those files with full xml-ids added to https://github.com/FrankensteinVariorum/fv-data/tree/master/hypothesis/migration/xml-ids
@raffazizzi @mdlincoln Sorry--I've been in class and a noon meeting, and have class coming up again at 3pm! So I'll take a look at the code more closely later this evening. But for the moment, I'm wondering (out loud here) why these ids are different, as I'm generating both sets of them. I'm sure it won't take me long to figure out what's missing in the set I was generating just now. Here is a guess though: I'm worried that the discrepancy is to do with differences in the XML structure of the output editions for the Variorum, vs. the simpler files that serve as the basis of the separate HTML editions that the annotations team annotated.
On the other hand, reviewing this thread, it just looks like I've left out some basic stuff going all the way up the tree--in which case it should be super easy for me to fix. Fingers crossed it's the latter. As I understood it, the xml:ids for these distinct editions were just supposed to identify the XPath locations of elements in the files the annotations team worked on, so I wasn't thinking at the time about making these be identical to those we're using in the Variorum viewer. I guess they certainly should be the same across all the editions. Sorry about any confusion on this--I bet I can sort it out this evening.
@ebeshero great - and it won't cause any changes in my code, so don't worry about rushing. I can continue my other work without this blocking it.
@raffazizzi questions about the output format for annotations:
- The `target.source` value should be the URL of the document that the annotation points to. Right now it's just http://frankensteinvariorum.library.cmu.edu/viewer/viewer/, since the different witnesses aren't on different pages. Is that OK? Thoughts?
- Also note: for the "tunneled" annotations, I'll only provide the `RangeSelector`, not the `TextQuoteSelector`, since the quoted text only comes from the witness the annotation was originally attached to.
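A minimal sketch of a tunneled target in the shape described above, where only a `RangeSelector` is emitted. It assumes the start and end `xml:id`s in the destination witness have already been found by the mapping step, which is not shown, and it uses the current placeholder viewer URL as `source`. `tunneled_target` is a hypothetical name.

```python
VIEWER_URL = "http://frankensteinvariorum.library.cmu.edu/viewer/viewer/"


def tunneled_target(source: str, start_id: str, end_id: str) -> dict:
    """Target for an annotation carried over to another witness: no
    TextQuoteSelector, because the quoted text belongs to the original witness."""
    def xpath(xml_id: str) -> dict:
        return {"type": "XPathSelector", "value": f"//*[@xml:id='{xml_id}']"}

    return {
        "source": source,
        "type": "Text",
        "selector": [
            {
                "type": "RangeSelector",
                "startSelector": xpath(start_id),
                "endSelector": xpath(end_id),
            }
        ],
    }


if __name__ == "__main__":
    print(tunneled_target(VIEWER_URL, "novel1_letter1_div1_p1", "novel1_letter1_div1_p1"))
```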
I've pushed revised results up (now including a JSON file for 1823 annotations "tunneled" from the others).
Will update again once we get refreshed files from Elisa.
@raffazizzi @mdlincoln Yikes. I was mystified by my XSLT b/c I was duplicating the same location flags I used to generate the full-flat XML files we send on to collateX. And I saw this commit from July 2018 in which I changed that very XSLT (that flattens these files and produces the xml:ids that we have in the viewer): https://github.com/FrankensteinVariorum/fv-collation/commit/8aafed555910de37918010495098aa3496a6c21a#diff-83ea56ef15a44bcb7d9bc667b19c300a
I must have been thinking in July 2018 about removing the XPath levels because they're redundant or something. And I seem not to have followed through with it (I obviously didn't run that XSLT after making the change, or we wouldn't have the Variorum collated XML with the ids we have now). The scary thing is I don't remember what I had in mind two years ago! Maybe I was just experimenting with the file. (I wish I could remember!) Anyway, since we are relying on that XPath information now in all our xml:ids, I'm putting it back as it was before, and regenerating the XML files. I'm 99.99% positive everything will match up now, especially since I've figured out why the ids turned out differently!
@raffazizzi @mdlincoln I think with this commit I've repaired our xml:ids on the hypothes.is migration XML edition files, so they match up with the ids on our Variorum collation files: https://github.com/FrankensteinVariorum/fv-data/commit/6dfb0e1e5be070f00a75263fe5107283f9040dfd
Thanks @ebeshero, will re-run my scripts this morning and push the refreshed results.
I've pushed updated annotations json, now also including the Thomas annotations.
@raffazizzi N.B.: I've migrated the Hypothesis "tags" according to the examples shown in the W3C guide: they're siblings of the annotation comment in the `body` list, typed with `"purpose": "tagging"`. For example:
```json
"body": [
  {
    "type": "TextualBody",
    "purpose": "tagging",
    "value": "romance"
  },
  {
    "type": "TextualBody",
    "purpose": "tagging",
    "value": "imagination"
  },
  {
    "type": "TextualBody",
    "purpose": "tagging",
    "value": "jk"
  },
  {
    "type": "TextualBody",
    "value": "Like Robert Walton's love for poetry, Henry Clerval's love for books of chivalry and romance makes him sociable and open to domestic affections, unlike Victor. Victor will later regret that he did not have Henry's or Victor's orientation to languages and poetry at the most critical moments of his life.",
    "creator": "https://hypothes.is/users/frankensteinvariorum",
    "modified": "2019-10-04T17:16:51.923291+00:00",
    "purpose": "commenting"
  }
]
```
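The tag migration can be sketched as a small Python function that turns one Hypothesis annotation into a W3C `body` list like the one above. It assumes the input uses the Hypothesis API field names `text`, `tags`, `user` (e.g. `acct:frankensteinvariorum@hypothes.is`) and `updated`; the conversion of `user` into a `https://hypothes.is/users/...` URL is inferred from the example, not from documentation. Both function names are hypothetical.

```python
def user_url(user: str) -> str:
    """'acct:name@hypothes.is' -> 'https://hypothes.is/users/name' (assumed format)."""
    name = user[len("acct:"):] if user.startswith("acct:") else user
    return "https://hypothes.is/users/" + name.split("@", 1)[0]


def hypothesis_body(annotation: dict) -> list:
    """One 'tagging' TextualBody per tag, followed by the comment text
    (if any) as a 'commenting' TextualBody."""
    body = [
        {"type": "TextualBody", "purpose": "tagging", "value": tag}
        for tag in annotation.get("tags", [])
    ]
    if annotation.get("text"):
        body.append({
            "type": "TextualBody",
            "value": annotation["text"],
            "creator": user_url(annotation["user"]),
            "modified": annotation["updated"],
            "purpose": "commenting",
        })
    return body
```

Putting tags and the comment in the same `body` list, distinguished only by `purpose`, lets a client filter tags out without a separate field.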
@mdlincoln and @ebeshero this looks good and everything seems to work now. Thanks both.
I renamed the file with Thomas annotations to match the internal ID the app has been using (that's `Thomas` instead of `Thom`).
noted! I've adjusted the generation script to account for that exception.
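An exception like this can live in a small lookup table, so the generation script falls back to the witness code everywhere else. A sketch in Python; the table and function names are hypothetical, not the script's actual code.

```python
# Hypothetical lookup: witness codes whose app-internal ID differs.
APP_ID_EXCEPTIONS = {"Thom": "Thomas"}


def app_witness_id(code: str) -> str:
    """Return the ID the viewer app uses for a witness code."""
    return APP_ID_EXCEPTIONS.get(code, code)
```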
This script will
Supersedes https://github.com/PghFrankenstein/fv-postCollation/issues/9
Supersedes https://github.com/PghFrankenstein/fv-postCollation/issues/3
Supersedes https://github.com/PghFrankenstein/fv-postCollation/issues/2