Digital-Humanities-Quarterly / dhq-journal

DHQ is an open-access, peer-reviewed journal of digital humanities.
http://www.digitalhumanities.org/dhq/

Updates to DHQ schema to accommodate <biblStruct> from Zotero #52

Open juliaflanders opened 8 months ago

juliaflanders commented 8 months ago

Made changes to the ODD file to accommodate the use of <biblStruct> in DHQ:

I have put a test file temporarily at articles/999998/test_with_biblStructs.xml, which can be deleted once we're happy with the schema changes.

juliaflanders commented 8 months ago

Syd, thank you for the swift review! One thing that may not have been clear here is that all of our <biblStruct>s will be autogenerated by Zotero, and we can't control their encoding. So the test file I included with this branch is sort of a reference format that the schema needs to match. I'm not certain, but I think one or two of your suggestions above (on ll. 1707 and 1998) may have presumed that we could choose a more elegant encoding solution.

With that proviso--i.e. as long as the resulting schema will validate the biblStructs in the test file--please do feel free to go ahead and make any modifications to the ODD that you think are best! And thanks again.

sydb commented 8 months ago

Minor mods made (and pushed to this branch).

As for the format of the <biblStruct> itself, with respect, I don’t think the assertion that our data format is from Zotero is quite correct. I think the Zotero-generated bibliographic citations are tucked (by TEIGarage) into the XML as JSON inside a processing instruction. Our program common/xslt/convert_tei2dhq.xsl then converts that JSON into a <biblStruct>.[1] The string "<note" occurs only once in that file, at the spot that generates the URL as a <note>. Just moving the entire <xsl:if> a few lines down (after the </imprint> but before the </monogr>) and changing the <note> to a <ptr type="original" target="{$zotero-item-map?URL}"/> would do the trick.

Note [1] Quite cleverly, BTW, @amclark42 did a really nice job. It’s a bit of a pain because all of the information is included at every citation, but we want just a small snippet of info at the citation point, and the whole thing in the back matter.
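To illustrate the encoding change suggested above (the URL as a <ptr> after </imprint> rather than as a <note>), here is a minimal Python sketch. It is not the XSLT change itself; it only rewrites an already-built <biblStruct> to show the target shape. The sample record, the type="url" attribute on the <note>, its position inside <imprint>, and the function name url_note_to_ptr are all assumptions made for illustration, not taken from the test file or from convert_tei2dhq.xsl.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def q(name):
    """Return the namespace-qualified tag name for a TEI element."""
    return f"{{{TEI_NS}}}{name}"


# Hypothetical input: the note's type="url" and its position inside
# <imprint> are assumptions, not copied from any real export.
SAMPLE = """<biblStruct xmlns="http://www.tei-c.org/ns/1.0">
  <monogr>
    <title level="m">Example Title</title>
    <imprint>
      <date>2020</date>
      <note type="url">https://example.org/item</note>
    </imprint>
  </monogr>
</biblStruct>"""


def url_note_to_ptr(bibl_struct):
    """Replace each URL <note> in a <monogr> with a <ptr> after <imprint>."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent
                 for parent in bibl_struct.iter()
                 for child in parent}
    for monogr in bibl_struct.iter(q("monogr")):
        url_notes = [note for note in monogr.iter(q("note"))
                     if note.get("type") == "url"]
        for note in url_notes:
            ptr = ET.Element(q("ptr"), {
                "type": "original",
                "target": (note.text or "").strip(),
            })
            parent_of[note].remove(note)
            # Place the pointer right after </imprint>, still inside <monogr>;
            # fall back to the end of <monogr> if there is no <imprint>.
            imprint = monogr.find(q("imprint"))
            children = list(monogr)
            position = (children.index(imprint) + 1
                        if imprint is not None else len(children))
            monogr.insert(position, ptr)
    return bibl_struct


if __name__ == "__main__":
    converted = url_note_to_ptr(ET.fromstring(SAMPLE))
    print(ET.tostring(converted, encoding="unicode"))
```

Whether a <ptr> in that position validates depends on the DHQ schema, which this sketch does not check.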

juliaflanders commented 8 months ago

I should have put it more precisely (and also realize this is something to coordinate with @amclark42): in many cases, our <biblStruct>s will come directly as an export from Zotero (authors will export from their Zotero library as TEI and send us the results to be pasted in). So we would either need to:

I think @amclark42's thoughts on which makes most sense are likely more relevant than mine; I feel that the third option seems inelegant on principle, but the differences might be very small. I prefer to avoid option 2 because it would add a step for the encoders.

amclark42 commented 8 months ago

@sydb Echoing @juliaflanders, the approach we're taking in the XSLT is to use Zotero's processing instructions to produce <biblStruct>s that look like the ones the author would give us if they exported TEI <biblStruct>s from Zotero. This way, regardless of whether the metadata came from an export or the Word plug-in, Biblio and the DHQ display XSLTs should be able to parse them.

To be honest, I'm right there with you: I'd prefer <idno>s or <ptr>s too, and I'd really love it if those <note>s were placed outside the <monogr>. When Julia and I first started planning out this workflow, I cared enough about this to look into Zotero's export process, to see if I could suggest changes to it. We totally can! Zotero's TEI translator is right there on GitHub. The code is in JavaScript, so I don't feel it's worth my time to try to fix things and submit a PR. But submitting a GitHub issue is still an option! I'm not up to taking the lead on that, but I'll be happy to cosign if you decide to.

amclark42 commented 8 months ago

(That said, I think I've now spotted a few places where my translation isn't matching up with Zotero. Sigh. Back to it.)

sydb commented 8 months ago

More than happy to take the lead on a Zotero issue! So I think this PR can be merged as it is now, with the realization that we may want further updates down the line. @juliaflanders: Let me know what your thoughts are on <persName> (whether the schema should restrict it except within <biblStruct>, and if so, whether that restriction should be RELAX NG or Schematron). @amclark42: Let me know when you have finished updating the translation; can you update the articles/999998/test_with_biblStructs.xml file, too?

sydb commented 8 months ago

See https://github.com/zotero/translators/issues/3171, if interested.