Digital-Humanities-Quarterly / dhq-journal

DHQ is an open-access, peer-reviewed journal of digital humanities.
http://www.digitalhumanities.org/dhq/

Adding <word> #59

Closed: juliaflanders closed this pull request 5 months ago

juliaflanders commented 6 months ago

Added the <word> element (syntactic sugar for <mentioned>) and added styling to render it in italics.

I'm proposing <word> here because it is shorter and a bit vaguer than <mentioned> (and hence more likely to cover the full range of what we need to encode with it).

sydb commented 6 months ago

Technically, looks great. (See below.) My only concern: since the semantics of <word> are a bit vaguer than those of <mentioned>, and since (if I understand correctly) it might be used to mark a phrase, an accurate semantic definition should be included in the ODD. If it is short, this could go in the <desc>:

<elementSpec ident="mentioned" mode="change" module="core">
  <altIdent>word</altIdent>
  <desc>encodes a word or phrase that the author is talking <emph>about</emph></desc>
</elementSpec>  

A longer annotation could go in <remarks>:

<elementSpec ident="mentioned" mode="change" module="core">
  <altIdent>word</altIdent>
  <desc>encodes a word or phrase that the author is talking <emph>about</emph></desc>
  <remarks>
    <p>Very like the TEI <gi>mentioned</gi> element, but we also use
    it for BLAH BLAH BLAH.</p>
    <p>Typically styled in an italic typeface on output.</p>
  </remarks>
</elementSpec>  

Testing

Of course, we do not have a test suite, and the above does not test that the change actually allows a <word> element in running prose. (I am 99.44% sure it would, just from reading the .rnc[1], but …) So I took articles/000550/000550.xml (which has lots of occurrences of the word “word”), changed every occurrence of \bword\b to <word>word</word>, and the result was valid.
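For anyone repeating this check, here is a minimal Python sketch of the substitution step. It is a stand-in, not the method actually used above: rather than running the regex over the raw file, it applies \bword\b only to text content, so attribute values and element names such as <keywords> are untouched, and it assumes <word> belongs in the TEI namespace (the RELAX NG names it tei:word). The function wrap_word_occurrences and the script usage are hypothetical. ElementTree drops comments and processing instructions, so the output is only good for a throwaway check, and it still has to be run through a RELAX NG validator (e.g. jing) against the DHQ schema.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Assumption: <word> is in the TEI namespace, as in the DHQ .rnc (tei:word).
TEI_NS = "http://www.tei-c.org/ns/1.0"
WORD_TAG = f"{{{TEI_NS}}}word"
WORD_RE = re.compile(r"\bword\b")

# Serialize TEI elements with a default namespace instead of "ns0:".
ET.register_namespace("", TEI_NS)


def _word_elements(pieces):
    """Build one <word>word</word> per match; each carries the text that
    followed that match as its tail."""
    words = []
    for following in pieces[1:]:
        w = ET.Element(WORD_TAG)
        w.text = "word"
        w.tail = following
        words.append(w)
    return words


def wrap_word_occurrences(root):
    """Wrap every whole-word 'word' in text content with <word>.

    Only .text and .tail are touched, so attributes and element names
    are left alone. Returns the number of occurrences wrapped.
    """
    count = 0
    # Snapshot first so the newly created <word> elements are not revisited.
    for elem in list(root.iter()):
        children = []
        if elem.text:
            pieces = WORD_RE.split(elem.text)
            elem.text = pieces[0]
            children.extend(_word_elements(pieces))
            count += len(pieces) - 1
        for child in list(elem):
            children.append(child)
            if child.tail:
                # A child's tail is text inside elem, right after the child.
                pieces = WORD_RE.split(child.tail)
                child.tail = pieces[0]
                children.extend(_word_elements(pieces))
                count += len(pieces) - 1
        elem[:] = children
    return count


if __name__ == "__main__":
    # Hypothetical usage: python wrap_words.py 000550.xml > 000550-word.xml
    tree = ET.parse(sys.argv[1])
    n = wrap_word_occurrences(tree.getroot())
    print(f"wrapped {n} occurrences", file=sys.stderr)
    tree.write(sys.stdout, encoding="unicode")
```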

As for the XSLT change, on its own it is so innocuous that it hardly seems worth testing. I did anyway: when transformed by this branch’s common/xslt/template_article.xsl, the altered 000550.html file has lots of <em class="word">word</em> where the main branch has just “word”.
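To repeat that before/after comparison without eyeballing the HTML, a small counter like this would do. It is a sketch using only Python's standard html.parser; the names WordEmCounter and count_word_ems are hypothetical, and it assumes the rendered element is an <em> whose class list contains "word". Running it on the branch and main-branch renderings of 000550.html should give a nonzero count for the branch and, assuming nothing else emits that class, zero for main.

```python
import sys
from html.parser import HTMLParser


class WordEmCounter(HTMLParser):
    """Count <em> elements whose class list includes "word"."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; a value may be None.
        if tag == "em":
            classes = (dict(attrs).get("class") or "").split()
            if "word" in classes:
                self.count += 1


def count_word_ems(html_text):
    """Return how many <em class="word"> elements html_text contains."""
    parser = WordEmCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


if __name__ == "__main__":
    # Hypothetical usage: python count_word_ems.py branch.html main.html
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(path, count_word_ems(f.read()))
```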

The change would only cause a problem if a stylesheet that imports or includes dhq2html.xsl[2], or a stylesheet that dhq2html.xsl itself imports or includes[3], already had a template matching tei:word. None do.[4]

Notes

[1] The RELAX NG schema has this snippet:

dhq_mentioned =
  ## marks words or phrases mentioned, not used. [3.3.3. Quotation]
  element tei:word {
    dhq_macro.phraseSeq, dhq_att.global.attributes, empty
  }

[2] Turns out there are only 5: template_editorial_article.xsl, dhq-preview-html.xsl, template_article.xsl, search_results.xsl, and article_list.xsl.

[3] There is only 1: coins.xsl, which needs to be dragged into the modern era.

[4] Note-to-self: Used xsel -t -m "//@*[contains(.,'word') and not( contains(.,'keywords') )]" -o "---------" -f -n -c ".." -n -n $(find . -name '*.xsl' -o -name '*.xslt') | nons to test. Got a few false positives, but it was still obvious that there are no conflicting templates. (Yeah, I just looked at all the XSLTs, not only the 5 that actually import dhq2html.xsl.)
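A rough Python alternative to the xsel one-liner in note [4], assuming the stylesheets are well-formed XML: instead of scanning every attribute for "word", it only inspects the match attribute of xsl:template elements and looks for word as a whole name step (tei:word or tei:p | tei:word match; tei:keywords and tei:wordList do not), which should cut down the false positives. Like the original search, it scans every .xsl/.xslt file under a directory rather than following import/include chains. The function name and regular expression are illustrative, not part of the DHQ repository.

```python
import re
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
TEMPLATE_TAG = f"{{{XSL_NS}}}template"

# "word" as a whole name step: matches word, tei:word, *:word, tei:p/word,
# but not keywords, tei:wordList, or word-count.
WORD_NAME_RE = re.compile(r"(?<![\w.-])word(?![\w.-])")


def find_word_templates(root_dir):
    """Return (path, match) for every xsl:template whose @match names word."""
    hits = []
    for path in sorted(Path(root_dir).rglob("*")):
        if not path.is_file() or path.suffix not in {".xsl", ".xslt"}:
            continue
        try:
            tree = ET.parse(path)
        except ET.ParseError as err:
            # Report and skip stylesheets that are not well-formed.
            print(f"skipping {path}: {err}", file=sys.stderr)
            continue
        for template in tree.iter(TEMPLATE_TAG):
            match = template.get("match", "")
            if WORD_NAME_RE.search(match):
                hits.append((str(path), match))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_word_templates.py common/xslt
    start = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, match in find_word_templates(start):
        print(f'{path}: match="{match}"')
```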

juliaflanders commented 5 months ago

Thank you, Syd. I've made those changes (added <desc> and <remarks>), and hopefully this is now ready to go. I'm going to merge the pull request; if any other adjustments are needed, we can make them afterwards.