Digital-Humanities-Quarterly / dhq-journal

DHQ is an open-access, peer-reviewed journal of digital humanities.
http://www.digitalhumanities.org/dhq/

time to enforce `@xml:id` requirements? #71

Open: sydb opened this issue 3 months ago

sydb commented 3 months ago

There are at least 2 or 3 places in our stylesheets where the ID of an element is tested thus:

```xml
<xsl:choose>
  <xsl:when test="@xml:id">
    <xsl:value-of select="@xml:id"/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="generate-id()"/> <!-- or whatever -->
  </xsl:otherwise>
</xsl:choose>
```

This would be, IMHO, a whole lot shorter and easier to read if written idiomatically as

```xml
<xsl:value-of select="( @xml:id, generate-id() )[1]"/>
```

The only difference (besides readability) is that the former ignores @xml:id attributes that have no value (or just whitespace as their value), whereas the latter dutifully returns an empty (or just whitespace) string. We would never want an empty (or just whitespace) string in these cases, but luckily both the DHQ schema (for most “.xml” files) and the TEI schema (for common/xml/projects.xml and common/xml/taxonomy.xml) forbid empty (or just whitespace) values of @xml:id.
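If the shorter form should also skip an @xml:id that is empty or only whitespace (the behavior described above for the `xsl:choose` version), a predicate can filter those values out before the fallback. This is a minimal sketch that assumes XSLT 2.0 or later, since the comma sequence constructor is not available in XSLT 1.0. The mode name `anchor-id` is hypothetical and does not come from the DHQ stylesheets.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical mode: emit an ID for whatever element it is applied to. -->
  <xsl:template match="*" mode="anchor-id">
    <!-- @xml:id[normalize-space()] keeps the attribute only when its
         whitespace-normalized value is non-empty; otherwise the sequence
         falls through to generate-id() for the context element. -->
    <xsl:value-of select="( @xml:id[normalize-space()], generate-id() )[1]"/>
  </xsl:template>

</xsl:stylesheet>
```

Invoked as `<xsl:apply-templates select="." mode="anchor-id"/>` on an element carrying `xml:id=" "`, this should output the generated ID rather than a whitespace string.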

BUT there are nonetheless 17 cases of an @xml:id that is empty (or just whitespace) across all our “.xml” files:

     12 common/xml/projects.xml
      3 articles/000695/000695_converted.xml
      2 articles/templates/dhq_translation_template.xml

I do not think any of these files would be processed into HTML anyway, so the empty (or whitespace-only) @xml:id values are not actually causing a problem at present. But a) I am not sure of that, and b) they are still invalid. With a template file, invalidities often need to be tolerated, and there are comments in there that instruct users to fill in the @xml:id.[1] But in the other cases, the files should probably just be valid, no?

Note [1]: There is also another @xml:id problem in this file: a duplicate value. Two elements (near the bottom) both have xml:id="test1001".

sydb commented 2 months ago

Per this morning’s meeting (@jawalsh, @juliaflanders, @amclark42, and myself), the change above is acceptable: all the listed files that have empty (or whitespace-only) @xml:id values are either holdover cruft (000695_converted.xml) or templates (the other two). I also note that empty (or whitespace-only) @xml:id values are invalid, so they should never occur in our actual data. Thus I am removing the labels and will make the above change to the XSLT at my leisure.