Open sydb opened 6 months ago
I just discovered that DHQ does not in fact use the `@lang` or `@xml:lang` attributes to indicate when language shifts in HTML article content. I have to stress that using a class is not enough. For accessibility, we have to use one of the designated attributes.
(Sorry to butt in on your issue, Syd. Both of our problems are tangled up in the HTML-producing side of things, so.)
I think that, in the static site at least, we should use both `@lang` and `@xml:lang`, and maybe leave `@class` as well. My logic is that the only downside is file size, but given that we already stuff all the CSS, JS, and images in there anyway, adding a few attrs will not make any significant difference.
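As a rough illustration of the "use both, keep the class" idea, here is a small Python sketch using only the standard library. The helper name `mark_language` and the class value `lang-es` are hypothetical; the actual class names and the real generating code (XSLT) are not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of xml:lang; ElementTree serializes it as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mark_language(elem, lang, keep_class=True):
    """Set lang and xml:lang on an element, optionally keeping a class hook.

    The "lang-<code>" class naming is an assumption for illustration only.
    """
    elem.set("lang", lang)
    elem.set(XML_LANG, lang)
    if keep_class:
        classes = elem.get("class", "").split()
        hook = f"lang-{lang}"
        if hook not in classes:
            classes.append(hook)
        elem.set("class", " ".join(classes))
    return elem


span = ET.Element("span")
span.text = "¿Qué tal?"
mark_language(span, "es")
html = ET.tostring(span, encoding="unicode")
print(html)
```

The cost is a handful of extra bytes per language shift, which is the trade-off described above.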
The `dhq2html.xsl` program (which I was poking at for some other reason), in the various templates for `<note>`, sets the language by looking at `ancestor::tei:text/@xml:lang`. Since `@xml:lang` can appear on any one of `<dhq:abstract>`, `<div>`, `<dhq:example>`, `<floatingText>`, `<foreign>`, `<q>`, `<quote>`, `<dhq:teaser>`, `<term>`, `<text>`, `<title>`, or `<word>`, all of which can have `<note>` as a descendant, this seems prone to error. That is, if an input document contained a `<note>` inside an element marked as English that itself sits inside a `<text>` marked as Spanish, the output `<note>` will be flagged as being in Spanish, not English. There is a counter argument that asks: what about the same structure where the prose of the `<note>` is actually in Spanish, but the `<note>` carries no `@xml:lang` of its own? There the `<note>` would “correctly” be flagged as being in Spanish. My response to this counter argument is that such a passage is, per the rules of XML (over which we have no control), incorrectly encoded. If we are going to use `@xml:lang`, we have to follow the spec.
To be fair, this is not a current problem[1] and may well never happen. But it seems to me we should guard against it anyway.
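To make the concern concrete, here is a minimal Python sketch (standard library only) that contrasts the two lookups on a made-up fragment: a Spanish `<text>` holding an English `<quote>` with a `<note>` inside. The sample document and the function names `lang_from_text` and `lang_in_scope` are illustrative assumptions, not code from `dhq2html.xsl`; in XSLT the scoped lookup would more likely be written with something like `ancestor-or-self::*[@xml:lang][1]/@xml:lang`.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TEI = "{http://www.tei-c.org/ns/1.0}"

# Made-up input: Spanish article text containing an English quotation.
SAMPLE = """
<text xmlns="http://www.tei-c.org/ns/1.0" xml:lang="es">
  <body>
    <p>Texto en español.
      <quote xml:lang="en">An English quotation.<note>An English note.</note></quote>
    </p>
  </body>
</text>
"""


def build_parent_map(root):
    """ElementTree has no parent pointers, so build a child -> parent map."""
    return {child: parent for parent in root.iter() for child in parent}


def lang_from_text(elem, parents):
    """Approximate ancestor::tei:text/@xml:lang (nearest <text> only)."""
    node = parents.get(elem)
    while node is not None:
        if node.tag == TEI + "text":
            return node.get(XML_LANG)
        node = parents.get(node)
    return None


def lang_in_scope(elem, parents):
    """Return the nearest xml:lang on the element itself or any ancestor."""
    node = elem
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parents.get(node)
    return None


root = ET.fromstring(SAMPLE)
parents = build_parent_map(root)
note = root.find(f".//{TEI}note")

print("text-based lookup:", lang_from_text(note, parents))
print("scoped lookup:    ", lang_in_scope(note, parents))
```

On this sample the text-based lookup reports `es`, while the scoped lookup reports `en`, which is what the inheritance rule for `xml:lang` calls for.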
Notes
[1] There is only one article which contains a case of `//text[@xml:lang]//*[@xml:lang]//*[@xml:lang]`, and in that case the ultimate `@xml:lang` is superfluous. See articles/000251/000251.xml circa line 299. (I would correct this myself, but I am not sure if it should be encoded with `<quote xml:lang="grc">` or `<quote><foreign xml:lang="grc">`; there are lots of cases of each method.)
[2] I think this issue should be fixed before #14 is handled. (Although only `<said>` and `<p>` are left; the rest already have `@xml:lang`.)
[3] I suspect this situation is a hold-over from a previous era when `@xml:lang` pretty much only appeared on `<text>` and `<foreign>`, in which case the current method makes sense. If instead it is the case that there is an editorial rule that the natural language of an annotation has to match the natural language of its ancestor `<text>`, then I think that should be schema-enforced. (I.e., require that if there is an ancestor element other than `<text>` that has an `@xml:lang`, a `<note>` must have an `@xml:lang` that matches the one on its ancestor `<text>`.) In which case this ticket could be closed-won’t-fix or could be fixed; the results would be the same.
[4] BTW, this is also error-prone because there are some 50 texts that have `<text>`s inside `<text>`, in which case `ancestor::tei:text/@xml:lang` returns 2 items, although one of them is the empty string. That could cause problems someday if some code expects to use whatever it returns for something other than a string.
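The nested-`<text>` situation in note [4] can be sketched the same way. This Python fragment (standard library only) collects every `xml:lang` found on ancestor `<text>` elements, which is roughly what `ancestor::tei:text/@xml:lang` selects. The sample structure (an inner `<text xml:lang="">` inside a `<group>`) and the helper name `ancestor_text_langs` are assumptions made for illustration; the real articles may nest differently.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
TEI = "{http://www.tei-c.org/ns/1.0}"

# Made-up input: a <text> nested inside another <text>; the inner one
# carries an empty xml:lang (an assumption based on note [4]).
NESTED = """
<text xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <group>
    <text xml:lang="">
      <body><p>Inner text.<note>A note.</note></p></body>
    </text>
  </group>
</text>
"""


def ancestor_text_langs(elem, parents):
    """All xml:lang values on ancestor <text> elements, outermost first."""
    found = []
    node = parents.get(elem)
    while node is not None:
        if node.tag == TEI + "text" and XML_LANG in node.attrib:
            found.append(node.get(XML_LANG))
        node = parents.get(node)
    return list(reversed(found))


root = ET.fromstring(NESTED)
parents = {child: parent for parent in root.iter() for child in parent}
note = root.find(f".//{TEI}note")

langs = ancestor_text_langs(note, parents)
print(langs)

# One defensive option: take the nearest non-empty value, if there is one.
nearest_nonempty = next((value for value in reversed(langs) if value), None)
print(nearest_nonempty)
```

Here the lookup yields two values, one of them empty, so any code that expects a single language code has to decide explicitly which one it wants.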