Render `<dhq:caption>`s in HTML

amclark42 commented 4 months ago

In this pull request, I've condensed and abstracted out the code used to produce <div type="caption">s for tables and examples. I've also added code to process <dhq:caption> when it appears inside a table or example.
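For readers who want the intended mapping without opening the stylesheet, here is a minimal sketch in Python. It is not the XSLT in this pull request. The DHQ namespace URI, the sample encoding, the `render_block` name, and the HTML class names are all assumptions made for illustration, and the body of the example is skipped to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is standard; the DHQ namespace URI is assumed here.
TEI = "http://www.tei-c.org/ns/1.0"
DHQ = "http://www.digitalhumanities.org/ns/dhq"

# Hypothetical encoding: a short <head> plus a separate <dhq:caption>.
SAMPLE = f"""
<example xmlns="{TEI}" xmlns:dhq="{DHQ}">
  <head>Example 1</head>
  <eg>Johann Rendt, Zuckerbacher</eg>
  <dhq:caption>“Johann Rendt” is coded as a “person”.</dhq:caption>
</example>
""".strip()


def render_block(source: ET.Element) -> ET.Element:
    """Render an <example> or <table> as an HTML <div>, keeping the
    heading and the caption in separate child elements."""
    local_name = source.tag.split("}")[-1]  # drop the "{namespace}" prefix
    wrapper = ET.Element("div", {"class": local_name})

    head = source.find(f"{{{TEI}}}head")
    if head is not None:
        label = ET.SubElement(wrapper, "div", {"class": "label"})
        label.text = "".join(head.itertext()).strip()

    caption = source.find(f"{{{DHQ}}}caption")
    if caption is not None:
        cap = ET.SubElement(wrapper, "div", {"class": "caption"})
        cap.text = "".join(caption.itertext()).strip()

    return wrapper


html = render_block(ET.fromstring(SAMPLE))
print(ET.tostring(html, encoding="unicode"))
```

The point of the sketch is only that the heading and the caption end up as two separate elements in the output, so each can be styled and described on its own.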

Background

<dhq:caption> is a valid element in the DHQ schema, defined as "A caption for an example or table." However, there are currently no instances of the element in any DHQ article. @brgrey was under the impression that it was deprecated.

Benjamin also discovered that

<dhq:caption> seems to have been converted en masse to <head> on 20 June 2013.

I dug into dhq2html.xsl and found

example/head is treated as a caption, and since <caption> is suppressed except when it follows figDesc/head, captions straight-up aren’t shown. They might not be shown for tables either.

The continued presence of <dhq:caption> in the schema indicates that DHQ may have intended to continue using it for tables and examples (but not figures, which have <figDesc>). However, because the HTML-producing stylesheet doesn't provide for those cases, DHQ editors must place their “captions” inside <head> instead. And they have done so.

Why bring back captions

While I am missing the historical context behind <dhq:caption>, I believe the element should be re-integrated into the DHQ encoding. Captions are important because they provide explanations, attribution, or other context. Because they are textual, they are accessible to both visual readers and those using assistive technologies like screen readers.

A caption is not a heading, and vice versa. They serve distinct roles, especially on webpages. To me:

A heading is a short-ish title or label for the figure, example, or table. It’s a marker for you to come back to or reference. Ideally, it’s unique within the article.
A caption is a description of the example or table, providing additional context or pointing out particular features that might not otherwise be clear.

There are, in short, useful applications for both. But they should not be conflated.

A specific use case

I was asked to review the examples in article 000711 for accessibility. As part of that review, I will write captions for a number of sentences with color-coded words. As I explained:

... blind users will have the categories and the sample record, but not the connection between the two — for instance, that “Lungelsucht” is a “cause-of-death”. Since we don’t have a programmatic way of associating the two, the example caption needs to explicitly make those connections clear for people who can’t use the visual styles as a guide. I’ve written what I think the caption should say below:

Example 1. In the annotated record, “Johann Rendt” is coded as a “person”; “Zuckerbacher” is coded as an “occupation”; “Berdronischen Hauß” and “Offenloch” are coded as “place”; “Lungelsucht” is coded as “cause of death”; and “35. Jahr” is coded as “age”.

These captions are only useful if they can be rendered in the article HTML. That's where this pull request comes in. Below is a screenshot of my example caption after it was run through the updated stylesheet:

amclark42 commented 4 months ago

In case anyone wants to test the XSLT, commit 24825476d3ff13bfae6593219e88fe15b3fdbe3a has the captions added to 000711-02.xml.

amclark42 commented 4 months ago

More on why this pull request (and separating headings from captions) will help accessibility:

Separating the two will have both short-term and long-term benefits.

In the short term, having the heading (e.g. “Example 1”) be visually distinct from the caption will let visual readers tell what is a label and what is explanation. It may help neurodiverse folks process the example or table by letting them focus on the part they need.

In the long term, having the heading distinct from the caption will let us programmatically describe the roles these two things are playing (e.g. this table is labeled by this bit of text, and this example is described by this bit of text). We cannot introduce these aids into the HTML until the two are disentangled in the XML encoding. Being more precise in the XML will give us more tools in the HTML.
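One way the long-term idea could look in HTML is sketched below in Python. It uses the standard ARIA attributes `aria-labelledby` and `aria-describedby` to point at the heading and the caption. The `describe_block` helper, the ID scheme, the class names, and the choice of `role="group"` are assumptions for illustration, not something the current stylesheet produces.

```python
import xml.etree.ElementTree as ET


def describe_block(block_id: str, heading: str, caption: str) -> ET.Element:
    """Build an HTML wrapper whose label and description are linked by ID."""
    label_id = f"{block_id}-label"      # hypothetical ID scheme
    caption_id = f"{block_id}-caption"

    wrapper = ET.Element("div", {
        "id": block_id,
        "role": "group",                 # assumed role; others may fit better
        "aria-labelledby": label_id,     # "this block is labeled by..."
        "aria-describedby": caption_id,  # "this block is described by..."
    })

    label = ET.SubElement(wrapper, "div", {"id": label_id, "class": "label"})
    label.text = heading

    cap = ET.SubElement(wrapper, "div", {"id": caption_id, "class": "caption"})
    cap.text = caption

    return wrapper


block = describe_block(
    "example01",
    "Example 1",
    "“Lungelsucht” is coded as “cause of death”.",
)
print(ET.tostring(block, encoding="unicode"))
```

This only works when the heading and the caption are separate elements with their own IDs, which is why the XML encoding has to distinguish them first.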

Digital-Humanities-Quarterly / dhq-journal