daisy / pipeline-modules

Modules for the DAISY Pipeline project

Should Pipeline automatically add missing accessibility meta data elements that Ace wants (`a11y:certifierCredential`, `a11y:certifierReport` and `dcterms:conformsTo`)? #79

Closed bertfrees closed 8 months ago

bertfrees commented 8 months ago

Issue reported by Tom McCartney:

Ace is also commenting that there are three accessibility meta data elements that are missing that it wants: a11y:certifierCredential, a11y:certifierReport, and dcterms:conformsTo. I can add the "a11y" values to my source documents (or possibly to a meta data file during translation if I can sort out how that works) once I know what to add for values to those elements.

The third one though illustrates another problem that I've had with the html-to-epub3 conversion script. I have a value for the conformsTo element of "http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-aa", which is correct according to what we determined a few years ago, but the conversion script is completely stripping ANY link or a element that contains an external reference; it only appears to respect files found in the source file set. While I understand the logic of that for something like missing relative references, it presents a problem for external things like mapped hyperlinks to an exhibitor's website or a clickable email link with a mailto protocol. Again, this was something in the past that I had tweaked the XSLT to handle with the Pipeline 2 conversion script, but I can't do that now.

The bigger issue that I see here is that even if you don't want to include anything clickable as a security feature, it's also stripping the content of the link, so if you have something in the document like "...navigate to <a href="http://csun.at/conference">http://csunt.at/conference</a> to see the latest..." what you end up with is "...navigate to to see the latest..." because not just the link is removed, but the full content, which as you can imagine causes reading problems.

The second part of the issue (stripping of <link> and <a> elements) is also described in https://github.com/daisy/pipeline/issues/763.

bertfrees commented 8 months ago

@GrayWolfMT I wonder how you used to add the metadata values to your source documents and how you modified Pipeline to get the metadata to end up in the EPUB. Currently Pipeline only looks at meta elements in the HTML and ignores link elements.

Note that the html-to-epub3 script also has a "metadata" option. If I pass it the following file, the result is as expected (no modifications needed):

<metadata xmlns="http://www.idpf.org/2007/opf">
    <link rel="dcterms:conformsTo"
          href="http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-aa"/>
</metadata>

GrayWolfMT commented 8 months ago

@bertfrees - I think there might be a bit of confusion and mingling of a couple of issues in this ticket. First, to answer your question, I add the meta data to the source HTML files with meta elements, exactly as you suggested, so they are being pulled into the ePub just like you suggested they would be.

The link version of dcterms:conformsTo was getting lost because, as you said, the conversion only pulls meta elements. I wasn't current enough on the spec on that particular element to know that it had changed to a "meta" format with a value that wasn't a URL. Once I updated to that syntax, the dcterms:conformsTo came through correctly as well.
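For readers following along, the "meta" form mentioned here might look roughly like the fragment below once it reaches the package metadata. This is only a sketch: it follows the EPUB Accessibility 1.1 convention of a plain-text conformance value, but the exact value string used in this case is not given in the thread, so the one shown is an assumption, and the syntax used in the source HTML is not shown either.

```xml
<metadata xmlns="http://www.idpf.org/2007/opf">
    <!-- Newer form: conformance stated as a text value in a meta element.
         The value string is an illustrative assumption, not taken from the thread. -->
    <meta property="dcterms:conformsTo">EPUB Accessibility 1.1 - WCAG 2.1 Level AA</meta>
</metadata>
```

The older form is the link element with the idpf.org URL quoted earlier in the thread, which is the one that was being dropped from the HTML source.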

For the purposes of this ticket, the question that I was originally trying to ask about meta elements was the fact that DAISY Ace was complaining about a couple of the a11y elements (after I corrected conformsTo) as being missing. I was checking with several people to determine if there was a value that we should be including for those, and if so, I was going to specify it in the source HTML. Unfortunately I included that in a message that also identified a couple of other bugs, and I think that caused some confusion with this issue.

I don't believe in this case that the elements should be applied automatically by Pipeline, because it doesn't have any way of knowing what the appropriate values would be for those elements. In a limited test, if I included those in the source with a value, they came through correctly, but in the end we did not include them because they do not apply to the process that we were using.
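To illustrate why these values cannot be guessed, here is a rough sketch of how the certifier elements could appear in the package metadata, following the pattern in the EPUB Accessibility 1.1 specification. Every value below (the certifier name, the credential, the report URL and the "certifier" id) is a hypothetical placeholder, not something taken from this thread, and a11y:certifiedBy is included only so the other two have something to refine.

```xml
<metadata xmlns="http://www.idpf.org/2007/opf">
    <!-- All values are hypothetical placeholders; only the party that
         actually evaluated the publication can supply real ones. -->
    <meta property="a11y:certifiedBy" id="certifier">Example Certifier</meta>
    <meta property="a11y:certifierCredential" refines="#certifier">Example Credential</meta>
    <link rel="a11y:certifierReport" refines="#certifier"
          href="https://example.com/a11y/report"/>
</metadata>
```

Because each value describes who evaluated the publication and where their report lives, a conversion tool has no sensible default to insert.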

I think this can probably be closed as clarified and call it good.

GrayWolfMT commented 8 months ago

As a side note, I was trying to use the metadata option to inject some of the meta data elements, but I must have misunderstood either the instructions, or the structure of the file that was supplied because I was not able to get it to insert the meta data that I was looking for. I will re-test that with your example and see if I can figure out where I went wrong with that.

For me, that would be easier than needing to put it in every source file; I just didn't have enough time to sort that process out this year.

bertfrees commented 8 months ago

OK thanks for clarifying!