Accessibility of in-browser PDF Viewer

brennanyoung commented 2 years ago

I welcome the appearance of data on in-browser PDF Viewers, resulting from #2212

Different browsers handle PDF in different ways, and the various PDF viewers have different capabilities, which can affect accessibility conformance.

AFAIK, only pdf.js used by Mozilla makes an effort to communicate PDF content to the accessibility tree (i.e. the subset of the DOM which is communicated to assistive tech). It makes a translation from PDF/UA tags to ARIA roles and attributes.
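To make the idea of translating tags to roles concrete, here is a minimal sketch. The table and the `toAria` helper are my own illustration (assumptions, not pdf.js's actual mapping), and a real viewer would also need to handle attributes such as TH scope.

```typescript
// Hypothetical mapping from PDF structure types to ARIA roles.
// Illustrative only; this is NOT pdf.js's actual table.
type AriaMapping = { role: string; level?: number };

const ROLE_BY_TAG = new Map<string, string>([
  ["P", "paragraph"],
  ["L", "list"],
  ["LI", "listitem"],
  ["Table", "table"],
  ["TR", "row"],
  ["TH", "columnheader"], // ignores the Scope attribute for simplicity
  ["TD", "cell"],
  ["Link", "link"],
  ["Figure", "figure"],
  ["BlockQuote", "blockquote"],
]);

function toAria(tag: string): AriaMapping | null {
  // H1-H6 become role="heading" with the matching aria-level.
  const heading = /^H([1-6])$/.exec(tag);
  if (heading) {
    return { role: "heading", level: Number(heading[1]) };
  }
  // Tags without an entry (Div, Span, NonStruct, ...) get no role.
  const role = ROLE_BY_TAG.get(tag);
  return role ? { role } : null;
}
```

With this sketch, `toAria("H2")` yields a heading at level 2, while `toAria("NonStruct")` yields `null`, i.e. no role is exposed.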

If people are deciding to use or not use PDF-in-browser on the basis of caniuse data, I believe they should be informed of different levels of accessibility support.

LifeIsStrange commented 2 years ago

Note that a website can embed PDF.js and it works on chromium browsers since it is an HTML5 renderer. However most users probably prefer the default experience of their browser.

brennanyoung commented 2 years ago

@LifeIsStrange That's exactly the kind of information that would be useful to see on caniuse.

BTW, pdf.js relies on aria-owns to construct an accessibility tree. Quite a cunning solution, except that aria-owns is not supported at all in Safari.

Given that Acrobat and Preview also fail to generate such a tree, this means that at time of writing there are no PDF viewers that run on any of Apple's platforms (inside or out of browser) which communicate the tree to the system-level accessibility API.

This has an impact on the de facto portability of this nominally portable file format.
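For anyone unfamiliar with the aria-owns trick mentioned above, a rough sketch of the idea follows. The `StructNode` shape and `renderStructure` function are hypothetical names of mine, not pdf.js code. The assumption is that the text has already been rendered into elements with known ids, and that a parallel tree of empty elements then claims that text via aria-owns.

```typescript
// Hypothetical shape: a tagged-PDF structure node whose content lives in
// already-rendered text-layer elements, referenced by id.
interface StructNode {
  role: string;            // ARIA role chosen for this node
  children: StructNode[];  // nested structure nodes
  contentIds: string[];    // ids of text-layer elements holding the content
}

// Emit one otherwise empty element per structure node. aria-owns takes a
// space-separated list of ids, so the accessibility tree can follow the
// PDF's logical structure even though the text sits elsewhere in the DOM.
function renderStructure(node: StructNode): string {
  const ownsAttr =
    node.contentIds.length > 0
      ? ` aria-owns="${node.contentIds.join(" ")}"`
      : "";
  const inner = node.children.map((c) => renderStructure(c)).join("");
  return `<span role="${node.role}"${ownsAttr}>${inner}</span>`;
}
```

A browser that ignores aria-owns would expose these spans as empty, which is why the approach depends on support for that attribute.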

Malvoz commented 2 years ago

ref: https://github.com/accessibilitysupported/a11ysupport.io/issues/222

brennanyoung commented 1 year ago

Just an FYI: if you open a semantically well-formed HTML5 document in Chrome and print to PDF (using the default mechanism), you will get a nearly semantically well-formed PDF/UA. Headings and lists are getting tagged correctly, at least.

However, there are still some issues. I reported several on the Chromium bug database yesterday. Lots of bogus <NonStruct> tags are getting generated, which are relatively harmless (similar to role="generic").

Unfortunately several meaningful semantics such as article and section are also getting mapped to <NonStruct>, even though PDF/UA has <Art> and <Sect> available.

Creating PDF is not really the bread-and-butter of caniuse, but this is a REALLY good development. It means that Chrome is a viable authoring tool for accessible PDF, which plays well with (e.g.) Acrobat and NVDA.

However, the default PDF view in Chrome does not seem to generate a proper accessibility tree at time of writing.

LifeIsStrange commented 1 year ago

@jensimmons friendly ping

brennanyoung commented 1 year ago

Update - I'm having some success with semantic browsing in Preview and VoiceOver! Not sure what has changed or when. (The PDF document used matters a great deal, of course). I haven't seen any announcements from Apple about this feature. Very obvious that things behave differently to the web, but at least there is a minimal implementation. I hope it will be fleshed out.

brennanyoung commented 1 year ago

Sketching out a test profile for consumption (not authoring).

This will not be exhaustive, but it will get us moving. I'm using the nomenclature as it appears in Acrobat, or in the Tagged PDF Best Practice Guide. I've broken these into categories, but the breakdown is open to adjustment. I imagine one test PDF per category, or something like that. (Please advise on the wisdom of this, or offer any suggestions for further enrichment/value.)

I imagine each of these as pass/fail. A "pass" is if the AT announces the element and (if non-generic) the role. For tree exclusions, a "pass" is if the AT does not announce the content.

I expect that we will need to document/express partial pass (with remarks) in some cases, but we'll cross that bridge later.
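As a rough sketch of the pass/fail rule described above (the `Observation` shape and `evaluate` name are placeholders of mine, and partial passes are deliberately left out):

```typescript
// Hypothetical record of what a tester observed for one test item.
// Which tags count as "exclusion" is left to whoever builds the test set.
type Kind = "semantic" | "generic" | "exclusion";

interface Observation {
  kind: Kind;
  contentAnnounced: boolean; // did the AT announce the element's content?
  roleAnnounced: boolean;    // did the AT announce the element's role?
}

function evaluate(o: Observation): "pass" | "fail" {
  // Tree exclusions pass only if the AT stays silent about the content.
  if (o.kind === "exclusion") {
    return o.contentAnnounced ? "fail" : "pass";
  }
  // Everything else must at least be announced.
  if (!o.contentAnnounced) {
    return "fail";
  }
  // Non-generic elements must also have their role announced.
  if (o.kind === "semantic" && !o.roleAnnounced) {
    return "fail";
  }
  return "pass";
}
```

A "partial pass (with remarks)" could later be added as a third result value plus a free-text field, without changing the basic rule.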

Essential metadata (level A)

Title (document metadata) Required for WCAG SC 2.4.2: Page Titled
Lang (document metadata) Required for WCAG SC 3.1.1: Language of Page

Note: as in HTML, the Lang attribute may be applied to almost any other tag, including those with generic semantics such as Div and Span, so we should test for SC 3.1.2 "Language of Parts" too, especially with a mixed-language document. A "pass" here would be (e.g.) for a screen reader's speech synth to use the correct phonemes (if available). Not sure if there are similar criteria that could be used for (e.g.) Braille devices. Advice welcome.
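To pin down what the expected language is for each part of a mixed-language test document, something like the following sketch might help. The `LangNode` shape and `effectiveLangs` name are hypothetical, and it assumes a tag without its own Lang inherits from its nearest ancestor, falling back to the document-level Lang.

```typescript
// Hypothetical, simplified view of a tagged structure tree.
interface LangNode {
  tag: string;
  lang?: string;        // Lang attribute, if set on this tag
  children: LangNode[];
}

// List "tag:lang" for every node in document order, where a node without
// its own Lang inherits the language in effect for its parent.
function effectiveLangs(node: LangNode, inherited: string): string[] {
  const lang = node.lang ?? inherited;
  let out = [`${node.tag}:${lang}`];
  for (const child of node.children) {
    out = out.concat(effectiveLangs(child, lang));
  }
  return out;
}
```

For an English document containing a paragraph with one Danish Span, the Span (and anything inside it) would be expected to use Danish pronunciation, and everything else English.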

Basic Block Level Semantics

Required for WCAG SC 4.1.2: Name, Role, Value

Paragraph (<P>)
Heading (<H> and <H1>-<H6>)
Heading Level (<H1>-<H6>)
Lists, List Items, label and body (<L>, <LI>, <Lbl>, <LBody>)
Citation (<BlockQuote>)

Inline semantics

Required for WCAG SC 1.3.1: Info and Relationships and SC 4.1.2: Name, Role, Value.

Span (<Span>)
Inline quote (<Quot>)

Links, References and Annotations

Required for WCAG SC 1.3.1: Info and Relationships and SC 4.1.2: Name, Role, Value.

Hyperlinks (<Link>, OBJR)
Cross reference (<Reference>)
Footnote/Endnote (<Reference>, <Lbl>)
Annotation (<Annot>, OBJR)
Table of contents (<TOC>, <TOCI>)
Index (<Index> containing <L>)

(We might consider an additional test for an AT making a successful "round trip" to and from the linked/referenced item.)

Structural Semantics

Required for WCAG SC 1.3.1: Info and Relationships and SC 4.1.2: Name, Role, Value.

Generic wrapper (<Div>)
Data table basics (<Table>, <TR>, <TD>)
Data table extras (<THead>, <TH>, <TBody>, <TFoot>, TH scope attribute, TD ColSpan attribute)
Nested lists
Document structure (<Document>, <Part>, <Art>, <Sect>)

Text Alternatives

Required for WCAG SC 1.1.1: Non-text Content.

Captioned figure (<Figure>, <Caption>)
Captioned table (<Table>, <Caption>)
Alt text (Alt and ActualText attributes)
Expansion of abbreviations/acronyms ('E' attribute)

Exclusions from the Accessibility Tree

Open (<NonStruct>) i.e. excluded but does not hide contents from tree (≅ role="generic")
Closed (<Private>) i.e. excluded and hides its contents from tree (≅ aria-hidden="true")
Suppressed for readability (Artifact) i.e. AT may announce but not by default (somewhat similar to aria-details?)

To be considered/explained/understood before testing

Code (<Code>) Tagged PDF Best Practice Guide has low expectations about how ATs may handle this. Do we agree?
Bibliography (<BibEntry>) - presumably there are features for book metadata such as date, publisher, ISBN etc.?
Asian writing tags (<Ruby>, <RB>, <RT>, <RP>, <Warichu>, <WT>, <WP>)
Form elements (<Form> ... but what about operable elements? Needs a deep dive.)
Sidebar (has no explicit semantic in PDF/UA, but can be implied with certain structures. Should we test for this?)
We might want to test for other metadata such as Subject, Author and Keywords, but I don't know how ATs are supposed to handle those.
