Accessibility issues related to PDF

Febakke commented 1 year ago

Description of the bug

I used Adobe Acrobat Reader to test our new PDFs, and a few issues keep recurring.

1. No primary language is defined. Just as you define lang in HTML, a PDF can also have its language defined, which helps screen readers. (PDF WCAG technique)

2. No document title is defined. If possible, we should use the same title as the app. (PDF WCAG technique)

3. Tab order. The accessibility check reports an error here. I'm not sure how we can fix this, since there are no focusable elements in the PDF. Maybe we should just flag it as not relevant? (PDF WCAG technique)

Steps To Reproduce

1. Create a PDF.
2. Open it in Adobe Acrobat Reader and run the accessibility check.
3. See the list of errors.

Additional Information

No response

bjosttveit commented 1 year ago

After some investigation, it looks to me as if browserless/puppeteer/Chromium can't fix these for us, so there is nothing we can do about them in app-frontend either. The first two issues could (I think) be fixed on the backend by post-processing the PDF file returned from the PDF service before it is saved to storage. That would mean adding some metadata to the document, for example with a PDF library for .NET.

The third issue appears to be caused by Chromium not setting the tags correctly. I generated a PDF from a very simple HTML document and it had the same problem, which suggests that no change to app-frontend would help. I think this one would be difficult for us to fix, as it probably means manually changing the structure of the PDF tags, and I am not sure that is easily achieved with a PDF library, or even possible at all.

I will transfer this issue to app-lib-dotnet as that is where the changes need to happen.

bjosttveit commented 1 year ago

Btw, Pave appears to be a reasonably good free tool for checking PDF accessibility.

FinnurO commented 1 year ago

Information from Chromium: https://blog.chromium.org/2020/07/using-chrome-to-generate-more.html

FinnurO commented 1 year ago

@bjosttveit Could you follow up on #1 and #2 in the Chromium bug tracker? https://bugs.chromium.org/p/chromium/issues/detail?id=1362536&q=wcag&can=2

And could you consider reporting #3 as a bug, if it has not been reported already?

bjosttveit commented 7 months ago

The language and title issues should be fixed in Chrome 123.0.6268.0 and later: https://bugs.chromium.org/p/chromium/issues/detail?id=1362536#c22
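Since the fix depends on a specific Chromium build, a PDF service could check the browser version before relying on it. Below is a minimal TypeScript sketch. The function names are hypothetical, and it assumes the version string has the form returned by puppeteer's browser.version(), for example "HeadlessChrome/123.0.6268.0".

```typescript
// Minimum Chromium build that, according to the Chromium issue above,
// sets the PDF language and title.
const MIN_PDF_METADATA_VERSION: readonly number[] = [123, 0, 6268, 0];

/**
 * Pulls the dotted version number out of strings such as
 * "HeadlessChrome/123.0.6268.0". The input format is an assumption based on
 * what puppeteer's browser.version() returns. Returns null if none is found.
 */
function parseChromeVersion(version: string): number[] | null {
  const match = /\d+(?:\.\d+)*/.exec(version);
  return match ? match[0].split(".").map(Number) : null;
}

/** Compares two versions component by component; missing parts count as 0. */
function isAtLeast(actual: readonly number[], minimum: readonly number[]): boolean {
  const length = Math.max(actual.length, minimum.length);
  for (let i = 0; i < length; i++) {
    const a = actual[i] ?? 0;
    const m = minimum[i] ?? 0;
    if (a !== m) {
      return a > m; // first differing component decides
    }
  }
  return true; // all components equal
}

/** Hypothetical helper: true if this Chromium should emit PDF language and title. */
function supportsPdfLangAndTitle(version: string): boolean {
  const parsed = parseChromeVersion(version);
  return parsed !== null && isAtLeast(parsed, MIN_PDF_METADATA_VERSION);
}
```

For example, an older build such as "HeadlessChrome/119.0.6045.9" gives false, while "HeadlessChrome/123.0.6268.0" gives true.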

bjosttveit commented 7 months ago

I did some local testing with the version of the PDF generator defined in the charts (puppeteer 21.4.1), which in my case used Chromium 119.0.6045.9. Checking the result in Pave, Chromium now sets the tags much better than when I tested this a year ago. Apart from language and title, which need an even newer version, the only remaining issues in my PDF are missing alternative text on some images and links, which is probably easy to fix in my app configuration.

I have not tested with a screen reader, but I expect it to be much better, since the "reading order" listed in Pave now looks really good where it did not before. The tagging issues should therefore already be fixed in prod 🥳

As for language and title, I will test again once browserless ships a recent enough Chromium version that I can test with (the latest images currently have some issues on Apple silicon). We need to make sure the frontend sets the correct language and title once this becomes possible.
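On the frontend side, setting the language and title could be as small as the TypeScript sketch below. It assumes that Chromium 123+ takes the PDF language from the lang attribute on <html> and the PDF title from <title>. applyPdfMetadata and DocumentLike are hypothetical names, and where the app's language and title come from is left to the caller.

```typescript
// Minimal structural type for the parts of the DOM this sketch touches, so it
// also runs outside a browser. In the browser, the real `document` fits it.
interface DocumentLike {
  documentElement: { lang: string };
  title: string;
}

/**
 * Hypothetical helper: sets the page language and title before the PDF is
 * printed. Assumption: Chromium copies <html lang> into the PDF language
 * and <title> into the PDF title.
 */
function applyPdfMetadata(doc: DocumentLike, language: string, appTitle: string): void {
  doc.documentElement.lang = language; // e.g. "nb" or "en"
  doc.title = appTitle; // same title the app shows
}
```

In app-frontend this would be called with the real document, e.g. applyPdfMetadata(document, "nb", appTitle), before the PDF service prints the page.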

Additionally, it looks like Chromium added support for generating a document outline in version 121, I think. The outline produces links, usually in the PDF reader's sidebar, that point to the document's headings (see this). Enabling it may require setting some flags in app-lib.
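If puppeteer is used directly, the outline would presumably be enabled through the options passed to page.pdf(). Below is a minimal sketch. It assumes a puppeteer version whose PDF options include tagged and outline (outline is, as far as I know, marked experimental). buildPdfOptions is a hypothetical helper, and the interface mirrors only that subset instead of importing puppeteer's own type.

```typescript
// Local mirror of the subset of puppeteer's PDFOptions used here, so the
// sketch compiles without puppeteer. Option names assume a recent puppeteer.
interface PdfOptionsSubset {
  tagged?: boolean; // emit a tagged (structured, accessible) PDF
  outline?: boolean; // emit a document outline (bookmarks) from headings; experimental
}

/** Hypothetical helper: PDF options with tagging on and the outline optional. */
function buildPdfOptions(withOutline: boolean): PdfOptionsSubset {
  return {
    tagged: true,
    // Keep the outline behind a flag, since it is experimental.
    outline: withOutline,
  };
}
```

The result would be passed as page.pdf(buildPdfOptions(true)). As far as I know, these options correspond to the generateTaggedPDF and generateDocumentOutline parameters of the DevTools Protocol's Page.printToPDF command, so a service that talks to Chromium without puppeteer would need those instead.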

bjosttveit commented 5 months ago

Chromium 123 has been released as stable and is available in puppeteer v22.6+. However, the browserless images on Docker Hub still seem to be several versions behind, and it's not clear if or when they will be updated. We are currently using browserless v1; since then, browserless has released v2 and moved to GitHub Container Registry. v2 seems to be more up to date with puppeteer.

To get more up-to-date Chromium/puppeteer versions, we may need to either migrate to browserless v2 or build our own PDF service on puppeteer directly, which would also give us more control over the PDF generation process.
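If we build our own PDF service on puppeteer, its core could be quite small. Below is a sketch. It assumes the service gets a browser instance from puppeteer.launch() and renders the app by URL. renderPdf, BrowserLike and PageLike are hypothetical names, and the interfaces mirror only the puppeteer methods used so the sketch runs without puppeteer. Waiting for networkidle0 is a simplification of however the real service decides the app has finished rendering.

```typescript
// Structural subsets of puppeteer's Page and Browser, declared locally so the
// sketch compiles and can be tested without puppeteer. A real puppeteer
// Browser should fit these shapes, but verify against your version.
interface PageLike {
  goto(url: string, options?: { waitUntil?: "load" | "networkidle0" }): Promise<unknown>;
  pdf(options?: { tagged?: boolean; outline?: boolean }): Promise<Uint8Array>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
}

/**
 * Hypothetical service function: renders one URL to PDF in a fresh page and
 * always closes the page, even if navigation or printing fails.
 */
async function renderPdf(browser: BrowserLike, url: string): Promise<Uint8Array> {
  const page = await browser.newPage();
  try {
    // Simplification: wait until network activity settles before printing.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.pdf({ tagged: true, outline: true });
  } finally {
    await page.close();
  }
}
```

With puppeteer installed, this would be called roughly as renderPdf(await puppeteer.launch(), url). Closing the page in finally keeps a failed render from leaking browser tabs in a long-running service.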

Altinn / app-lib-dotnet