AndyClifton / accessibility

A CTAN-compliant version of the LaTeX `accessibility` package

Text objects in example article are not tagged #12

Open juh2600 opened 4 years ago

juh2600 commented 4 years ago

Describe the bug

Text objects in the example article are not tagged, resulting in a failed accessibility check.

To Reproduce

Steps to reproduce the behaviour:

  1. Obtain the free PDF Accessibility Checker from Zugang für alle. This issue was checked with PAC 3.
  2. Open the example article PDF in the checker.
  3. Below the listing of checkpoints, select "Results in Detail" to see a breakdown of the untagged objects.

Expected behavior

Text objects in a PDF compiled with this package should probably be tagged; this feels to me within the scope of the project.

Log messages

article_PAC_Report.pdf TextObjectNotTagged

Additional notes

This is not isolated to the example article; I discovered this while working on my own document.

I appreciate the work you've put into this project! If there's anything I can do to help, I'd be glad to. I'm by no means an expert on LaTeX or PDF, but I've been studying the incantations lately for my accessibility projects, and I'm quite eager to see this work as well as it can. I'd rather spend a week in vi writing LaTeX than an hour in Acrobat.

AndyClifton commented 4 years ago

@josephreed2600 - thanks for the bug report and for offering to help. I'm just getting back into this project after some distractions and will start putting together a roadmap soon. I'll get back to you when I see how this fits and what might be required to ship something useful.

viktoriasee commented 4 years ago

I do not see any tags generated on a simple dummy file:

\documentclass{scrreprt}
\usepackage{accessibility}

\begin{document}
text.
\end{document}

It runs without errors or warnings in pdftex, though.

So it's not just an issue with the example file. It's not working at all.

AndyClifton commented 4 years ago

Did you try \usepackage[tagged]{accessibility}?

viktoriasee commented 4 years ago

I hadn't, because I read the manual, pp. 5/6:

"If no options are given, a PDF is generated with the default options, i.e. a Tagged PDF with a nested structure is generated."

Indeed, when I use \usepackage[tagged]{accessibility} I get a PDF with a tag. I think the documentation should win here, i.e. tagging should be on by default. But even this minimal example does not produce an accessible PDF: pac3-latex-accessibility-minimal
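Putting the two snippets from this thread together, the file under test looks roughly like the sketch below. It only adds the `tagged` option to the dummy file above; nothing else about the package is assumed.

```latex
% Minimal example from this thread, with tagging requested explicitly.
\documentclass{scrreprt}
% Without the [tagged] option, no tags were generated in the test above.
\usepackage[tagged]{accessibility}

\begin{document}
text.
\end{document}
```

Compiled with pdftex, this gives a PDF with a tag, but PAC 3 still reports errors for it.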

viktoriasee commented 4 years ago

Even when tagging is on and the tags are visible in Acrobat, PAC reports a "Tagged content and artifacts" error for the very content that is tagged.

AndyClifton commented 4 years ago

Hm. So with a comparable document from another source (e.g. MS Word), does the error still occur / get flagged by PAC?

I'm interested in whether this problem comes from LaTeX or from something else.

viktoriasee commented 4 years ago

In short: no. A PDF created with MS Word, with the content "Text." and the file property title <> empty, validates in PAC except for the PDF/UA metadata. No other errors. Text.pdf Text.docx

AndyClifton commented 4 years ago

Ok, thanks. Could you upload the Word document (attach it to the comment) for comparison, please? Thanks!

viktoriasee commented 4 years ago

I think I have one more hint on this. When I open my minimal example in Acrobat, open the Tags tab, and click on a content container, the correct paragraph text is highlighted with a blue frame: missing content. However, in a normal document you would see the content in the container. See the same PDF again after I disable accessibility and add the tags automatically in Acrobat: content there. Does that ring a bell?

AndyClifton commented 4 years ago

So... looking at the MWE from @viktoriasee, I see two things:

  1. In the MWE generated using latex there is a highest-level "Document" branch in the PDF that shouldn't be there.
  2. In the MWE generated using latex there is no content in the <p> container.

This gives us some places to look.

viktoriasee commented 4 years ago

I agree with 2. But the «Document» master tag is fine. It's one of the few things where accessibility does a better job than Acrobat. PDF/UA checker PAC3 complains if there is no such master tag.

radinamatic commented 4 years ago

Yes, please keep the top-level Document tag for PDF/UA checking.