edrlab / thorium-reader

A cross-platform desktop reading app, based on the Readium Desktop toolkit
https://www.edrlab.org/software/thorium-reader/
BSD 3-Clause "New" or "Revised" License

Alt text not clearly distinguishable from body text #1983

Open · Minnemann-dzblesen opened this issue 1 year ago

Minnemann-dzblesen commented 1 year ago

During our manual accessibility testing of an EPUB, we encountered the following problem: the alt text of an image is read out correctly by Thorium's integrated read-aloud feature. However, it does NOT announce that it is an image. This can be quite confusing for readers with impairments: a blind reader should be able to recognize that the text being read is an image description.

When NVDA or JAWS is used together with Thorium (which is certainly not the intended setup), "image" (or "image end") is announced before or after the alt text in each case. The screen reader built into Windows, as well as VoiceOver with Apple Books, also announces that it is an image. We tested with Thorium 2.2.0.

Or are we doing something wrong? Thanks.

danielweck commented 1 year ago

Hello, Thorium doesn't inject additional prompts: the TTS read-aloud experience is based purely on authored alternative text / accessible descriptions. If Thorium injected speech prompts at runtime, would they need to match the language of the user, of the publication metadata, or of the text content itself? I assume the user locale, so there could be a discrepancy with the authored language.

gautierchomel commented 1 year ago

Team discussions so far on the subject:

  1. Some users will want additional semantics, but others will prefer a lighter reading experience, meaning that the implementation should provide a verbosity setting (as assistive technologies do).
  2. If we start with images, we'll also have to discuss notes, asides, emphasis, etc.
  3. Different levels of semantics will be needed: HTML and ARIA at least (we must be able to differentiate images by their roles).
  4. Translations will be needed.
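
To make the points above more concrete, here is a minimal TypeScript sketch of how a verbosity setting, role-based filtering and translated prompts could fit together for images. Everything in it is hypothetical: `Verbosity`, `PROMPTS`, `buildImageUtterance` and the prompt wording are illustrative names, not Thorium APIs. The prompt language is simply passed in by the caller (for example the user locale), which leaves open the question of matching the publication language.

```typescript
// Hypothetical sketch: none of these names exist in the Thorium codebase.

// Verbosity levels, loosely modelled on screen reader settings.
type Verbosity = "off" | "brief" | "full";

interface PromptStrings {
  image: string;
  imageEnd: string;
}

// Hypothetical translation table; a real implementation would use the app's i18n resources.
const PROMPTS: Record<string, PromptStrings> = {
  en: { image: "Image", imageEnd: "End of image" },
  de: { image: "Bild", imageEnd: "Ende des Bildes" },
};

interface ImageInfo {
  alt: string;          // authored alternative text
  role?: string | null; // ARIA role attribute, if any
}

// Picks prompts by primary language subtag ("de-DE" -> "de"), falling back to English.
function resolvePrompts(lang: string): PromptStrings {
  const base = lang.toLowerCase().split("-")[0];
  return PROMPTS[base] ?? PROMPTS["en"];
}

// Returns the text to speak for an image, or null if it should be skipped.
function buildImageUtterance(
  img: ImageInfo,
  verbosity: Verbosity,
  promptLang: string,
): string | null {
  const alt = img.alt.trim();
  // Decorative images (empty alt, or role "presentation"/"none") are not read at all.
  if (alt === "" || img.role === "presentation" || img.role === "none") {
    return null;
  }
  // "off" keeps today's behaviour: only the authored alt text is spoken.
  if (verbosity === "off") return alt;
  const p = resolvePrompts(promptLang);
  if (verbosity === "brief") return `${p.image}: ${alt}`;
  // "full" announces both the start and the end of the image.
  return `${p.image}: ${alt}. ${p.imageEnd}.`;
}
```

With this sketch, `buildImageUtterance({ alt: "Eine Katze" }, "full", "de-DE")` would yield "Bild: Eine Katze. Ende des Bildes.", while the "off" level reproduces the current behaviour described above. Extending it to notes, asides or page numbers would mean adding more prompt pairs and element checks, which is where the design and maintenance effort mentioned below comes in.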

The unanswered question behind all this is: "shall we consider Thorium's TTS feature an AT (assistive technology)?"

On the positive side, it would strongly highlight semantics (meaning that it would make sense for producers to add them) and would certainly serve a lot of users.

Still, the effort to design, implement and maintain this must be taken into account, and dedicated funding would need to be found.

gautierchomel commented 1 year ago

Page numbers should also be considered :) See related issue #1974 and discussions #1951 and #1799.

tedvandertogt commented 1 year ago

I remember from narrating books as a volunteer that we had a whole set of rules, including how to handle special elements (like authors, page numbers, images, footnotes). If a user relies on audio alone, he or she will easily get confused without those prompts. Depending on the type of user, the desirable UX may differ. I suggest drafting user requirements for several user types and deciding how to address them without introducing too much complexity.

Minnemann-dzblesen commented 1 year ago

Many thanks for the numerous replies, the good discussion and the thoughts on this, and, even if there is no quick solution, for the momentum this is gaining. As a German institution, we (like many other European institutions) often recommend Thorium to publishers first and foremost, so that they can check their EPUBs in view of the approaching EAA (European Accessibility Act). A comprehensible, high-quality read-aloud experience is therefore important to us, but I am confident this will come as the process moves forward.