daisy / pipeline-ui

A user interface for the DAISY Pipeline 2
MIT License

Settings dialog, TTS voices: screen readers skip text in unsupported scripts #231

Closed marisademeglio closed 4 weeks ago

marisademeglio commented 1 month ago

In the voices list, for example, screen readers read out the voice names except for those in unsupported scripts. What counts as "unsupported" varies among machine configurations. In my experience, the names of Armenian voices were not read: VoiceOver says nothing where it would usually say the voice name. For other non-Latin scripts, e.g. Russian, Arabic, Hebrew, it switched to the correct voice.

Often in this voices list, the language will have already been chosen, so if it is a language with an unsupported script, then nothing in the list box except for the first entry "None" will get read aloud.

What can be done about this? We don't have access to any other names for a voice, as that info comes directly from the TTS service.

We could add info after the voice name string. VoiceOver, for example, skips what it can't announce, but reads the rest of the string. So for "Անահիտ (Armenian)", I hear "Armenian". But that may also introduce a lot of verbosity.

We could instead prefix the voice name with a number, as in a numbered list, just so something gets read for each entry.
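A minimal sketch of the two labelling options, to make the trade-off concrete. The `Voice` shape and the function names are assumptions for illustration, not the actual pipeline-ui code. To limit verbosity, the language suffix is only added when the name contains no Latin-script characters; this is a rough stand-in, since which scripts a screen reader can actually announce depends on the machine.

```typescript
// Hypothetical shape; the real voice objects in pipeline-ui may differ.
interface Voice {
  name: string;     // name as reported by the TTS service
  language: string; // human-readable language name, e.g. "Armenian"
}

// Built with the RegExp constructor (not regex literals) so the Unicode
// property escapes are plain strings as far as the compiler is concerned.
const LETTER = new RegExp("\\p{L}", "u");
const LATIN = new RegExp("\\p{Script=Latin}", "u");

// True when the name contains letters but no Latin-script characters.
function lacksLatinScript(name: string): boolean {
  return LETTER.test(name) && !LATIN.test(name);
}

// Option 1: append the language name, but only where the name alone
// risks being skipped by the screen reader.
function labelWithLanguage(voice: Voice): string {
  return lacksLatinScript(voice.name)
    ? `${voice.name} (${voice.language})`
    : voice.name;
}

// Option 2: prefix every entry with its 1-based position in the list.
function labelWithIndex(voice: Voice, index: number): string {
  return `${index + 1}. ${voice.name}`;
}
```

With option 1, a name already in Latin script is left untouched, so only the entries that would otherwise be silent become longer.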

marisademeglio commented 1 month ago

Test report from @ways2read confirms the same behavior with other screen readers:

I tested with NVDA and OneCore voices, and Narrator with the Natural voice. The voice names are read for Javanese where the names are in Latin script but voice names are not read out for Kannada, Kazakh and Khmer which all use a non-Latin script.

marisademeglio commented 1 month ago

And the original issue description from @prashantverma2014

The Voices selection screen is good. It behaves like a webpage for screen reader which is acceptable. However, in the Voices list the names are in native font of that language, e.g. Hindi voice names are written in Hindi alphabets. This is not desirable. Either it should be in English or the name should be in both Hindi and English font. The reason for this is that the Screen Readers cannot be expected to be configured to read many language text. And in such cases Screen Reader is not reading anything in this list box.

marisademeglio commented 1 month ago

@NPavie found this library that might help to transliterate the name, called anyascii

https://daisy-dev.slack.com/archives/C064GB8U9/p1717603304699319

marisademeglio commented 1 month ago

I experimented with this library on a new branch (see commit above) and found that sometimes the romanization was good, i.e. it sounded like the voice name, but other times it was meaningless when rendered by a screen reader.

E.g. this Arabic name “سناء”, which to me sounds like “si-nah” (when I use the right voice to hear it), gets ascii’d as “sn’” and read aloud as “ess enn”.

But some of them are OK, e.g. the Greek voice Νέστορας (Nestoras).

NPavie commented 1 month ago

I found another library (MIT licence): https://github.com/sindresorhus/transliterate

We also have the option of doing the transliteration ahead of time and building a quick hashmap from the results, if we find a good way to do it outside of JS. I'll do a quick round of research to see if I can find something with better results than anyascii.
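A rough sketch of what the precomputed-hashmap option could look like on the UI side. The table name, the function name, and the idea of appending the romanization in parentheses are assumptions for illustration; the only entry shown is the Greek example from this thread, and a real table would be generated ahead of time by whatever tool gives acceptable results.

```typescript
// Hypothetical lookup table, generated ahead of time outside the app.
// Keys are voice names exactly as the TTS service reports them.
const ROMANIZED_NAMES: Record<string, string> = {
  "Νέστορας": "Nestoras",
};

// Returns the name with its romanization appended when one is known,
// and the unchanged name otherwise.
function accessibleVoiceName(name: string): string {
  const known = Object.prototype.hasOwnProperty.call(ROMANIZED_NAMES, name);
  return known ? `${name} (${ROMANIZED_NAMES[name]})` : name;
}
```

Names missing from the table are left as they are, so this would still need a fallback for voices that were not known when the table was built. The keys also have to use the same Unicode normalization form as the names coming from the TTS service, otherwise the lookup silently misses.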