sstraume97 closed this issue 3 months ago.
Thanks for your bug report.
ZoTTS doesn't do any text encoding or editing, and the debug output shows it's speaking the text exactly as given, diacritics included.
My guess is that it's a PDF encoding error. Can you tell me which paper you were looking at when this first happened?
Also, during testing it seems that English TTS voices don't distinguish between A and Å, so it would be helpful if you could let me know which voice you use.
I don't remember which PDF it was, but the problem occurs regardless of which PDF I have it read in Norwegian. I exported the attached PDF from Word. When I use the extension "Read Aloud: A Text to Speech Voice Reader" from LDS with the PDF.js viewer in MS Edge (with the same Microsoft Jon voice as in Zotero), Å is pronounced correctly.
TTS voice: Microsoft Jon - Norwegian (Bokmål)
I did some more investigating, and it does look like an encoding issue, but one introduced by (parts of) Zotero rather than by the PDF itself.
It seems there are two ways to encode an "Å" character: as a single Unicode character (which is pronounced correctly), or as a regular A followed by a separate combining diacritic (which is pronounced incorrectly, likely because the TTS engine strips or ignores the diacritic).
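To make the two encodings concrete, here is a small Python sketch using only the standard `unicodedata` module (not ZoTTS code). The precomposed form is U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE; the decomposed form is a plain "A" followed by U+030A COMBINING RING ABOVE. They render identically but are different strings until normalized; `describe` is a hypothetical helper for illustration.

```python
import unicodedata

# "Å" as one precomposed code point: U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
precomposed = "\u00C5"
# "Å" as a plain "A" followed by U+030A COMBINING RING ABOVE
decomposed = "A\u030A"

def describe(s: str) -> list[str]:
    """Hypothetical helper: return 'U+XXXX NAME' for every code point in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

print(describe(precomposed))  # ['U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE']
print(describe(decomposed))   # ['U+0041 LATIN CAPITAL LETTER A', 'U+030A COMBINING RING ABOVE']

# The two strings look the same on screen but compare unequal...
print(precomposed == decomposed)  # False
# ...until they are normalized to the same form (NFC composes A + ring into Å).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

A TTS engine that receives the decomposed form may drop the combining mark and read a bare "A", which would match the behaviour reported here.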
Interestingly, there are also multiple ways to select text in a paper, and only some of them change the Unicode encoding.
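One way to check which selection method alters the text is to inspect the captured string for combining marks. The sketch below is a hypothetical diagnostic, not part of ZoTTS or Zotero; it assumes you can obtain the selected text from each method as a Python string (for example, by pasting it in).

```python
import unicodedata

def combining_marks(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each combining mark (category M*) in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch).startswith("M")
    ]

def is_nfc(text: str) -> bool:
    """True if text is already in composed (NFC) form. Requires Python 3.8+."""
    return unicodedata.is_normalized("NFC", text)

# Text from the test note, as a decomposing selection method might return it:
captured = "A, a, A\u030A, a\u030A"
print(combining_marks(captured))  # [(7, 'U+030A'), (11, 'U+030A')]
print(is_nfc(captured))           # False
```

A selection method that returns the unaltered text should report no combining marks for "A, a, Å, å" and pass the NFC check.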
I'll open issues on the plugin toolkit ZoTTS uses and on Zotero itself to report this.
In the meantime, I'll change how ZoTTS collects the selected text from the paper so that it uses a method that returns the unaltered text. Additionally, a future version of ZoTTS will let users specify regex substitutions for the spoken text, and I'll make a note to add a built-in "Unicode fixing" set of substitutions, in case this bug reappears in a different form or regresses.
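The planned substitution feature could look roughly like the following. This is a minimal sketch under stated assumptions, not ZoTTS's actual design or configuration format: the rule format, `UNICODE_FIXES`, `apply_substitutions`, and `prepare_for_speech` are all hypothetical names. It normalizes to NFC first, then applies a built-in fix set, then the user's own regex rules.

```python
import re
import unicodedata

# Hypothetical rule format: (regex pattern, replacement) pairs applied in order.
Rule = tuple[str, str]

# Hypothetical built-in "Unicode fixing" set: fold a base letter plus
# U+030A COMBINING RING ABOVE into the precomposed letters. After NFC these
# are normally redundant; they act as a safety net if normalization is skipped.
UNICODE_FIXES: list[Rule] = [
    (r"A\u030A", "\u00C5"),  # A + combining ring -> Å
    (r"a\u030A", "\u00E5"),  # a + combining ring -> å
]

def apply_substitutions(text: str, rules: list[Rule]) -> str:
    """Apply each regex substitution to text, in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

def prepare_for_speech(text: str, user_rules: list[Rule]) -> str:
    # Compose decomposed sequences first, then built-in fixes, then user rules.
    text = unicodedata.normalize("NFC", text)
    text = apply_substitutions(text, UNICODE_FIXES)
    return apply_substitutions(text, user_rules)

# Example user rule (hypothetical): expand a Norwegian abbreviation.
rules = [(r"\bf\.eks\.", "for eksempel")]
print(prepare_for_speech("f.eks. A\u030Apen", rules))  # for eksempel Åpen
```

Running normalization before the user's rules means those rules can be written against the composed letters without worrying about how the text was selected.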
This ticket has been resolved in v1.2.1. See Release v1.2.1 for release notes.
Now it works perfectly!
Zotero version
7.0.1 (64-bit)
ZoTTS version
1.2.0
OS
Windows
OS (specific)
Windows 11
Steps to reproduce
Read aloud selected text in a PDF (Å.pdf) and in a note (text: "A, a, Å, å").
Expected behaviour
Pronounce the letter "Å" correctly in PDFs (for reference, see the recording LL-Q9043 (nor)-EdoAug-å.wav on Wikimedia Commons).
Actual behaviour
In PDFs, "Å" is pronounced as "A". In notes, the pronunciation is correct.
Below is a screen recording from the reading: