edrlab / thorium-reader

A cross platform desktop reading app, based on the Readium Desktop toolkit
https://www.edrlab.org/software/thorium-reader/
BSD 3-Clause "New" or "Revised" License

Em dashes should have a pause in TTS #1607

Open 413Michele opened 2 years ago

413Michele commented 2 years ago

Em dashes are used in books as alternative punctuation to commas, parentheses, or even colons, as explained on this page. Thorium's TTS ignores the em dash, though, so there is no pause in the speech.

Given that the em dash has a precise semantic meaning, I think the TTS should pause briefly when it encounters one in the text, maybe for the same duration as the pause for a comma.

413Michele commented 2 years ago

Is this doable, or does it depend on the specific system TTS?

danielweck commented 2 years ago

Hello, thank you for bringing this to our attention.

Thorium's TTS "read aloud" feature relies on the underlying web browser technology known as the Web Speech API, which itself delegates some of its functionality to platform-specific libraries (i.e. Windows, Mac, Linux).

Thorium is built on top of the Chromium web browser engine shipped by Electron (Chromium is also used in the Google Chrome web browser, Microsoft Edge, and quite a few others).

Chromium's implementation of Web Speech API supports SSML features which Thorium could (in principle) use to implement pause / prosody, pronunciation, etc. that are not handled by default (i.e. when just feeding plain text to the underlying TTS engine).
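As a rough illustration of the SSML idea above, the sketch below wraps a plain-text utterance in a `<speak>` element and turns each em dash into a timed `<break/>`. The function names and the 250 ms default are assumptions made for this example, not Thorium code, and whether a given platform voice actually honors the markup would need to be verified per engine.

```typescript
// Hypothetical helpers: not part of Thorium or of the Web Speech API.

// Escape the characters that are special in XML text content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap plain text in an SSML <speak> element, replacing each em dash
// (U+2014) and any whitespace around it with a timed <break/>.
// The 250 ms default is an arbitrary, comma-like guess.
function toSsmlWithDashPauses(text: string, pauseMs: number = 250): string {
  const parts = text.split(/\s*\u2014\s*/).map(escapeXml);
  const pause = ` <break time="${pauseMs}ms"/> `;
  return `<speak>${parts.join(pause)}</speak>`;
}
```

The text is escaped before the `<break/>` elements are joined in, so characters such as `&` or `<` in the book content cannot produce malformed SSML.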

Thorium already implements some text fragmentation logic in order to extract reasonably-sized utterances from the document markup. For example, HTML span elements inside p paragraphs generally don't produce an utterance of their own, unless they are semantically or structurally significant.

In addition to this markup-level processing, Thorium also splits long sections of unmarked text (usually, paragraphs) into shorter sentences, so that the previous/next playback navigation granularity is finer.
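To make the sentence-splitting step concrete, here is a minimal regex-based sketch. It is an assumption about how such splitting could work, not Thorium's actual logic, and `splitIntoSentences` is a hypothetical name. A naive rule like this mis-splits abbreviations such as "Dr. Smith", so a real implementation would need more care.

```typescript
// Hypothetical sketch; Thorium's real splitting logic may differ.
// Splits a paragraph into sentences so that previous/next navigation
// can step through smaller units.
function splitIntoSentences(paragraph: string): string[] {
  // A sentence is either a run of text ending in one or more of . ! ?
  // (optionally followed by closing quotes or brackets), or the trailing
  // remainder of the paragraph that has no terminator.
  const pattern = /[^.!?]+[.!?]+["')\]\u201D\u2019]*|[^.!?]+$/g;
  const matches = paragraph.match(pattern) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}
```

Each returned string could then become its own utterance, which is what gives the finer navigation granularity described above.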

We could plug additional functionality into this existing processing pipeline. I think SSML is key to delivering a more sophisticated playback experience.
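One way to picture such a plug-in point is a list of small text transforms applied to each utterance before it reaches the TTS engine. The sketch below is purely illustrative: `UtteranceTransform`, `emDashToComma`, `collapseWhitespace` and `applyTransforms` are hypothetical names, and replacing the em dash with a comma is a plain-text fallback that assumes the engine pauses on commas.

```typescript
// Hypothetical pipeline sketch; none of these names exist in Thorium.

// A transform rewrites the text of one utterance before it is spoken.
type UtteranceTransform = (text: string) => string;

// Plain-text fallback: replace each em dash (U+2014), plus any whitespace
// around it, with ", " so the engine treats it like a comma.
const emDashToComma: UtteranceTransform = (text) =>
  text.replace(/\s*\u2014\s*/g, ", ");

// Collapse runs of whitespace and trim both ends of the utterance.
const collapseWhitespace: UtteranceTransform = (text) =>
  text.replace(/\s+/g, " ").trim();

// Apply the transforms in order, feeding each one the previous result.
function applyTransforms(text: string, transforms: UtteranceTransform[]): string {
  return transforms.reduce((acc, transform) => transform(acc), text);
}
```

For example, `applyTransforms(text, [emDashToComma, collapseWhitespace])` would run on each utterance after sentence splitting. Keeping each rule as a separate function means an SSML-based rule could later replace the comma fallback without touching the rest of the chain.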

We are a tiny technical team with several other high development priorities so I cannot promise anything. We will report back here when progress is made.

Regards, Daniel

413Michele commented 2 years ago

Of course. Personally, I think it's incredible that Thorium and its backend were created from scratch and are already one of the best readers on desktop PCs, along with a new open source content protection ecosystem (which is sadly necessary).

I'm happy to know this is technically possible. I think there are no e-book readers at the moment that deal with SSML, so this is already promising!

formlessdao commented 2 years ago

I personally would have liked to see text to speech as the actual priority of your project, since accessibility is supposedly such a big focus. It was absolute dumb luck that I found your project; I have not really seen it mentioned in most lists of ebook reader software. As it currently is, the voices I have in Windows 10 sound good when reading a single word, but there is a clear, unpleasant robotic rhythm in the transitions from word to word. The playback rate increments are too large, there are too few of them, and there is nothing under 0.5x. I want to be able to change the pitch. If a sentence contains a ? or !, the intonation should change for that sentence. I also cannot find how to repeat a sentence, go to the next sentence, or go back to the previous sentence; there are no shortcuts for that. I also can't find any manual integrated in the software or on your website. You could have bothered to make an A to Z video tutorial of all the features and functionality on YouTube, since you do not have a manual.