daisy / ebraille

Repository for developing use cases and a standard for digital braille

Synchronized switching between braille display and TTS audio #7

Open ManfredMuchenberger opened 2 years ago

ManfredMuchenberger commented 2 years ago

As a screen reader user, I want to be able to read an e-book on a braille display and, after a while, switch to audio and continue reading at the next paragraph of the e-book using the synthetic voice.

Detail: Currently, e-books only contain an ink-print version of the text. It is easy to listen to an e-book using the screen reader's synthetic voice, but when you switch to reading on a braille display that is also connected to the screen reader, you don't get good-quality braille.

Proposal: Use a digital text file format that contains the ink-print version of the text as well as a pre-transcribed braille version of the text in Unicode braille characters, synchronized with the ink-print version.
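For illustration (this sketch is mine, not part of the proposal): Unicode reserves the block U+2800 to U+28FF for braille patterns, where each code point encodes one cell as a bitmask of raised dots, so a pre-transcribed braille string can travel alongside the print text in any Unicode-aware file format.

```python
# Unicode braille patterns occupy U+2800-U+28FF.
# Bit i of the low byte set => dot i+1 raised (dots 1-8).
BRAILLE_BLOCK_START = 0x2800

def cell(*dots):
    """Build one braille cell from raised dot numbers (1-8)."""
    code = BRAILLE_BLOCK_START
    for d in dots:
        code |= 1 << (d - 1)
    return chr(code)

# Dots 1, 2, 3 form U+2807 ("l" in most braille codes).
print(cell(1, 2, 3), hex(ord(cell(1, 2, 3))))  # ⠇ 0x2807
```

The blank cell (no dots raised) is U+2800 itself, which is why even "empty" braille is representable in plain Unicode text.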

avneeshsingh commented 1 year ago

What should the priority be: high, medium, or low? My view is that it is not as high a priority as navigation, reflowability, etc. Should we mark it medium or low priority?

ManfredMuchenberger commented 1 year ago

In my opinion this is quite important for users. Otherwise they would have to buy or download two different file sets if they want to use both audio and a braille display, and we would have to provide both file sets. There would also be no synchronization between them. So I would assign at least high priority.

wfree-aph commented 1 year ago

This is a problem we need to prioritize soon. If we create a braille-first file, as we said in the problem statement, that will potentially lead to some TTS audio issues. The problem statement also includes this sentence: "although we will discuss the possibility of including other information types and multimodal reading". So we'll need some consensus. Having braille and Latin script in the same file could be burdensome for braille producers and software, but not having both could be burdensome for libraries and distributors. Alternatively, we could put the burden on screen readers and ask them to handle the problem of reading braille text. This is one we may want to discuss in the meeting.

mrhunsaker commented 1 year ago

The issue I see with TTS and a braille-first file format is that back-translation of braille (I speak primarily from experience with UEB) is imperfect, and thus the TTS may read incorrect information. I could envision a system whereby the original print text is associated with the corresponding braille so the TTS could follow along. However, this seems more of a print-first than a braille-first solution.
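A toy illustration of why back-translation is lossy: forward translation can map different print strings to the same cells depending on context (for example, UEB's alphabetic wordsigns), so a naive reverse lookup cannot recover the print text from the braille alone. The table below is deliberately tiny and only sketches the idea, not real transcription rules.

```python
# Tiny illustrative table -- NOT a real UEB rule set.
# In UEB, dots 1-3 (U+2805) is the letter "k", and standing
# alone it is also the wordsign for "knowledge".
to_braille = {
    "k": "\u2805",
    "knowledge": "\u2805",
}

def back_translate(cells):
    """Naive reverse lookup: all print forms this braille string
    could stand for under the toy table above."""
    return sorted({print_form for print_form, b in to_braille.items()
                   if b == cells})

print(back_translate("\u2805"))  # ['k', 'knowledge'] -- ambiguous
```

A real back-translator resolves such cases with context rules, but those rules are exactly where errors creep in, which is the risk for TTS reading from braille.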

mhorspool commented 1 year ago

My view is that this is out of scope for this standard. Most screen readers can handle braille translation sufficiently for this not to be needed in the vast majority of cases. For those specific cases where it is needed, we should encourage existing synchronised audio/text standards (e.g. DAISY) to permit braille instead of text.

wfree-aph commented 1 year ago

Another option: provide a mechanism to switch from one file to another. One file is Unicode and the other is meant for audio (perhaps HTML) and the two are linked for navigation. Is it achievable?

Several folks in the meeting agreed to make this medium priority and revisit it later.

ManfredMuchenberger commented 1 year ago

Another option: provide a mechanism to switch from one file to another. One file is Unicode and the other is meant for audio (perhaps HTML) and the two are linked for navigation. Is it achievable?

Yes, this is the solution we envision so far. But I think the mechanism should be described in our new standard, as it should provide a means for the reading system to switch over and maintain the reading position (synchronized switch-over).

From the SBS point of view I would assign high priority, as this is a feature we frequently get user requests for.

mattgarrish commented 1 year ago

it should provide means to the reading system to switch over and maintain the reading position (synchronized switch over)

Note that EPUB tried something like this in the multiple renditions specification with rendition mapping.

(It has not been implemented anywhere that I know of, though.)

mattgarrish commented 1 year ago

Re-reading this, is it necessary to flip from one format to the other here (which I already discussed in https://github.com/daisy/ebraille/issues/18#issuecomment-1271845100), or is it only necessary to be able to switch to audio playback?

If it's the latter, then EPUB and DTBook's use of SMIL to synchronize text with audio would seem to be useful here.

GeorgeKerscher commented 1 year ago

The use of Synchronized Multimedia Integration Language (SMIL), as used in the DAISY 3 (Z39.86) Specification for the Digital Talking Book, would allow the audio files to come from human narration or from TTS-generated audio. The newer cloud-based voices are getting better and better. TTS-generated audio is supported in the DAISY Pipeline, but I don't think cloud voices are supported at this time. I would think that the audio is generated first, and then the braille encoding is done. This would support the synchronization of the audio with the braille at some level of granularity, such as the paragraph level.
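As a concrete point of reference, EPUB 3 media overlays express this kind of paragraph-level text/audio synchronization in SMIL roughly as below (file names, IDs, and clip times are placeholders for illustration):

```xml
<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:epub="http://www.idpf.org/2007/ops"
      version="3.0">
  <body>
    <seq id="seq1" epub:textref="chapter1.xhtml">
      <!-- One par per synchronized unit, here a paragraph -->
      <par id="par1">
        <text src="chapter1.xhtml#para1"/>
        <audio src="audio/chapter1.mp3"
               clipBegin="0:00:00.000" clipEnd="0:00:12.500"/>
      </par>
    </seq>
  </body>
</smil>
```

The same pattern would apply whether the audio clips come from human narration or from pre-generated TTS.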

bertfrees commented 1 year ago

is it necessary to flip from one format to the other here, or is it only necessary to be able to switch to audio playback?

It would seem to me that switching between contraction grades is part of the requirement too (if we also take into account #6 and #18).

If it's the latter, then EPUB and DTBook's use of SMIL to synchronize text with audio would seem to be useful here.

But in theory SMIL could be used to synchronize multiple text documents too, right? EPUB just restricts media overlays to audio?

I think it could make sense to have an HTML document with plain text, with one or more braille overlays (for different braille codes), or alternatively an HTML document containing text transcribed with one braille code, and one or more overlays for other braille codes.

Note that the text documents used for the overlays would only be used to get the braille and to replace the original text nodes with it, nothing more. The documents could be HTML, but any semantics or attached styles would have no effect, and any content not referenced from the SMIL would be ignored. This is the fundamental difference with the multiple rendition approach.
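To make the idea concrete: EPUB media overlays today only pair text with audio, so the following is a purely speculative sketch of a text-to-text overlay as described above, where a par pairs a block in the print document with its transcription in a braille document (file names and fragment IDs are invented):

```xml
<!-- Speculative sketch only: standard EPUB media overlays pair
     text with audio; this imagines an analogous text-to-text
     pairing for a braille overlay. -->
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="par1">
      <!-- the original (print) text block -->
      <text src="chapter1.xhtml#para1"/>
      <!-- hypothetical: UEB transcription of the same block -->
      <text src="chapter1-ueb.xhtml#para1"/>
    </par>
  </body>
</smil>
```

Under this reading, the braille document serves only as a lookup table of transcribed blocks, which is why its own semantics and styles would not matter.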

ManfredMuchenberger commented 1 year ago

Note that the text documents used for the overlays would only be used to get the braille and to replace the original text nodes with it, nothing more. The documents could be HTML, but any semantics or attached styles would have no effect, and any content not referenced from the SMIL would be ignored. This is the fundamental difference with the multiple rendition approach.

Why should we prevent an app from also showing the HTML on screen, including semantics and using specific styles? What would be the advantage of this limitation?

bertfrees commented 1 year ago

I'm just trying to assess the technical possibilities and limitations we have with the standards that are available.

The multiple-renditions feature in EPUB allows you to have several renditions of the same content. The renditions stand completely on their own (with the possibility to synchronize between them). This is one approach to addressing the use case of including pre-transcribed braille (#6) on top of the plain text and/or next to braille in other contraction grades (#18). However, I can also see other approaches. In fact, for this use case multiple renditions is a bit of overkill, because it duplicates the whole document including semantics and styles, while all that is needed is the braille transcription of each block of text. SMIL could help us achieve the synchronization that is required for this.