edrlab / thorium-reader

A cross platform desktop reading app, based on the Readium Desktop toolkit
https://www.edrlab.org/software/thorium-reader/
BSD 3-Clause "New" or "Revised" License
1.8k stars 154 forks source link

EPUB3 Media Overlays: playback with fast speech rate => bad synchronization at word level #1068

Open ManfredMuchenberger opened 4 years ago

ManfredMuchenberger commented 4 years ago

Thorium v1.3.1-alpha.1.40 Books with media overlays and word level synchronization. If I set the playback to a faster speed, the highlighting of the text is falling behind.

danielweck commented 4 years ago

Hello Manfred, thank you for reporting this. I have not experienced this problem during my tests. Could you please specify which OS you are testing on, and would you be able to share a sample publication that exhibits this behaviour? (you know my email ;) Regards, Daniel

ManfredMuchenberger commented 4 years ago

its Windows 10 Pro Version 1809 I will send you a link to some of our books please use them for testing only (copyright)

ManfredMuchenberger commented 4 years ago

I have just shown Thorium to some collegues at our audiobook recording studio. We all like the progess that you have made regarding media overlays - congratulations!

But I have to report, that also with 1x speed the highlighting is a little behind the audio. We have also tried it on MacOS. There it is better but still not perfect.

danielweck commented 4 years ago

Thank you for the feedback Manfred, much appreciated.

Could you please clarify which revision of Thorium you used for testing? Recently some important bug fixes and improvements were introduced, you may install the latest automated build of Thorium to test the obvious TTS readaloud presentation/rendering changes (and some underlying less obvious EPUB3 Media Overlays updates): https://github.com/edrlab/thorium-reader/releases

I am not able to reproduce any audio lag, is there a particular TTS or Media Overlays EPUB you would like me to try? (I have a slower Windows computer which I can use to test this)

ManfredMuchenberger commented 4 years ago

We used 1.3.1 alpha 1.40 - but I have just re-tested it with 1.54. Now I get the impression, that it is a bit worse than it was with 1.40. Sometimes it is jumping whole words. We use a brand new Laptop that is quite fast I think (Intel core i7). You can test with any of the media overlay books that I have sent for testing. If you want, you can send me one of your testbooks and I can try them with this installation.

ManfredMuchenberger commented 4 years ago

this is a short Video to show the behaviour https://www.dropbox.com/s/awnd0gnz2zyyjtb/20200605_143740.mp4?dl=0

danielweck commented 4 years ago

Wow, this is quite a perceivable delay / glitch. Thank you for the video screen capture.

I have not experienced this with any of my test EPUBs, including several commercial fixed-layout word-level children’s talking books (like the one in the video).

So, would you mind sharing the EPUB with me please, so I can run some tests?

Thank you!

danielweck commented 4 years ago

Hello Manfred, thank you for sharing the test EPUBs. I am not able to reproduce the playback lag (well, to be precise: the latency when applying the highlighting CSS class for the Media Overlays "active" element). At most, I experience a few milliseconds of delay, but it is hard to perceive.

That being said, I can see how/why there would be a delay in some circumstances, due to the asynchronous nature of the communication between the audio playback engine (which runs perfectly smooth) and the "highlighting" mechanism which requires IPC (Inter Process Communication) messages with the webviews that hosts the EPUB's HTML content documents (sorry for the technical details, this is so that developers know what to look for when coming back to this issue).

ManfredMuchenberger commented 4 years ago

Helle Daniel Do you want to send me one of your testbooks (with media overlays and synchronization on word level) so that I can try it with our installation? Would be interesting to see if it has latency as well or if it happens only with our books.

danielweck commented 4 years ago

Hello Manfred, I strongly suspect that the performance issues were due to a bug in Thorium's database store. Could you please re-install the app from the automated releases: https://github.com/edrlab/thorium-reader/releases I am afraid you will loose your settings and library contents, so you will have to re-import publications (this only impacts the developer/test builds, not the production official release).

There should still be a few milliseconds delay, but nothing like the lag observed in your video.

ManfredMuchenberger commented 4 years ago

Hi Daniel You are right built 1.77 is much better! Still some delay that probably needs to be eliminated. The delay is worse if you put playback to faster speed.

CircularKen commented 4 years ago

Mac Thorium 1.3.1-alpha.1.2515 I too am having noticeable lag in media overlay highlights.

It is there but only slightly at 1x speed but becomes very far behind at faster speeds. Also the start and stop times are also out of step at faster speeds so audio from the next page is being played whilst the delayed media overlays are catching up.

Video linked and I can also share the file too if you wish.

https://www.dropbox.com/s/lk378r9f8xsvfco/Thorium%20Delayed%20highlights.mp4?dl=0

danielweck commented 4 years ago

Very useful feedback Ken, thank you.

I suspect the performance bottleneck is the Electron IPC asynchronous events which instruct the embedded sandboxed webview(s) to update the synchronized highlights. The audio playback itself occurs in the parent BrowserWindow runtime that hosts one of two webviews (two in the case of fixed-layout two-page spreads).

If I’m right, this is going to be a tough technical problem to work around, but first I need to be able to consistently reproduce the highlight lag. So please would you mind sharing the FXL title so I can run tests? (I have a slower Mac laptop which I can use to run stress tests)

CircularKen commented 4 years ago

No problem, here you go https://www.dropbox.com/t/xEBH8GBAgXXjCmXe

danielweck commented 4 years ago

Thank you Ken, 100% reproducible with r2-testapp-js on MacOS, with console verbosity turned down to minimum (as this can sometimes introduce lag), and particularly noticeable with increased playback rate.

I am pretty confident that this is due to “slow” Electron IPC messaging (with text-audio sync, a few milliseconds suffice to break seamless playback)

danielweck commented 3 years ago

Interesting: I investigated this further and my findings seem to indicate that no significant delay is caused by the passing of asynchronous messages across the IPC boundary (Inter Process Communication) between the Media Overlays "audio controller" (which resides in the top-level renderer process) and target XHTML documents (which are hosted by independent / isolated webview renderer processes, inside sandboxed iframes).

I also measured execution timings between the moment an instruction is given by the MO controller to play the next logical synchronised fragment defined in SMIL, and the moment an instruction is given by the webview API to highlight the synchronisation target (i.e. to apply the authored CSS class to the destination XHTML element). There is no apparent delay here either.

Yet, at accelerated playback speed the highlighted utterances clearly lag behind by a few significant milliseconds (this could indicate that the queued classList.add/remove() calls are struggling to keep up in real time?).

Note that I have been using Date.now() wall clock to measure time differences, therefore the measurement resolution is in the order of one millisecond. Both performance.now() and process.hrtime() offer better resolutions, but unfortunately they must be used within the same process, so not applicable in our case.

To speed up my tests, I hacked into node_modules/r2-navigator-js/dist/es6-es2015 directly from my local Thorium codebase. Notably: src/electron/renderer/media-overlays.js ( which corresponds to this TypeScript source file: https://github.com/readium/r2-navigator-js/blob/develop/src/electron/renderer/media-overlays.ts ) ...and src/electron/renderer/webview/preload.js ( https://github.com/readium/r2-navigator-js/blob/develop/src/electron/renderer/webview/preload.ts ). I simply created global time variables at the bottom of the JS file (e.g. let __t1 = 0;) and I recorded absolute times as well as time diffs (using const now = Date.now();) just before the call to mediaOverlaysNext() in the ontimeupdate() callback. See https://github.com/readium/r2-navigator-js/blob/e315e688984f4d3cd05918cb8e97a1137844c77e/src/electron/renderer/media-overlays.ts#L122-L131 Crucially, I then passed these time measurements to the IPC event payload (as additional fields) in the moHighlight() function, see: https://github.com/readium/r2-navigator-js/blob/e315e688984f4d3cd05918cb8e97a1137844c77e/src/electron/renderer/media-overlays.ts#L1096-L1101 Then, it was just a matter of collecting these statistics at the receiving end (e.g. const t1 = parseFloat(payload.t1);), immediately after targetEl.classList.add(activeClass); which applies the CSS highlighting styles, see: https://github.com/readium/r2-navigator-js/blob/e315e688984f4d3cd05918cb8e97a1137844c77e/src/electron/renderer/webview/preload.ts#L3159 In order to isolate the compound computational costs of IPC messaging and classList, I removed any potential other costly operations, such as console.log() calls (which are forwarded from webview inspector to shell stdout of Electron's main process), and more importantly the calls to scrollElementIntoView() and notifyReadingLocationRaw() which definitely add some overhead: https://github.com/readium/r2-navigator-js/blob/e315e688984f4d3cd05918cb8e97a1137844c77e/src/electron/renderer/webview/preload.ts#L3225-L3227 I also batched the dumping of timer statistics in the console by groups of 10, in order to rate-limit the stdout console logging, and to avoid introducing artificial delays which would otherwise not be present. Of course, I also closed the web inspector windows (one for the library view, another for the reader view, and one more for the embedded iframe!)

...to no avail. I am not able to pin-point a particular performance hog in our code logic, the perceived "sluggish" Media Overlays highlighting caused by short word-level playback synchronisation occurs regardless.

Penultimate technical note: Thorium makes use of the ES6-2015 transpiled output instead of the ES8-2017 Javascript code generated by the TypeScript build system in all r2-xxx-js packages. I believe the rationale for this choice is that at the time the project was created, the WebPack build process didn't support some modern ECMAScript constructs (I'm not sure which ones). This is a shame, because async / await Promise handling ends-up being polyfilled with yield generators, and the JS code is less efficient (less readable too). Take this example:

https://github.com/readium/r2-navigator-js/blob/e315e688984f4d3cd05918cb8e97a1137844c77e/src/electron/renderer/media-overlays.ts#L1118-L1120

=>

https://github.com/edrlab/r2-navigator-js-dist/blob/1956ae2cde77f271b045618010a581b465d538bf/dist/es6-es2015/src/electron/renderer/media-overlays.js#L840-L842

Final technical note: in the above code snippet, note the setTimeout() with a zero-millisecond delay, which is used to defer an asynchronous microtask (i.e. Promise async call) to the next "tick", and in some cases these are used to create a "top-level" async context when it is missing in the function scope / closure. This is arguably a suboptimal choice given the time resolution of setTimeout() compared with process.nextTick() (which bypasses the queued IO messages in the NodeJS runtime), and setImmediate() (which allocates the microtask in the next available event slot, without bypassing other high-priority queued IO messages). That being said, even after removing the setTimeout(), the perceived Media Overlays highlighting lag remained.

danielweck commented 3 years ago

Based on these findings, I think that the next logical step in this process of elimination is to migrate the Media Overlays "finite state machine" (i.e. granular SMIL orchestration based on the clock tick from audio playback) from the top-level controller context (i.e. principal Electron renderer process) directly into the XHTML content webviews (i.e. secondary isolated renderer processes).

This approach introduces a design inconsistency, in that the audio playback engine which generates the clock ticks at the user-chosen playback rate will then be located in the publication documents themselves. This is architecturally incorrect, but this is an acceptable trade-off if it yields the required rendering performance.

In practice, the continuous audio playback from one FXL page to another is implemented physically at the file level (e.g. single MP3 resource for an entire book), however at the logical level, utterances are delineated at authoring time in such a way that a pause in the narration is expected between page turns. So with the new approach where audio playback is not controlled from the top-level application context, the perceivable break when switching from one document to another (i.e. one FXL page to the next) due to the audio player instance being reset / reinstantiated, should not degrade the user experience much. Again, this would be an acceptable trade-off given the expected gain in highlighting fluidity.

To be continued...