davidacm / NVDA-IBMTTS-Driver

This project is aimed at developing and maintaining the NVDA IBMTTS driver. IBMTTS is a synthesizer similar to Eloquence. Please send your ideas and contributions here!
GNU General Public License v2.0
56 stars 23 forks source link

Fixing timing of index events #94

Closed mltony closed 1 year ago

mltony commented 1 year ago

Hi, this is Tony, I am the author of Phonetic Punctuation NVDA add-on. I propose to make some changes regarding index processing in IBM TTS in order to make it compatible with Phonetic Punctuation. In current version index events are fired too early in most cases, making Phonetic punctuation to play earcons for punctuation marks earlier then they occur in text, and thus making Phonetic Punctuation unusable with IBM TTS. The reason for that is even though calls to player.feed() and onIndexReached() are queued in the right order to be executed by CallbackThread, a call to player.feed() is non-blocking, so it returns once audio chunk starts playing. As a result, index event is fired when the last audio chunk before index starts playing, whereas it should fire when last audio chunk finishes playing. I work around this by utilizing onDone callback in player.feed() function. Major changes in this PR:

  1. playStream function modified to work well when buffer is empty. This is needed because indexing relies on onDone callback being passed to player.feed() function, and in certain cases (e.g. string starting with punctuation mark or two consecutive punctuation marks), we need to be able to trigger callback with empty buffer. player.feed seems to call onDone much later when the buffer is empty, so workaround is to write a single zero to the buffer, which is inperceptible to the ear, but makes callback work as expected.
  2. playStream function now reads global variable indexes and passes its value for triggering indices into onDone callback of bgPlay. This is done to avoid race conditions: we would like to trigger only those indexes that were available at the moment playStream function is called. We don't want to trigger any other indexes prematurely.
  3. eciCallback function is modified to always call playStream() for reporting indices. PlayStream in its turn would schedule callbacks for all index events in the right order.
  4. sendIndexes function is modified to call setLast (and therefore onIndexReached) from current thread instead of scheduling to be executed later asynchronously from CallbackThread. If we don't make this change, then any calls to setLast will be executed after speechSequence, since speech is fed to player from the same CallbackThread. I am not aware of any side effects of this solution, however if there are any, another option would be to create another thread specifically for processing index callbacks in an async fashion. For now I decided to go with a simpler version, however please let me know if there is a good reason to implement the idea with another thread.
mltony commented 1 year ago

If accepted, this would fix #22 that I reported a few years ago.

davidacm commented 1 year ago

Hi @mltony , thanks for the contribution. I will review the code today in the night. Seems that it should work correctly in theory. I just need to check the code and do some tests that failed when I tried to improve the index sync. For now, I have a question: What would happen if two indexes are set at the same point? I don't know if this is a possible case, perhaps you can comment about this.

mltony commented 1 year ago

Phonetic punctuation doesn't set two indexes at the same time, so I haven't tested this. But I would think it should work. When eciCallback is triggered for index event and the buffer is empty, I write a single 16-bit zero to the buffer and play it via player before triggering index callback via onDone argument of player.feed(). This makes sure that index is triggered correctly at the beginning of the string, which should be similar to two indexes one after another.

ultrasound1372 commented 1 year ago

Does this still work on the latest NVDA alphas that switch to using WASAPI? The behavior of player.feed may be different. Not to mention you can pass a straight ctypes buffer now instead of having to convert to a bytes object which reduces performance overhead. This, of course, is not backward compatible.

davidacm commented 1 year ago

Hi @mltony, I did the tests that I mentioned in #22, and the issues with the audio are still present in some cases using this PR. Try the cases at 0, 15, 20 and 30% rate speeds. In spanish language, the issues are even more noticeable.

I don't know if those issues are present with the new alpha of NVDA using wasapi, I need to update some things but I haven't had time yet to update the code.

I don't know why this happen with this specific synth, seems that nvwave does not like small chunks of audio.

Mohamed00 commented 1 year ago

I also noticed that this PR introduces a very noticeable lag in many cases, such as, for example, character spelling. It's very noticeable and the add-on feels slower with this new indexing when compared to the old form. The crackling issue is fixed in latest alphas using WASAPI, though.

mltony commented 1 year ago

@davidacm, I can hear barely perceptile defects when playing your examples - without you pointing them out I would not have noticed them. I wouldn't call them crackling, just barely perceptible artefacts. Could you describe examples in Spanish - if they are more pronounced this might help debugging. It is possible to rewrite logic without adding a single zero of silence, although my intuition tells me that wouldn't solve the problem of artefacts. I'll give it a try.

mltony commented 1 year ago

Also with upcoming NVDA migration to wasapi, wondering if it makes sense to work on fixing winmm version? It seems to me that winmm is a lost cause at this point. Basically if I can figure out a way to make sure that wasapi version works without artefacts or crackling, would that be fine with you?

ultrasound1372 commented 1 year ago

That would be fine with me and I think should be what we aim for. Though the install tasks will have to be aware as 2023.2 is theoretically not an API breaking release and so will load things last tested with 2023.1, unless they make an exception. I don't think they will though.

mltony commented 1 year ago

Superceded by #96.