Closed mltony closed 1 year ago
If accepted, this would fix #22 that I reported a few years ago.
Hi @mltony , thanks for the contribution. I will review the code today in the night. Seems that it should work correctly in theory. I just need to check the code and do some tests that failed when I tried to improve the index sync. For now, I have a question: What would happen if two indexes are set at the same point? I don't know if this is a possible case, perhaps you can comment about this.
Phonetic punctuation doesn't set two indexes at the same time, so I haven't tested this. But I would think it should work. When eciCallback is triggered for index event and the buffer is empty, I write a single 16-bit zero to the buffer and play it via player before triggering index callback via onDone argument of player.feed(). This makes sure that index is triggered correctly at the beginning of the string, which should be similar to two indexes one after another.
Does this still work on the latest NVDA alphas that switch to using WASAPI? The behavior of player.feed may be different. Not to mention you can pass a straight ctypes buffer now instead of having to convert to a bytes object which reduces performance overhead. This, of course, is not backward compatible.
Hi @mltony, I did the tests that I mentioned in #22, and the issues with the audio are still present in some cases using this PR. Try the cases at 0, 15, 20 and 30% rate speeds. In spanish language, the issues are even more noticeable.
I don't know if those issues are present with the new alpha of NVDA using wasapi, I need to update some things but I haven't had time yet to update the code.
I don't know why this happen with this specific synth, seems that nvwave does not like small chunks of audio.
I also noticed that this PR introduces a very noticeable lag in many cases, such as, for example, character spelling. It's very noticeable and the add-on feels slower with this new indexing when compared to the old form. The crackling issue is fixed in latest alphas using WASAPI, though.
@davidacm, I can hear barely perceptile defects when playing your examples - without you pointing them out I would not have noticed them. I wouldn't call them crackling, just barely perceptible artefacts. Could you describe examples in Spanish - if they are more pronounced this might help debugging. It is possible to rewrite logic without adding a single zero of silence, although my intuition tells me that wouldn't solve the problem of artefacts. I'll give it a try.
Also with upcoming NVDA migration to wasapi, wondering if it makes sense to work on fixing winmm version? It seems to me that winmm is a lost cause at this point. Basically if I can figure out a way to make sure that wasapi version works without artefacts or crackling, would that be fine with you?
That would be fine with me and I think should be what we aim for. Though the install tasks will have to be aware as 2023.2 is theoretically not an API breaking release and so will load things last tested with 2023.1, unless they make an exception. I don't think they will though.
Superceded by #96.
Hi, this is Tony, I am the author of Phonetic Punctuation NVDA add-on. I propose to make some changes regarding index processing in IBM TTS in order to make it compatible with Phonetic Punctuation. In current version index events are fired too early in most cases, making Phonetic punctuation to play earcons for punctuation marks earlier then they occur in text, and thus making Phonetic Punctuation unusable with IBM TTS. The reason for that is even though calls to player.feed() and onIndexReached() are queued in the right order to be executed by CallbackThread, a call to player.feed() is non-blocking, so it returns once audio chunk starts playing. As a result, index event is fired when the last audio chunk before index starts playing, whereas it should fire when last audio chunk finishes playing. I work around this by utilizing onDone callback in player.feed() function. Major changes in this PR: