davidacm / NVDA-IBMTTS-Driver

This project is aimed at developing and maintaining the NVDA IBMTTS driver. IBMTTS is a synthesizer similar to Eloquence. Please send your ideas and contributions here!
GNU General Public License v2.0

Callback commands are not handled correctly. #22

Open mltony opened 4 years ago

mltony commented 4 years ago

Hello, this is mltony. I am working on an NVDA add-on called Phonetic Punctuation: https://github.com/mltony/nvda-phonetic-punctuation The Python 3 version of the IBMTTS driver doesn't seem to handle callback commands correctly in some cases. Here are the steps to reproduce:

  1. Install the Phonetic Punctuation add-on: https://github.com/mltony/nvda-phonetic-punctuation/releases/download/v0.2dev/phoneticPunctuation-0.2dev.nvda-addon Note that it requires NVDA 2019.3 alpha.
  2. Speak the following phrase: Test test!!!

Expected behavior: The ding-ding-ding sounds should start playing after "test test".

Actual behavior: The ding sounds play at the same time as the "test test" utterance is spoken.

Phonetic punctuation converts this utterance into:

["Test test", <CallBackCommand that actually plays those ding sounds>, <BreakCommand for the duration of the sound>]

So I suspect that your driver triggers the callback before the utterance has been fully spoken.
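To make the expected ordering concrete, here is a toy timeline model in Python. It is only a sketch: `Callback` and `Break` are hypothetical stand-ins for NVDA's callback and break commands, and `MS_PER_CHAR` is a made-up speaking rate. It does not use NVDA's real classes or any driver code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Callback:
    """Hypothetical stand-in for a callback command."""
    run: Callable[[], None]


@dataclass
class Break:
    """Hypothetical stand-in for a break command (pause in milliseconds)."""
    ms: int


Item = Union[str, Callback, Break]

MS_PER_CHAR = 60  # made-up speaking rate, for illustration only


def play(sequence: List[Item]) -> List[Tuple[int, str]]:
    """Return (time_ms, event) pairs for an idealized synth.

    Each callback fires only after all preceding audio has been played.
    """
    clock = 0
    events: List[Tuple[int, str]] = []
    for item in sequence:
        if isinstance(item, str):
            events.append((clock, f"speak {item!r}"))
            clock += len(item) * MS_PER_CHAR
        elif isinstance(item, Callback):
            item.run()
            events.append((clock, "callback"))
        elif isinstance(item, Break):
            events.append((clock, f"pause {item.ms} ms"))
            clock += item.ms
    return events


fired = []
timeline = play(["Test test", Callback(lambda: fired.append("ding")), Break(300)])
for when, what in timeline:
    print(when, what)
```

In this model the callback is scheduled at the moment the text ends, not at time 0. The behavior reported above corresponds to the callback landing at or near time 0 instead.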

With other synthesizers phonetic punctuation works correctly; I tested with espeak, SAPI, OneCore and the other version of eloquence.

There is also another small problem: it seems that the duration of the pause in a break command must be multiplied by some coefficient, which seems to be equal to 3. If you try to speak this phrase with Phonetic Punctuation: !!!!!!!!Test you will hear the word "test" much sooner than the dings end, because of that problem.

Neurrone commented 4 years ago

@mltony which other version of Eloquence were you testing with? CodeFactory's?

davidacm commented 4 years ago

Thanks. This is urgent to solve, but I can't work on it right now because I'm outside my country. I'll fix it when I get back. If someone else can fix it, I can review the pull request and accept it.

mltony commented 4 years ago

I was testing with this one: https://github.com/pumper42nickel/eloquence_threshold/ With this one phonetic punctuation works fine.

davidacm commented 4 years ago

Hi, I need your collaboration. Please read my entire (long) comment and let me know if you can find a solution for this issue.

The situation:

  1. The other driver that you mentioned has a crackling issue, and I don't want to introduce another issue in order to solve this one.
  2. The issue: the IBMTTS driver uses a stream to buffer a certain amount of audio. When that buffer is full, the audio is sent to NVDA's player. All indexes received are also sent. This way we avoid voice breakage, but index accuracy is lost. In the IBMTTS driver the indexes are sent early. I could change this behavior, but then the indexes would be delayed because of the audio buffer.
  3. The solution I tried: send the audio stream when the buffer is full or when an index is received.
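The trade-off between the two flushing policies can be sketched with a small Python simulation. This is not the driver's code: the function names, the event format, and the reading that indexes are forwarded as soon as they arrive are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# An event is either an audio chunk (bytes) or an index marker (int).
Event = Union[bytes, int]
Output = List[Tuple[str, int]]


def flush_when_full(events: List[Event], buffer_size: int) -> Output:
    """Assumed current policy: audio waits for a full buffer,
    while indexes are forwarded as soon as they arrive."""
    out: Output = []
    buf = bytearray()
    for ev in events:
        if isinstance(ev, int):
            out.append(("index", ev))  # may overtake audio still in the buffer
        else:
            buf += ev
            if len(buf) >= buffer_size:
                out.append(("audio", len(buf)))
                buf.clear()
    if buf:
        out.append(("audio", len(buf)))  # send whatever is left at the end
    return out


def flush_on_index(events: List[Event], buffer_size: int) -> Output:
    """Attempted policy: also flush pending audio whenever an index arrives,
    so every index follows the audio that precedes it."""
    out: Output = []
    buf = bytearray()
    for ev in events:
        if isinstance(ev, int):
            if buf:
                out.append(("audio", len(buf)))
                buf.clear()
            out.append(("index", ev))
        else:
            buf += ev
            if len(buf) >= buffer_size:
                out.append(("audio", len(buf)))
                buf.clear()
    if buf:
        out.append(("audio", len(buf)))
    return out


events = [b"\0" * 300, 1, b"\0" * 300, b"\0" * 500, 2]
early = flush_when_full(events, 1000)
accurate = flush_on_index(events, 1000)
print(early)
print(accurate)
```

In this toy run the first policy delivers index 1 before any audio has reached the player, while the second delivers it right after the 300 bytes that precede it. The price is that the second policy hands the player smaller, irregular chunks, which is the situation in which the audio breaks were observed.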

Results:

The voice breakage mentioned in point 2 appeared for many sentences in Spanish; the cases are different for each language. I looked for some English cases for you.

Steps to reproduce:

I don't know whether this issue depends on hardware specs; on your computer you may need to adjust the parameters. Here are my main computer's specs:

environment:
Steps:

The breaks happen at the end of a string, at specific speeds and with specific sentences. I can mention many cases in Spanish (my language), but in English you will need to find them. Here are some that I found in 5 minutes using American English.

  1. Set the Eloquence driver to American English.
  2. Set the Eloquence driver to the specified rate. You can also run the same test with IBMTTS to confirm that the issue is not present in that driver.
  3. Read the following sentences.
rate at 0%:

rate 0

at 10%:

rate 10

at 15%:

this is the number 20

at 20%:

eco comma number papa alpha

at 30%:

colon 50

at 50%:

rate 50 rate 0

Mohamed00 commented 3 years ago

I believe I may have found a solution to this on the eloquence_threshold side of things. Adding buffered=True to the nvwave.WavePlayer constructor, and setting nvwave.WavePlayer.MIN_BUFFER_MS to at least 900 seems to fix the issue.
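As a rough sense of scale, the following Python sketch converts a minimum buffer length in milliseconds into bytes of PCM audio. The 11025 Hz rate is the Eloquence rate mentioned later in this thread; 16-bit mono output is an assumption, and `buffer_bytes` is a hypothetical helper, not part of nvwave.

```python
def buffer_bytes(ms: int, sample_rate: int = 11025,
                 bits_per_sample: int = 16, channels: int = 1) -> int:
    """Number of PCM bytes needed to hold `ms` milliseconds of audio.

    Defaults assume 11025 Hz, 16-bit, mono output (an assumption).
    """
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return bytes_per_second * ms // 1000


size = buffer_bytes(900)
print(size)
```

Under these assumptions, up to 900 ms of audio (19845 bytes) may need to accumulate before playback starts, so a short utterance could be heard noticeably later than with unbuffered output.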

davidacm commented 3 years ago

Hi, has this issue been fixed? I can't find the code fix, but I tried the proposed solution and it introduced another issue for me: sometimes the synth lags by some milliseconds if I use buffered=True.

Mohamed00 commented 3 years ago

Pretty much. I got a report on another repository that the solution I used worked pretty well for someone. For a time I was considering making an accurate indexing option, noting that it could cause crackles, but I'm not sure how practical that would be.

ultrasound1372 commented 2 years ago

I saw a commit a while back that apparently did something related to this. Do we still have this issue? Have we examined how other synthesizers handle accurate indexing without crackling like that? Could it have to do with the fact that NVWave also has to do some internal resampling as the audio is sent to the output? Eloquence runs at 11025 Hz, while most contemporary synthesizers run at 22050 Hz, and some at 16000 Hz. eSpeak might actually run higher. If your system sample rate is set to 44100 Hz, upsampling from 11025 Hz is easy, as integer ratios always are: just some brief interpolation. But if your system is set to 48000 Hz, perhaps it has to do more work? Or does it pass that off to Windows?

Have we looked at the DECtalk access32 drivers to see whether they have accurate indexing and, if they do, what settings they use? They are another known set of synths that run at 11025 Hz.
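The ratio argument can be checked with a few lines of Python. This only does the arithmetic; it says nothing about where the resampling actually happens (NVWave or Windows), which remains an open question here. `resample_ratio` is a hypothetical helper name.

```python
from fractions import Fraction


def resample_ratio(output_hz: int, source_hz: int) -> Fraction:
    """Exact upsampling ratio from the synth rate to the system rate."""
    return Fraction(output_hz, source_hz)


for system_rate in (44100, 48000):
    ratio = resample_ratio(system_rate, 11025)
    kind = "integer" if ratio.denominator == 1 else "fractional"
    print(system_rate, ratio, kind)
```

44100 / 11025 reduces to exactly 4, while 48000 / 11025 reduces to 640/147, so a 48000 Hz output needs a fractional-ratio resampler rather than simple 4x interpolation.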

ultrasound1372 commented 1 year ago

May be worth revisiting this discussion in relation to the NVDA alphas that add WASAPI support and the accompanying refactor of NVWave, as this might mitigate the crackling altogether. We can then either choose to install support for both the current method and a new, more accurate method depending on NVDA's version, or make a release of that add-on after that version is put out as an RC that has it as a minimum. @davidacm What do you think?