Closed aalsabag closed 2 years ago
Hi @aalsabag,
I guess the first question is really about the latency issue, which has already been filed. I believe it depends on the model you use; in addition, some languages still use an older model architecture that would require retraining. I believe @nshmyrev can give you more details on that, or you can read his blog post about it.
rec.AcceptWaveform(data) gives you a final transcription, and there's a big window between the last interim result and the final transcription. So probably only changing the model or analysing the interim results yourself will help here.
For the second point: how long is your audio file? And why do you need to write captions to a file and then immediately read them back? Wouldn't it be simpler to save captions in the background (if you really need to persist them) and stream results to your service on the fly? I don't see the real use case yet.
Thank you. I really appreciate your response.
To clarify my second question:
- I have a service live streaming to an audio file.
- I want to generate subtitles based on that growing audio file.
- I want to send every few words to another, completely separate service.
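For reference, reading from a file that is still being written could look roughly like the sketch below. It is only an illustration under assumptions: follow_audio and its parameters are hypothetical names (not part of Vosk), the file is assumed to hold raw 16-bit PCM, and a WAV header, if present, is not skipped.

```python
import time
from typing import Iterator


def follow_audio(path: str, chunk_size: int = 4000,
                 poll_interval: float = 0.1,
                 idle_timeout: float = 5.0) -> Iterator[bytes]:
    """Yield chunks from a file that another process keeps appending to.

    Stops once no new bytes have arrived for `idle_timeout` seconds.
    """
    pending = b""
    last_data = time.monotonic()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if data:
                last_data = time.monotonic()
                pending += data
                # Only hand out whole 16-bit samples; keep a stray byte for later.
                usable = len(pending) - (len(pending) % 2)
                if usable:
                    yield pending[:usable]
                    pending = pending[usable:]
            elif time.monotonic() - last_data >= idle_timeout:
                return  # the writer seems to have stopped
            else:
                time.sleep(poll_interval)  # wait for the writer to append more
```

Each yielded chunk can then be passed to the recognizer as it arrives, so nothing has to wait for the recording to finish.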
With regard to the first question, what does "custom interim results" mean? Currently the tool just waits for a pause in speech, I think, and sometimes that can be a very long sentence. Are you saying acceptWaveform depends entirely on the model I use? If so, I'll play around with other models.
In this case, you may want to take rec.PartialResult() in the last else block and immediately send it to your service. You can implement it so that it looks like continuous speech refinement to the end user. In real life, when you start pronouncing something, the initial sound might be interpreted by our ears as a wide range of words until you finish. That's what I call interim (or partial) results (a term from the Google STT vocabulary): something probably really close to what you said, but not 100% accurate yet. You can send non-empty partial results to your service and implement this "refinement" feature on the client side. It'll give the feeling that someone on the other side is trying to understand you in real time. And that's how you can hide the delay between the last partial result and the final transcription. Technically, it won't affect the real response time, but the fact that you immediately display something to the user gives the illusion that there's no delay at all.
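A minimal sketch of that loop, in case it helps. It assumes rec behaves like a Vosk KaldiRecognizer (AcceptWaveform, PartialResult, Result and FinalResult, the last three returning JSON strings); stream_captions and the send(text, is_final) callback are hypothetical names standing in for whatever talks to your service.

```python
import json
from typing import Callable, Iterable


def stream_captions(rec, chunks: Iterable[bytes],
                    send: Callable[[str, bool], None]) -> None:
    """Feed audio chunks to the recognizer and forward partial and final text."""
    last_partial = ""
    for data in chunks:
        if rec.AcceptWaveform(data):
            # End of utterance detected: forward the final transcription.
            text = json.loads(rec.Result()).get("text", "")
            if text:
                send(text, True)
            last_partial = ""
        else:
            # Still mid-utterance: forward the partial only if it is
            # non-empty and differs from the one sent last time.
            partial = json.loads(rec.PartialResult()).get("partial", "")
            if partial and partial != last_partial:
                send(partial, False)
                last_partial = partial
    # Flush whatever is left once the audio ends.
    text = json.loads(rec.FinalResult()).get("text", "")
    if text:
        send(text, True)
```

With real Vosk this would be called with something like rec = KaldiRecognizer(Model("model"), 16000), where the model path and sample rate are placeholders. The client can overwrite the displayed line on every partial and lock it in when is_final is True.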
Yeah, that was what I was going to do. Just wanted to confirm. Thank you for your help. This ticket may be closed.
Hello,
Is there a way to make rec.AcceptWaveform(data) return True more frequently?