Text output in 20 second chunks

alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries

Apache License 2.0


Text output in 20 second chunks #240

Open alechirsch opened 1 year ago

alechirsch commented 1 year ago

I am using the WebSocket server Docker image for the English model. I am feeding it a live stream of audio (converted to WAV) for telephony purposes. I have noticed that the WebSocket returns finalized text in chunks of no more than 20 seconds of speech. As a result, the transcription can get cut off in the middle of a word around the 20-second mark of each chunk. Is this a known limitation? Is there any way to increase the duration of each finalized text chunk?
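For context, here is a minimal Python sketch of how a client might join the finalized chunks into one transcript. It assumes the server sends JSON messages in which interim hypotheses appear under a "partial" key and finalized chunks under a "text" key; the function name collect_final_text is hypothetical. It only concatenates chunks and cannot repair a word that was split at a chunk boundary.

```python
import json


def collect_final_text(messages):
    """Join the finalized "text" fields from a sequence of JSON messages.

    Assumption: each message is a JSON object; interim results use the
    key "partial" and finalized chunks use the key "text".
    """
    pieces = []
    for raw in messages:
        msg = json.loads(raw)
        # Skip partial hypotheses and empty final results.
        text = msg.get("text", "").strip()
        if text:
            pieces.append(text)
    return " ".join(pieces)
```

Calling collect_final_text on the messages received so far yields the running transcript, so a forced cut at 20 seconds shows up as a seam between two chunks rather than as lost text.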

nshmyrev commented 1 year ago

In model.conf, add:

--endpoint.rule5.min-utterance-length=100

The limit will then be 100 seconds instead of 20.

In general, you are not really interested in very long utterances. An utterance should normally end earlier, at a pause in speech.
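If the file is edited by a script, one thing to watch for is appending the same option twice. The following Python sketch rewrites model.conf so that the option appears exactly once. The helper name set_rule5_min_utterance_length is hypothetical, and the code assumes model.conf holds one "--option=value" entry per line.

```python
from pathlib import Path

OPTION = "--endpoint.rule5.min-utterance-length"


def set_rule5_min_utterance_length(conf_path, seconds):
    """Set the rule5 option in model.conf, replacing any earlier value.

    Assumption: the file contains one "--option=value" entry per line.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any previous setting of this option so it is not listed twice.
    kept = [ln for ln in lines if not ln.strip().startswith(OPTION + "=")]
    kept.append(f"{OPTION}={seconds}")
    path.write_text("\n".join(kept) + "\n")
    return kept
```

For the English image discussed in this thread, the call would be set_rule5_min_utterance_length("/opt/vosk-model-en/model/conf/model.conf", 100), run before the server starts.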

alechirsch commented 1 year ago

Is there a way to do this with the docker container?

alechirsch commented 1 year ago

If there is a cleaner way to do this without using a volume, please let me know.

docker run -it alphacep/kaldi-en /bin/bash -c "echo '--endpoint.rule5.min-utterance-length=100' >> /opt/vosk-model-en/model/conf/model.conf && python3 ./asr_server.py /opt/vosk-model-en/model"

GuillaumeV-cemea commented 1 year ago

Using a custom Dockerfile seems cleaner to me, something like this:

FROM alphacep/kaldi-en
RUN echo '--endpoint.rule5.min-utterance-length=100' >> /opt/vosk-model-en/model/conf/model.conf
CMD [ "python3", "./asr_server.py", "/opt/vosk-model-en/model" ]