Open alechirsch opened 1 year ago
In model.conf, add the line
--endpoint.rule5.min-utterance-length=100
and the maximum utterance length becomes 100 seconds instead of the default 20.
In general, though, you are not really interested in very long utterances. The recognizer should end the utterance earlier, when it detects a pause.
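A minimal sketch of making that edit from a script, in case the option needs to be applied repeatedly without duplicating the line. The helper name `set_min_utterance_length` is hypothetical, and the sketch assumes model.conf is a plain text file with one `--option=value` per line, as the line above suggests.

```python
from pathlib import Path

OPTION = "--endpoint.rule5.min-utterance-length"


def set_min_utterance_length(conf_path, seconds):
    """Set the rule5 option in model.conf (hypothetical helper, not part of Vosk)."""
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any earlier setting of the same option so it is not duplicated.
    kept = [ln for ln in lines if not ln.strip().startswith(OPTION + "=")]
    kept.append(f"{OPTION}={seconds}")
    path.write_text("\n".join(kept) + "\n")
    return kept
```

Calling `set_min_utterance_length("/opt/vosk-model-en/model/conf/model.conf", 100)` twice leaves a single copy of the option, whereas appending with `echo ... >>` adds a new line on every run.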
Is there a way to do this with the docker container?
If there is a cleaner way to do this without using a volume, please let me know:
docker run -it alphacep/kaldi-en /bin/bash -c "echo '--endpoint.rule5.min-utterance-length=100' >> /opt/vosk-model-en/model/conf/model.conf && python3 ./asr_server.py /opt/vosk-model-en/model"
Using a custom Dockerfile seems cleaner to me, something like this:
FROM alphacep/kaldi-en
RUN echo '--endpoint.rule5.min-utterance-length=100' >> /opt/vosk-model-en/model/conf/model.conf
CMD [ "python3", "./asr_server.py", "/opt/vosk-model-en/model" ]
I am using the WebSocket server Docker image for the English model. I am feeding it a live stream of audio converted to WAV, for telephony purposes. I have noticed that the WebSocket returns recognized text in chunks of no more than 20 seconds of speech. This causes problems: the transcription can get cut off in the middle of a word around the 20-second mark of each chunk. Is this a known limitation? Is there any way to increase the maximum duration of each finalized text chunk?
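For context, a rough sketch of how a client might stitch the finalized chunks back into one transcript. It assumes the server sends JSON messages in which a finalized chunk carries a "text" field and interim results carry a "partial" field. That message shape is an assumption here, and the function name `collect_final_text` is hypothetical. Joining chunks this way does not repair a word that was split at a chunk boundary.

```python
import json


def collect_final_text(messages):
    """Join the finalized chunks from a sequence of JSON message strings."""
    finals = []
    for raw in messages:
        msg = json.loads(raw)
        # Assumed shape: finalized chunks use "text", interim ones use "partial".
        text = msg.get("text", "").strip()
        if text:
            finals.append(text)
    return " ".join(finals)
```

Interim messages and empty finalized chunks are skipped, so only completed chunks end up in the transcript.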