alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Websocket closed when decode long audio files #225

Closed kli017 closed 1 year ago

kli017 commented 4 years ago

Hello, I was decoding with the kaldinnet2onlinedecoder. When I test long audio such as bill_gates-TED.mp3, or any other audio longer than 1 minute, the client always shows "Audio sent, now sending EOS" and then the websocket closes. But when I checked the worker, I found the decoder was still working, just without sending the results to the client. Is there a limit on the length of the audio? How can I decode long audio? (log)

kamoo1 commented 3 years ago

I had a similar issue. I was sending a long audio file at real-time rate, but the results came back at about 1/3 of real-time speed. After some debugging, I found that the server closes the socket after 40 seconds without receiving results from the decoder, even though the decoder had already transcribed the entire audio in real time. The delay was caused by the sleep function in the post-processor.

So in my case removing the sleep in the post-processor fixes the issue, but I'm still not sure why it's in some of the example settings. Maybe @alumae can shed some light on this? Much appreciated!
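
For illustration, a post-processor without any sleep could look roughly like the sketch below. It assumes the post-processor is an external program that receives one hypothesis per line on stdin and must answer with one flushed line on stdout; the names `post_process_line` and `run` are hypothetical and the appended full stop is only a placeholder transformation.

```python
import sys


def post_process_line(line):
    """Placeholder transformation: append a full stop to one hypothesis line."""
    return line.rstrip("\n") + ".\n"


def run(stream_in, stream_out):
    """Answer every input line immediately, with no sleep between lines."""
    for line in stream_in:
        stream_out.write(post_process_line(line))
        # Flush after each line so the caller is not left waiting on a buffer.
        stream_out.flush()
```

As a script entry point this would be called as `run(sys.stdin, sys.stdout)`. The flush after every line matters as much as the missing sleep, because a buffered reply delays results in the same way.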

kamoo1 commented 3 years ago

In the post_process method of worker.py, acquiring self.post_processor_lock with a timeout of 0.0 somehow blocked without raising TimeoutError, which jammed the pipeline.

kamoo1 commented 3 years ago

Acquiring tornado.locks.Lock with timeout=0.0 seems to block. https://github.com/tornadoweb/tornado/blob/v4.5.3/tornado/locks.py#L389
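
One possible explanation, offered as an assumption rather than a confirmed reading of the linked line: if the timeout is tested with a plain truthiness check (`if timeout:`), then `0.0` is falsy and is treated the same as `None`, so no timeout is ever armed. The sketch below reproduces that pitfall with the standard library's `asyncio.Lock` instead of `tornado.locks.Lock`. `acquire_truthy`, `acquire_explicit`, `outcome` and `demo` are hypothetical helpers written for this illustration and are not part of Tornado or worker.py.

```python
import asyncio


async def acquire_truthy(lock, timeout=None):
    # Pitfall: 0.0 is falsy, so the timeout branch is skipped entirely.
    if timeout:
        await asyncio.wait_for(lock.acquire(), timeout)
    else:
        await lock.acquire()  # waits until the lock is released


async def acquire_explicit(lock, timeout=None):
    # Comparing against None keeps 0.0 meaningful as "do not wait".
    if timeout is not None:
        await asyncio.wait_for(lock.acquire(), timeout)
    else:
        await lock.acquire()


async def outcome(acquire, lock):
    """Report what acquire(lock, 0.0) does while the lock is held elsewhere."""
    task = asyncio.ensure_future(acquire(lock, 0.0))
    await asyncio.sleep(0.05)  # give the attempt a moment to settle
    if not task.done():
        task.cancel()  # clean up so the demo itself cannot hang
        try:
            await task
        except asyncio.CancelledError:
            pass
        return "still waiting"
    try:
        task.result()
        return "acquired"
    except asyncio.TimeoutError:
        return "timed out"


async def demo():
    lock = asyncio.Lock()
    await lock.acquire()  # simulate another coroutine holding the lock
    return (await outcome(acquire_truthy, lock),
            await outcome(acquire_explicit, lock))


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the assumption holds for the Tornado version in use, a workaround in the same spirit would be to pass a small non-zero timeout, or to avoid relying on `0.0` meaning "try once and give up".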