alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

issues while transcribing a long file (~19 minutes) #171

Open gilamsalem opened 5 years ago

gilamsalem commented 5 years ago

Hi, I am trying to transcribe a long file (~19 minutes) that has a lot of silence in the middle. According to the logs, everything works as expected at the beginning:

2019-01-27 10:07:11.558 -   DEBUG:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
2019-01-27 10:07:11.559 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer of size 4096 to pipeline
2019-01-27 10:07:11.559 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer done
2019-01-27 10:07:11.802 -    INFO:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Got partial result: <transcription>
2019-01-27 10:07:11.809 -    INFO:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Postprocessing (final=False) result..
2019-01-27 10:07:11.810 -    INFO:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Postprocessing done.

At some point (after ~13 minutes), the time between the "Pushing buffer of size..." and "Pushing buffer done" messages starts to increase. Then I see the following messages:

2019-01-27 10:20:26.652 -   DEBUG:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
2019-01-27 10:20:26.737 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer of size 4096 to pipeline
2019-01-27 10:20:27.303 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer done
2019-01-27 10:20:27.687 -   DEBUG:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Checking that decoder hasn't been silent for more than 10 seconds
2019-01-27 10:20:36.788 -   DEBUG:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
2019-01-27 10:20:37.139 -   DEBUG:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Checking that decoder hasn't been silent for more than 10 seconds
2019-01-27 10:20:42.442 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer of size 4096 to pipeline
2019-01-27 10:20:45.045 - WARNING:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: More than 10 seconds from last decoder hypothesis update, cancelling
2019-01-27 10:20:47.096 -    INFO:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Got partial result: <transcription>
2019-01-27 10:20:49.018 -    INFO:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Master disconnected before decoder reached EOS?
2019-01-27 10:20:48.242 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer done
2019-01-27 10:20:50.326 -    INFO:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Sending EOS to pipeline in order to cancel processing
2019-01-27 10:20:56.936 -    INFO:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Cancelled pipeline
2019-01-27 10:21:07.515 -    INFO:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Resetting decoder state
2019-01-27 10:21:05.886 -   DEBUG:   __main__: f17112ae-3776-497f-b20c-7161a7d84721: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
2019-01-27 10:21:16.233 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer of size 4096 to pipeline
2019-01-27 10:21:34.534 -   DEBUG:   decoder2: f17112ae-3776-497f-b20c-7161a7d84721: Pushing buffer done

From that point on, everything goes wrong. What can cause this behavior? Could it be a memory issue?

gilamsalem commented 5 years ago

Update: this was a memory issue with my Kubernetes pod. Increasing the pod's memory fixed it.
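
In case it helps anyone who sees the same symptom: the slowdown is visible in the worker log as a growing gap between each "Pushing buffer of size..." line and the matching "Pushing buffer done" line. Below is a minimal sketch for measuring that gap. It assumes the log line layout shown in the excerpts above (timestamp, level, logger, request id, message); the function name `push_latencies` is made up for illustration and is not part of kaldi-gstreamer-server.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpts in this issue:
# "<timestamp> - <LEVEL>: <logger>: <request id>: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) -\s+\w+:\s+(\S+): (\S+): (.*)$"
)


def push_latencies(lines):
    """Return (start_time, seconds) for each push/done pair found in the log lines.

    Hypothetical helper: pairs every "Pushing buffer of size" line with the
    next "Pushing buffer done" line that carries the same request id.
    """
    pending = {}  # request id -> timestamp of the still-unmatched push line
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a worker log line
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        request_id, message = match.group(3), match.group(4)
        if message.startswith("Pushing buffer of size"):
            pending[request_id] = ts
        elif message.startswith("Pushing buffer done") and request_id in pending:
            start = pending.pop(request_id)
            result.append((start, (ts - start).total_seconds()))
    return result
```

Run over the excerpts above, this gives 0 s for the push at 10:07:11.559, then 0.566 s, 5.8 s, and 18.301 s for the pushes at 10:20:26.737, 10:20:42.442, and 10:21:16.233. A steadily growing value like that is a hint to check the resources available to the worker (memory, in my case).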