Open naveenss1995 opened 4 years ago
This project is built on top of an old version that does not support Hydra. Regarding your question about "OnlineSpectrogramParser": it is exactly the same as "SpectrogramParser" with one small change. I made the change because I needed to decode audio that the browser sends as text. You can read the "parse_audio" method in both "OnlineSpectrogramParser" and "SpectrogramParser" and you will see the difference.
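To make the "audio sent as text" idea concrete, here is a minimal sketch of that kind of decoding. The comma-separated format of 16-bit samples and the name `decode_text_audio` are assumptions for illustration only; the one detail taken from this thread is the division by 32767 that appears in `load_audio_from_txt`.

```python
import numpy as np


def decode_text_audio(payload: str) -> np.ndarray:
    """Turn a comma-separated string of int16 samples into float32 values."""
    # Assumption: the browser sends samples as text such as "0,16384,-32767".
    samples = np.array(payload.split(","))
    # Same normalization as the line quoted in the traceback in this thread.
    return samples.astype("float32") / 32767


audio = decode_text_audio("0,32767,-32767")
print(audio.dtype, audio)
```

A file-based parser would read samples from disk instead; the rest of the spectrogram pipeline can stay the same once the samples are a float array.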
This code was quick-and-dirty since I wanted to prove that online decoding is possible with Deepspeech2. The code needs a lot of refactoring, and I may do it soon.
For the time being, you can use this fork of the original deepspeech2 implementation https://github.com/farisalasmary/deepspeech.pytorch
@naveenss1995 I think this model will work with the deepspeech2 version mentioned above https://github.com/SeanNaren/deepspeech.pytorch/releases/download/v2.0/librispeech_pretrained_v2.pth
Yes, the above comment helps, and the model mentioned above also works. The issue arises when you do transfer learning on top of that model.
Can you share the error message?
It's resolved: I modified the audio parsing method in the latest SpectrogramParser to use load_audio_from_txt.
The fix worked 99% of the time, but unfortunately it is now giving:
  File "/home/ubuntu/ds/live.ds.pytorch.v2/deepspeech.pytorch/deepspeech_pytorch/loader/data_loader.py", line 19, in load_audio_from_txt
    sound = sound.astype('float32') / 32767  # normalize audio
ValueError: could not convert string to float: ''
What do you mean by the "latest" SpectrogramParser?
It seems that no data is sent from the browser, so you get an empty string, which causes the error above.
By "latest" SpectrogramParser I mean the latest code in SeanNaren/deepspeech.pytorch. Yes, as you mentioned above, the error happens when I stop the recording. If this is expected behaviour (server sending empty strings when recording is stopped or paused), do I need to put an exception handler in place?
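One possible way to handle the empty-string case is to check the payload before converting it, rather than catching the ValueError afterwards. The sketch below assumes a comma-separated text format and uses a hypothetical helper name, `safe_decode`; it is not the project's actual `load_audio_from_txt`.

```python
from typing import Optional

import numpy as np


def safe_decode(payload: str) -> Optional[np.ndarray]:
    """Return normalized samples, or None when the payload holds no samples."""
    # Drop empty tokens, which appear for "" and for stray commas.
    tokens = [t.strip() for t in payload.split(",") if t.strip()]
    if not tokens:
        return None  # nothing was recorded, e.g. recording stopped or paused
    return np.array(tokens).astype("float32") / 32767


for chunk in ["0,32767", "", "  "]:
    samples = safe_decode(chunk)
    if samples is None:
        continue  # skip this chunk instead of crashing the server
    print(samples)
```

Returning None keeps the decision with the caller, which can skip the chunk, log it, or close the stream.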
The code works with the latest pretrained models, but it errors out when you run it on a custom-trained model.
INFO:root:Setting up server...
Loading the LM will be faster if you build a binary file.
Reading /home/ubuntu/ds/ds.pytorch/deepspeech.pytorch/models/3-gram.pruned.3e-7.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Traceback (most recent call last):
  File "decoder_server.py", line 91, in <module>
    main()
  File "decoder_server.py", line 78, in main
    spect_parser = OnlineSpectrogramParser(model.audio_conf, normalize=True)
  File "/home/ubunutu/ds/live.ds.pytorch.v2/deepspeech.pytorch/data/extended_data_loader.py", line 185, in __init__
    self.window_stride = audio_conf['window_stride']
TypeError: 'SpectConfig' object is not subscriptable
The issue seems to be that they have moved the audio config from a dict to the SpectConfig class, and for backward compatibility they have added the following in model.py
However, extended_data_loader.py still uses audio_conf as a dictionary, which is why the lookup fails.
So I was wondering whether I should patch it or not.
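If patching turns out to be the way to go, one low-risk option is a small accessor that reads a setting from either a dict or an attribute-style config object. This is only a sketch: `conf_get` is a hypothetical helper, and `FakeSpectConfig` is a stand-in whose fields are assumed, since the real SpectConfig definition is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeSpectConfig:
    # Hypothetical stand-in; the real SpectConfig may have different fields.
    window_stride: float = 0.01
    window_size: float = 0.02


def conf_get(audio_conf: Any, key: str) -> Any:
    """Read a setting from either a dict or an attribute-style config."""
    if isinstance(audio_conf, dict):
        return audio_conf[key]
    return getattr(audio_conf, key)


print(conf_get({"window_stride": 0.01}, "window_stride"))
print(conf_get(FakeSpectConfig(), "window_stride"))
```

With a helper like this, a line such as `audio_conf['window_stride']` could become `conf_get(audio_conf, 'window_stride')` and would work with both old and new checkpoints.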
What is the purpose of OnlineSpectrogramParser? Why can't I simply use SpectrogramParser? Was OnlineSpectrogramParser used for speed (reducing latency)? If so, can you explain what changes you made so that I can patch it?