farisalasmary / deepspeech2-online-decoder

Online (real-time) decoder to be used with DeepSpeech2 model
MIT License

Incompatibility with latest custom-trained models. #5

Open naveenss1995 opened 4 years ago

naveenss1995 commented 4 years ago

The code works with the latest pretrained models, but it errors out when run with a custom-trained model.

    INFO:root:Setting up server...
    Loading the LM will be faster if you build a binary file.
    Reading /home/ubuntu/ds/ds.pytorch/deepspeech.pytorch/models/3-gram.pruned.3e-7.arpa
    ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

    Traceback (most recent call last):
      File "decoder_server.py", line 91, in <module>
        main()
      File "decoder_server.py", line 78, in main
        spect_parser = OnlineSpectrogramParser(model.audio_conf, normalize=True)
      File "/home/ubunutu/ds/live.ds.pytorch.v2/deepspeech.pytorch/data/extended_data_loader.py", line 185, in __init__
        self.window_stride = audio_conf['window_stride']
    TypeError: 'SpectConfig' object is not subscriptable

The issue seems to be that upstream moved the audio config from a dict to a SpectConfig class, and for backward compatibility added the following in model.py:

    if OmegaConf.get_type(package['audio_conf']) == dict:
        audio_conf = package['audio_conf']
        package['audio_conf'] = SpectConfig(sample_rate=audio_conf['sample_rate'],
                                            window_size=audio_conf['window_size'],
                                            window=SpectrogramWindow(audio_conf['window']))

However, extended_data_loader.py still uses audio_conf as a dictionary.

So I was wondering whether I should patch it.

What is the purpose of OnlineSpectrogramParser? Why can't I simply use SpectrogramParser? Was OnlineSpectrogramParser introduced for speed (reducing latency)? If so, can you explain what changes you made so that I can patch it?
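
The dict-versus-SpectConfig mismatch described above could be bridged with a small accessor that reads a field from either kind of object. This is only a minimal sketch, not code from either repository: `conf_get` is a hypothetical helper, and `SpectConfigStandIn` is a stand-in for the real SpectConfig class, with field names taken from the snippet and traceback above and default values that are assumptions.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class SpectConfigStandIn:
    """Stand-in for the upstream SpectConfig class (assumed defaults)."""
    sample_rate: int = 16000
    window_size: float = 0.02
    window_stride: float = 0.01


def conf_get(audio_conf, key):
    """Read `key` from a dict-style or attribute-style audio config.

    Hypothetical helper: old checkpoints store audio_conf as a dict,
    newer ones as an object with attributes.
    """
    if isinstance(audio_conf, Mapping):
        return audio_conf[key]        # old format: audio_conf['window_stride']
    return getattr(audio_conf, key)   # new format: audio_conf.window_stride


# Both config styles yield the same value through the accessor.
old_style = {"sample_rate": 16000, "window_size": 0.02, "window_stride": 0.01}
new_style = SpectConfigStandIn()
print(conf_get(old_style, "window_stride"), conf_get(new_style, "window_stride"))
```

With such a helper, a line like `self.window_stride = audio_conf['window_stride']` would become `self.window_stride = conf_get(audio_conf, 'window_stride')`, which avoids the "not subscriptable" TypeError without dropping support for older checkpoints.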

farisalasmary commented 4 years ago

This project is built on top of the old version, which does not support Hydra. Regarding your question about the "OnlineSpectrogramParser": it is exactly the same as "SpectrogramParser" with one small change. The change was made because I needed to decode the audio that is sent from the browser as text. You can read the "parse_audio" method in both "OnlineSpectrogramParser" and "SpectrogramParser" and you will see the difference.

This code was quick-and-dirty, since I wanted to prove that online decoding is possible with DeepSpeech2. The code needs a lot of refactoring, and I may do it soon.

For the time being, you can use this fork of the original deepspeech2 implementation: https://github.com/farisalasmary/deepspeech.pytorch

farisalasmary commented 4 years ago

@naveenss1995 I think this model will work with the deepspeech2 version mentioned above: https://github.com/SeanNaren/deepspeech.pytorch/releases/download/v2.0/librispeech_pretrained_v2.pth

naveenss1995 commented 4 years ago

Yes, the above comment helps, and the model mentioned above also works. The issue arises when you do transfer learning on top of that model.

farisalasmary commented 4 years ago

Can you share the error message?

naveenss1995 commented 4 years ago

It's resolved: I modified the audio parsing method in the latest SpectrogramParser to use load_audio_from_txt, and the issue was resolved.

naveenss1995 commented 4 years ago

The fix worked 99% of the time, but unfortunately it sometimes fails with:

      File "/home/ubuntu/ds/live.ds.pytorch.v2/deepspeech.pytorch/deepspeech_pytorch/loader/data_loader.py", line 19, in load_audio_from_txt
        sound = sound.astype('float32') / 32767 # normalize audio
    ValueError: could not convert string to float: ''

farisalasmary commented 4 years ago

> It's resolved: I modified the audio parsing method in the latest SpectrogramParser to use load_audio_from_txt, and the issue was resolved.

What do you mean by the "latest" SpectrogramParser?

farisalasmary commented 4 years ago

> The fix worked 99% of the time, but unfortunately it sometimes fails with:
>
>       File "/home/ubuntu/ds/live.ds.pytorch.v2/deepspeech.pytorch/deepspeech_pytorch/loader/data_loader.py", line 19, in load_audio_from_txt
>         sound = sound.astype('float32') / 32767 # normalize audio
>     ValueError: could not convert string to float: ''

It seems that there is no data sent from the browser and hence you get an empty string which causes the error above.

naveenss1995 commented 3 years ago

By the latest SpectrogramParser I mean the latest code in SeanNaren/deepspeech.pytorch. Yes, as you mentioned above, the error happens when I stop the recording. If this is expected behaviour (the server sending empty strings when recording is stopped or paused), do I need to add exception handling?
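
One way to handle the empty-payload case raised above is to skip decoding when the text carries no samples, rather than catching the ValueError afterwards. The sketch below is only an illustration under stated assumptions, not code from either repository: `has_samples` and `parse_if_present` are hypothetical helpers, the comma separator is a guess at the text format sent by the browser, and `loader` stands for whatever function does the real parsing (for example load_audio_from_txt).

```python
def has_samples(payload, sep=","):
    """Return True if the text payload contains at least one non-blank field.

    The separator is an assumption about how the browser encodes samples.
    """
    if payload is None:
        return False
    return any(field.strip() for field in payload.split(sep))


def parse_if_present(payload, loader, sep=","):
    """Call `loader(payload)` only when there is something to parse.

    Returns None for empty payloads (e.g. recording stopped or paused),
    so the caller can skip the decoding step for that message.
    """
    if not has_samples(payload, sep):
        return None
    return loader(payload)


# Usage with a placeholder loader; the real one would build the audio array.
def demo_loader(text):
    return [float(x) for x in text.split(",")]


print(parse_if_present("", demo_loader))          # empty payload is skipped
print(parse_if_present("12,-7,30", demo_loader))  # samples are parsed
```

Checking before parsing keeps the normal path free of try/except and makes the "nothing was sent" case explicit; wrapping the loader call in `try: ... except ValueError:` would be an alternative if malformed payloads (not just empty ones) also need to be tolerated.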