flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Simple Streaming ASR Example for Sota/2019 Model #841

Open tranmanhdat opened 4 years ago

tranmanhdat commented 4 years ago

Feature Description

A detailed example of running streaming ASR with the sota/2019 model on a custom-trained model.

Use Case

The current Streaming ASR Example requires acoustic_model.bin, feature_extractor.bin, and tokens.txt, which come from the Streaming TDS model conversion. But that tool does not work on models that come from sota/2019 training.
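For reference, the existing streaming_convnets workflow roughly looks like the sketch below: convert the trained acoustic model with the Streaming TDS conversion tool, then point the streaming example at the artifacts it produces. The binary name, flag names, and paths here are assumptions for illustration rather than a verified command line, and the conversion step is exactly what fails for sota/2019 models.

```python
import pathlib
import subprocess

am_path = "path/to/trained_am.bin"                 # AM binary produced by training (placeholder)
out_dir = pathlib.Path("serialized_streaming_model")
out_dir.mkdir(exist_ok=True)

# Run the Streaming TDS conversion tool shipped with wav2letter's inference tools.
# The binary name and flag spellings are assumptions -- check the tool's --help.
subprocess.run(
    [
        "./streaming_tds_model_converter",
        "--am", am_path,
        "--outdir", str(out_dir),
    ],
    check=True,
)

# The simple streaming ASR example expects these three artifacts to exist.
for artifact in ("acoustic_model.bin", "feature_extractor.bin", "tokens.txt"):
    assert (out_dir / artifact).is_file(), f"conversion did not produce {artifact}"
```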

Additional Context

The error when trying to use the Streaming TDS model conversion is shown in the picture below: Screenshot from 2020-09-28 23-00-03. Acoustic model used: AM

danielkope commented 4 years ago

I think it would be great to have an example of how to leverage the SOTA models on arbitrary audio. A non-streaming example would already help in understanding how to use the immense functionality of this framework.
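A rough sketch of what such a non-streaming run could look like: build a small list file for your own audio and pass it to the recipe's decoding binary. The list-file column order, duration unit, binary name, flag names, and paths below are assumptions for illustration, not a verified invocation.

```python
import pathlib
import subprocess

# Prepare a one-line list file for a custom utterance.
# Column order (<id> <audio path> <duration> <transcript>) and the duration unit
# are assumptions about the wav2letter list format -- verify against the data-prep docs.
audio = pathlib.Path("my_audio.wav").resolve()     # 16 kHz mono WAV assumed
list_file = pathlib.Path("custom.lst")
list_file.write_text(f"utt0 {audio} 4000 placeholder transcript\n")

# Invoke the recipe's decoding binary offline (non-streaming). Paths are placeholders
# and the flag names are assumptions -- they should match the sota/2019 decode configs.
subprocess.run(
    [
        "./Decoder",
        "--am=path/to/sota_am.bin",
        "--tokens=path/to/tokens.txt",
        "--lexicon=path/to/lexicon.txt",
        "--lm=path/to/lm.bin",
        f"--test={list_file.resolve()}",
        "--datadir=",
    ],
    check=True,
)
```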

Charmelink commented 4 years ago

I am also trying to do this, with no success. The same goes for the lexfree model. If anyone knows of examples using that model, I would appreciate a pointer. Thanks

tlikhomanenko commented 4 years ago

@danielkope, @Charmelink could you describe your use cases in more detail, and what is not working / which guide you need? Happy to help and explain.

abhinavkulkarni commented 3 years ago

Hey @tlikhomanenko, I also asked the same question here, where I was trying to convert the 2019 SOTA TDS+CTC models into the FBGEMM streaming convnets format. You may want to take a look (and kindly reply if possible).

It looks like these SOTA models trained on LibriSpeech and LibriSpeech+LibriVox have more parameters than the TDS+CTC model from the streaming_convnets recipe, so it would be nice to get their FBGEMM streaming counterparts.

Thanks!

tlikhomanenko commented 3 years ago

Ok, I guess @vineelpratap knows more details on this.

But yep, for streaming we used a smaller model due to restrictions on performance for online inference.