flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Inference with cuda #969

Open hieuhv94 opened 3 years ago

hieuhv94 commented 3 years ago

I ran the streaming inference code in a CUDA environment (flashlight and wav2letter built with CUDA), but the results were the same between CPU and CUDA. So my question is: how do I run inference with CUDA? Thanks!

tlikhomanenko commented 3 years ago

cc @vineelpratap

hieuhv94 commented 3 years ago

Any suggestions for this?

vineelpratap commented 3 years ago

Hey @hieuhv94, sorry, I didn't get the question.

For running inference with CUDA, you can use the fl_asr_decode binary, which does beam-search decoding with an LM.

If you meant the streaming inference from w2l@anywhere, we currently support only the CPU version.
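
For reference, a typical invocation looks roughly like the sketch below. The paths and flag values are placeholders, and exact flag names can differ between versions, so check `fl_asr_decode --help` for your build:

```sh
# Illustrative only: beam-search decoding with a KenLM language model.
# All paths are placeholders; verify flag names with `fl_asr_decode --help`.
fl_asr_decode \
  --am=/path/to/acoustic_model.bin \
  --tokens=/path/to/tokens.txt \
  --lexicon=/path/to/lexicon.txt \
  --lm=/path/to/lm.bin \
  --lmtype=kenlm \
  --decodertype=wrd \
  --uselexicon=true \
  --datadir=/path/to/data \
  --test=test.lst \
  --beamsize=100 \
  --lmweight=2.0 \
  --wordscore=0.0
```

With a CUDA build of flashlight, the acoustic-model forward pass (computing the emissions) runs on the GPU; the beam search itself still runs on the CPU.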

hieuhv94 commented 3 years ago

Thanks for your comment, @vineelpratap. Do you have any ideas for streaming inference with CUDA? And would the performance of streaming inference with CUDA be better than on the CPU? Because in streaming we have to copy many chunks to VRAM.