Open hieuhv94 opened 3 years ago
cc @vineelpratap
Any suggestions on this?
Hey @hieuhv94, sorry, I didn't get the question.
For running inference with CUDA, you can use the `fl_asr_decode` binary, which will do beam-search decoding with an LM.
If you meant streaming inference from w2l@anywhere, we currently support only a CPU version.
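Roughly, a decode run looks like the sketch below. Treat the flag names as placeholders from memory of the flashlight ASR app (they can differ between versions, so check `fl_asr_decode --help` or the example decode configs), and the paths are of course hypothetical:

```
fl_asr_decode \
  --am=/path/to/acoustic_model.bin \
  --tokens=/path/to/tokens.txt \
  --lexicon=/path/to/lexicon.txt \
  --lm=/path/to/lm.bin \
  --datadir=/path/to/data \
  --test=test.lst \
  --lmweight=2.0 \
  --wordscore=1.0 \
  --beamsize=100 \
  --nthread_decoder=4
```

Since the binary uses gflags, you can also put these options in a config file and pass `--flagsfile=decode.cfg` instead of listing them on the command line.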
Thanks for your comment @vineelpratap. Do you have any ideas for streaming inference with CUDA? And is the performance of streaming inference with CUDA actually better than on CPU? I ask because we have to copy multiple chunks to VRAM during streaming.
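To make the concern concrete, here is a minimal CUDA sketch (not actual w2l@anywhere code, just an illustration with made-up chunk sizes) of the per-chunk host-to-device copy that GPU streaming would need:

```cpp
// Sketch only: each streamed chunk needs its own host->device copy
// before the acoustic model forward pass can run on the GPU.
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
  const int kChunkSamples = 16000 / 4;          // e.g. 250 ms of 16 kHz audio
  std::vector<float> hostChunk(kChunkSamples);  // filled by the audio capture loop

  float* deviceChunk = nullptr;
  cudaMalloc((void**)&deviceChunk, kChunkSamples * sizeof(float));

  // Streaming loop: one synchronous copy per chunk, then the forward pass.
  for (int step = 0; step < 100; ++step) {
    cudaMemcpy(deviceChunk, hostChunk.data(),
               kChunkSamples * sizeof(float), cudaMemcpyHostToDevice);
    // ... run the acoustic model forward pass on deviceChunk here ...
  }

  cudaFree(deviceChunk);
  std::printf("done\n");
  return 0;
}
```

As I understand it, the copies themselves can be hidden with pinned host memory and `cudaMemcpyAsync` on a stream; the bigger issue is probably that such small per-chunk batches keep GPU utilization low.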
I ran the streaming inference code in a CUDA environment (flashlight and wav2letter built with CUDA), but the results were the same between CPU and CUDA. So my question is: how do I run inference with CUDA? Thanks!