ASR Inference from Tensor instead of file[Question]

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html

Apache License 2.0


ASR Inference from Tensor instead of file[Question] #1486

Closed rbracco closed 3 years ago

rbracco commented 3 years ago

I would like to run inference on a tensor of audio samples instead of a file. I am inferring on audio passed in from a microphone, and saving and reloading it adds extra latency. The only inference method I've been able to find for QuartzNet is transcribe, which takes a list of files. Is there an easy way to infer on tensors? Or is my best bet to look at nemo.collections.asr.models.ctc_models.transcribe and replicate it for tensors instead of files?

titu1994 commented 3 years ago

All NeMo models are PyTorch modules, so it should be easy enough to look at their validation/test step code and replicate it for tensors.

The easiest way might be to simply copy-paste the validation step code and modify it to use individual tensors instead of a batch from a data loader. Make sure to use no_grad()!
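A minimal sketch of what that adaptation could look like, not the library's own method. It assumes the model's forward pass accepts input_signal and input_signal_length keyword arguments and returns a tuple whose first two items are per-frame log-probabilities and encoded lengths, which is how the CTC validation step appears to call it; check this against your NeMo version. The function names transcribe_tensor and ctc_greedy_collapse, and the vocabulary, blank_id, and device parameters, are hypothetical and introduced here only for illustration.

```python
from typing import Sequence


def ctc_greedy_collapse(ids: Sequence[int], vocabulary: Sequence[str], blank_id: int) -> str:
    """Greedy CTC decoding: merge repeated ids, drop blanks, map ids to characters."""
    chars = []
    previous = blank_id
    for token in ids:
        # Emit a character only when the id changes and is not the blank symbol.
        if token != previous and token != blank_id:
            chars.append(vocabulary[token])
        previous = token
    return "".join(chars)


def transcribe_tensor(model, audio, vocabulary, blank_id, device="cpu"):
    """Run one utterance of raw audio samples through a CTC model (hypothetical helper).

    Assumption: model(input_signal=..., input_signal_length=...) returns
    (log_probs, encoded_len, ...), with log_probs shaped (batch, time, classes).
    """
    # Imported here so the pure-Python decoder above stays usable without PyTorch.
    import torch

    # Shape the samples as a batch of one: (batch=1, time).
    signal = torch.as_tensor(audio, dtype=torch.float32).reshape(1, -1).to(device)
    length = torch.tensor([signal.shape[1]], dtype=torch.long, device=device)

    model.eval()  # disable dropout and similar training-only behaviour
    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(input_signal=signal, input_signal_length=length)
    log_probs, encoded_len = outputs[0], outputs[1]

    # Keep only the valid frames, then take the most likely class per frame.
    ids = log_probs[0, : int(encoded_len[0])].argmax(dim=-1).tolist()
    return ctc_greedy_collapse(ids, vocabulary, blank_id)
```

For a character-based CTC model, the vocabulary is likely available as model.decoder.vocabulary with the blank id equal to len(vocabulary), but that is also an assumption to verify. The audio should be mono samples at the sample rate the model was trained on, so microphone input may need resampling before it is passed in.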

rbracco commented 3 years ago

Thank you, I can handle that! Closing the issue but I'll share the code if I implement it.