NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

How to use NeMo ASR Infer mode to transcribe a local audio file #1057

Closed viveksj closed 3 years ago

viveksj commented 4 years ago

I looked through all the documentation and found examples for training and testing, and for combining ASR with other post-ASR processors.

1_ASR_tutorial_using_NeMo.ipynb has a block on an inference mode, but I'm having a bit of difficulty understanding it.

What I want to do is export the model, import it in a script, pass it an audio file (and maybe a real-time audio stream in the future), and have it spit out transcriptions and whatever other metadata it has.

I'd really appreciate if anyone can point to examples/resources that can help me with it.

soheiltehranipour commented 4 years ago

My issue too.

viveksj commented 4 years ago

I know there are train_manifest and test_manifest options in https://github.com/NVIDIA/NeMo/blob/master/examples/asr/notebooks/1_ASR_tutorial_using_NeMo.ipynb. Each expects a JSON structure with a text field in it. If I remove the text and plug in the same logic, it throws an error:

ValueError: Manifest file ./an4/test.json has invalid json line structure: {"audio_filepath": "./an4/wav/test/5023415387092_rms_trim.wav", "duration": 27.0} without proper text key.

I also saw this earlier: https://github.com/NVIDIA/NeMo/blob/v0.11.0/examples/asr/speech2text_infer.py

And according to that, there is an AudioToTextDataLayer that should allow us to do what we need, but I couldn't figure out how to instantiate or use it.
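One way to sidestep that manifest error is to include a placeholder "text" field in each manifest line; a minimal sketch (the filename and the assumption that an empty transcript is accepted at inference time are mine, so verify against the data layer you use):

```python
import json

# Hypothetical inference manifest: the same fields the tutorial uses, plus a
# placeholder "text" value so the manifest parser finds the key it expects.
entries = [
    {"audio_filepath": "./an4/wav/test/5023415387092_rms_trim.wav",
     "duration": 27.0,
     "text": ""},
]

with open("test_infer.json", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")  # one JSON object per line
```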

darraghdog commented 4 years ago

This works for me... on the main branch.

import nemo.collections.asr as nemo_asr
wave_file = ['/Users/dhanley/Downloads/male.wav']
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
quartznet.transcribe(paths2audio_files=wave_file)

If you wish to run inference on streamed audio, you need to take the code from the EncDecCTCModel class into your script and make some local changes so that it takes a torch tensor as input instead of loading from a file. You can use SoundFile to load the stream into a numpy array, convert that to a torch tensor, and plug it in directly instead of loading the file.

Maybe the team has a better approach for that.
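A rough sketch of that conversion step, using only numpy (the function name and the assumption that the stream delivers raw 16-bit PCM are mine; swap in whatever your audio source actually produces):

```python
import numpy as np

def pcm16_chunk_to_float32(chunk_bytes):
    """Convert a chunk of raw 16-bit PCM audio (e.g. from a microphone
    stream) into the float32 waveform a CTC model expects."""
    samples = np.frombuffer(chunk_bytes, dtype=np.int16)
    # Normalize to [-1.0, 1.0]; torch.from_numpy(...) on this array would
    # then give the tensor to feed into the model instead of a file path.
    return samples.astype(np.float32) / 32768.0
```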

okuchaiev commented 4 years ago

Yes, code snippet by @darraghdog is a good way to transcribe audio files in batch mode from disk.

For a streaming example (e.g. from a microphone), please refer to this notebook. Note that it can't be run on Colab because it streams audio from the microphone. Also, it is currently under review in PR 1062 and will soon be merged to the main branch.

viveksj commented 4 years ago

This works for me... on the main branch.

import nemo.collections.asr as nemo_asr
wave_file = ['/Users/dhanley/Downloads/male.wav']
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
quartznet.transcribe(paths2audio_files=wave_file)

If you wish to run inference on streamed audio, you need to take the code from the EncDecCTCModel class into your script and make some local changes so that it takes a torch tensor as input instead of loading from a file. You can use SoundFile to load the stream into a numpy array, convert that to a torch tensor, and plug it in directly instead of loading the file.

Maybe the team has a better approach for that.

Thank you. I'm giving this a shot. Will post an update soon.

viveksj commented 4 years ago

On trying to run quartznet = n.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En"), I got

AttributeError: module 'nemo.collections.asr.models' has no attribute 'EncDecCTCModel'

I am now trying to figure out where I can find EncDecCTCModel and then add it to the right location.

I finished all the steps for https://github.com/NVIDIA/NeMo/blob/master/examples/asr/notebooks/1_ASR_tutorial_using_NeMo.ipynb with custom manifest files (.json manifests and .transcription files containing audio file paths, transcription data, and durations in the same format, for similar files).

From another issue, I realize that there has been an update and these new files could be part of it. I have cloned the new version of NeMo and am configuring it now; I will then try to see how to use the model currently being trained with the new version.

okuchaiev commented 4 years ago

@viveksj looks like you are using "master" branch, please switch to the "main" branch, master is an old version.

viveksj commented 4 years ago

Oh, yes, I was using Master.

Switching to main now and trying again.

Thanks @okuchaiev.

viveksj commented 4 years ago

Forgive me if some of my questions seem trivial.

I already had a model under training with the tutorial from the master branch. This is where I am with that currently:

[NeMo I 2020-09-10 21:10:30 callbacks:220] Step: 99400
[NeMo I 2020-09-10 21:10:30 helpers:72] Loss: 206.056396484375
[NeMo I 2020-09-10 21:10:30 helpers:73] training_batch_WER: 47.13%
[NeMo I 2020-09-10 21:10:30 helpers:74] Prediction: aout a tw thousand and twelve
[NeMo I 2020-09-10 21:10:30 helpers:75] Reference: about a two thousand and twelve
[NeMo I 2020-09-10 21:10:30 callbacks:235] Step time: 1.7500464916229248 seconds
[NeMo I 2020-09-10 21:11:22 callbacks:220] Step: 99425
[NeMo I 2020-09-10 21:11:22 helpers:72] Loss: 200.1127166748047
[NeMo I 2020-09-10 21:11:22 helpers:73] training_batch_WER: 49.56%
[NeMo I 2020-09-10 21:11:22 helpers:74] Prediction: an dr a i was hoping nix come in on thursday
[NeMo I 2020-09-10 21:11:22 helpers:75] Reference: enjoy at i was hoping to come in on thursday
[NeMo I 2020-09-10 21:11:22 callbacks:235] Step time: 1.730557918548584 seconds
[NeMo I 2020-09-10 21:12:15 callbacks:220] Step: 99450
[NeMo I 2020-09-10 21:12:15 helpers:72] Loss: 182.1306610107422
[NeMo I 2020-09-10 21:12:15 helpers:73] training_batch_WER: 45.37%
[NeMo I 2020-09-10 21:12:15 helpers:74] Prediction: a what just tright number yes yes seven two four five one
[NeMo I 2020-09-10 21:12:15 helpers:75] Reference: so what i just the right number yes yes seven two four five one
[NeMo I 2020-09-10 21:12:15 callbacks:235] Step time: 1.7343809604644775 seconds
[NeMo I 2020-09-10 21:13:07 callbacks:220] Step: 99475
[NeMo I 2020-09-10 21:13:07 helpers:72] Loss: 186.873291015625
[NeMo I 2020-09-10 21:13:07 helpers:73] training_batch_WER: 45.04%
[NeMo I 2020-09-10 21:13:07 helpers:74] Prediction: thank you hihlinda i hi dont know if f need
[NeMo I 2020-09-10 21:13:07 helpers:75] Reference: thank you have you hi linda i i dont know if i need
[NeMo I 2020-09-10 21:13:07 callbacks:235] Step time: 1.7287425994873047 seconds
[NeMo I 2020-09-10 21:14:00 callbacks:220] Step: 99500
[NeMo I 2020-09-10 21:14:00 helpers:72] Loss: 218.74710083007812
[NeMo I 2020-09-10 21:14:00 helpers:73] training_batch_WER: 47.83%
[NeMo I 2020-09-10 21:14:00 helpers:74] Prediction: busiiss information maybe outdateed orring correct if yoou
[NeMo I 2020-09-10 21:14:00 helpers:75] Reference: business information may be outdated or incorrect if you
[NeMo I 2020-09-10 21:14:00 callbacks:235] Step time: 1.8022284507751465 seconds
[NeMo I 2020-09-10 21:14:00 callbacks:440] Doing Evaluation ..............................
[NeMo I 2020-09-10 21:14:45 helpers:185] ==========>>>>>>Evaluation Loss: 183.29974365234375
[NeMo I 2020-09-10 21:14:45 helpers:186] ==========>>>>>>Evaluation WER: 46.42%
[NeMo I 2020-09-10 21:14:45 callbacks:445] Evaluation time: 45.15841555595398 seconds

The sub-directory an4_checkpoints within my working directory seems to be the one where model training progress seems to be stored. The files in checkpoints are:

JasperEncoder-STEP-1000.pt JasperEncoder-STEP-4000.pt JasperEncoder-STEP-5000.pt JasperEncoder-STEP-6000.pt JasperEncoder-STEP-6100.pt JasperEncoder-STEP-17000.pt JasperEncoder-STEP-18000.pt JasperEncoder-STEP-19000.pt JasperEncoder-STEP-96000.pt JasperEncoder-STEP-97000.pt JasperEncoder-STEP-98000.pt JasperEncoder-STEP-99000.pt
JasperDecoderForCTC-STEP-1000.pt JasperDecoderForCTC-STEP-4000.pt JasperDecoderForCTC-STEP-5000.pt JasperDecoderForCTC-STEP-6000.pt JasperDecoderForCTC-STEP-6100.pt JasperDecoderForCTC-STEP-17000.pt JasperDecoderForCTC-STEP-18000.pt JasperDecoderForCTC-STEP-19000.pt JasperDecoderForCTC-STEP-96000.pt JasperDecoderForCTC-STEP-97000.pt JasperDecoderForCTC-STEP-98000.pt JasperDecoderForCTC-STEP-99000.pt
trainer-STEP-1000.pt trainer-STEP-4000.pt trainer-STEP-5000.pt trainer-STEP-6000.pt trainer-STEP-6100.pt trainer-STEP-17000.pt trainer-STEP-18000.pt trainer-STEP-19000.pt trainer-STEP-96000.pt trainer-STEP-97000.pt trainer-STEP-98000.pt trainer-STEP-99000.pt

I know that quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En") will download the pre-trained QuartzNet15x5 model from NVIDIA's NGC cloud and instantiate it.

Is there a quick way to import the model I am training into that code snippet, so that I don't have to start from scratch with the main-branch tutorial, given that my model has been training for a couple of days now: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/01_ASR_with_NeMo.ipynb

nithinraok commented 4 years ago

Related to this issue: an email conversation and the comment at https://github.com/NVIDIA/NeMo/issues/1091#issuecomment-692730768

It looks like you are using a model trained with the NeMo v0.11 (master) branch and are hitting issues, probably because you are:

  1. using NeMo 1.0 beta (main branch) tutorial snippets with master (v0.11) branch code
  2. using checkpoints that were trained on NeMo v0.11

Solution: if you have models trained with v0.11 and want to use the latest main branch, use the script asr_checkpoint_port.py to convert your old checkpoints to a .nemo file,

then refer to the ASR tutorials in the main branch for examples.

sammiyo commented 4 years ago

@viveksj looks like you are using "master" branch, please switch to the "main" branch, master is an old version.

Hello. I have been getting the same error, nemo.collections.asr.models has no attribute 'EncDecCTCModel', while following the tutorial in the main branch: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/01_ASR_with_NeMo.ipynb. Please help if you were able to resolve the issue.

viveksj commented 4 years ago

@viveksj looks like you are using "master" branch, please switch to the "main" branch, master is an old version.

Hello. I have been getting the same error, nemo.collections.asr.models has no attribute 'EncDecCTCModel', while following the tutorial in the main branch: https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/01_ASR_with_NeMo.ipynb. Please help if you were able to resolve the issue.

I created a new environment (optional), cloned the latest version of NeMo (which does have EncDecCTCModel), and matched the Torch version with CUDA, and it worked. I had an issue where, I think, the system was looking for EncDecCTCModel in the pip-installed version of nemo rather than in the cloned NeMo folder, which is why creating a new virtual environment and installing the needed dependencies worked for me. (They also have a Docker container, which I was able to pull, but I could never figure out how to use it.)

viveksj commented 4 years ago

Related to this issue, email conversation and comment #1091 (comment)

It looks like you are using a model trained with the NeMo v0.11 (master) branch and are hitting issues, probably because you are:

  1. using NeMo 1.0 beta (main branch) tutorial snippets with master (v0.11) branch code
  2. using checkpoints that were trained on NeMo v0.11

Solution: if you have models trained with v0.11 and want to use the latest main branch, use the script asr_checkpoint_port.py to convert your old checkpoints to a .nemo file,

then refer to the ASR tutorials in the main branch for examples.

When I run python scripts/asr_checkpoint_port.py --config_path='/home/ubuntu/nemo_z/configs/jasper_an4.yaml' --encoder_ckpt='/home/ubuntu/nemo_z/an4_checkpoints/JasperEncoder-STEP-327000.pt' --decoder_ckpt='/home/ubuntu/nemo_z/an4_checkpoints/JasperDecoderForCTC-STEP-327000.pt' --output_path='/home/ubuntu/nemo_new/NeMo/model/'

I get the following error:

[NeMo I 2020-09-16 15:10:11 asr_checkpoint_port:53] Creating ASR NeMo 1.0 model
Traceback (most recent call last):
  File "scripts/asr_checkpoint_port.py", line 72, in <module>
    main(args.config_path, args.encoder_ckpt, args.decoder_ckpt, args.output_path, args.model_type)
  File "scripts/asr_checkpoint_port.py", line 54, in main
    model = nemo_asr.models.EncDecCTCModel(cfg=DictConfig(params['model']))
  File "/home/ubuntu/nemo_new/venv/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 81, in __init__
    self._set_value(content)
  File "/home/ubuntu/nemo_new/venv/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 541, in _set_value
    raise ValidationError(msg=msg)  # pragma: no cover
omegaconf.errors.ValidationError

nithinraok commented 4 years ago

You do not have a model key in your YAML file. You should use an OmegaConf-style config; look at https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/config.yaml. If it doesn't match the Jasper encoder/decoder structure, you need to change it accordingly by comparing it with your jasper_an4.yaml file.
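For reference, the main-branch configs nest everything under a top-level model key, roughly like this (section names follow the main-branch config.yaml; the actual parameter values must come from your jasper_an4.yaml, not from this sketch):

```yaml
# Minimal shape asr_checkpoint_port.py expects: a top-level "model" key.
model:
  sample_rate: 16000
  train_ds:
    manifest_filepath: ???   # OmegaConf mandatory-value marker
  encoder:
    # JasperEncoder parameters from jasper_an4.yaml go here
  decoder:
    # JasperDecoderForCTC parameters from jasper_an4.yaml go here
```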

viveksj commented 4 years ago

My yaml file is the same one as the initial example in 01_ASR_with_NeMo.ipynb: https://github.com/NVIDIA/NeMo/blob/master/examples/asr/configs/jasper_an4.yaml

It does seem a bit different than the QuartzNet15x5 yaml file.