NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

speaker_reco_infer.py - pytorch version issue? #2842

Closed briebe closed 2 years ago

briebe commented 3 years ago

Describe the bug

speaker_reco_infer.py loads the model and the manifest files and then breaks. I guess it's a PyTorch issue again? I wanted to use the model from yesterday:

[NeMo W 2021-09-17 12:18:42 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+ Please update PyTorch to remain compatible with later versions of NeMo.

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
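The error message itself narrows things down: the input shape [1, 1, 2] means the signal reaching the STFT is only 2 samples long, while reflect padding of 256 samples per side needs an input longer than the padding. A minimal sketch of that length check follows, assuming n_fft = 512 (inferred from the 256-sample padding, not confirmed in this thread) and a 16 kHz sample rate; the helper names are hypothetical and not part of NeMo or PyTorch.

```python
def reflect_padding_fits(num_samples: int, n_fft: int = 512) -> bool:
    """Return True if a signal is long enough for centered-STFT reflect padding.

    Reflect padding adds n_fft // 2 samples on each side and requires the
    padding to be strictly smaller than the signal length.
    n_fft = 512 is an assumption inferred from the (256, 256) padding in the error.
    """
    pad = n_fft // 2
    return pad < num_samples


def samples_for(duration_s: float, sample_rate: int = 16000) -> int:
    """Approximate sample count for a clip; 16 kHz is an assumed sample rate."""
    return int(duration_s * sample_rate)


# The failing tensor had only 2 samples in its last dimension.
print(reflect_padding_fits(2))
# A clip with the duration given in test.json would be long enough.
print(reflect_padding_fits(samples_for(11.370666666666667)))
```

If the second check passes for the actual audio file, the 2-sample input likely comes from how the clip was loaded or sliced rather than from the file itself.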

Steps/Code to reproduce bug

Container: nvcr.io/nvidia/nemo:1.2.0

but installed:
python -m pip install git+https://github.com/NVIDIA/NeMo.git@'main'
python -m pip install pytorch_lightning==1.4.2

(as used and working in the Google Colab; I also tried NeMo 1.2 and pytorch-lightning 1.3.8, and later on NeMo 1.3 and the recent 1.4.7)

run: https://github.com/NVIDIA/NeMo/blob/48fe9e69feba7651694fd6ae0a096a0655ed601c/examples/speaker_tasks/recognition/speaker_reco_infer.py

with: model, train.json from:

https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb

added: test.json and bonian.wav

{"audio_filepath": "bonian.wav", "offset": 0, "duration": 11.370666666666667, "label": ""}

Expected behavior

working :-)

Environment overview (please complete the following information)

Environment details

If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:

Additional context

https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/results.html

A bit more explanation of the inference part would be nice, such as a link to the script I was using here, and also how to use the embedding that is created at the end of the Jupyter notebook.

nithinraok commented 3 years ago

Can you add some text to the label? For example, "label": "UNK".
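A minimal sketch of that suggestion, assuming the manifest holds one JSON object per line as in the test.json entry quoted in the report. The helper name `manifest_line` is hypothetical, and whether a non-empty label resolves the error is not confirmed in this thread.

```python
import json


def manifest_line(audio_filepath: str, duration: float,
                  label: str = "UNK", offset: float = 0) -> str:
    """Build one manifest entry as a JSON string with a non-empty placeholder label."""
    if not label:
        # Guard against the empty label used in the original test.json.
        raise ValueError("label must be a non-empty string, e.g. 'UNK'")
    entry = {
        "audio_filepath": audio_filepath,
        "offset": offset,
        "duration": duration,
        "label": label,
    }
    return json.dumps(entry)


# Same file and duration as in the report, with the label filled in.
line = manifest_line("bonian.wav", 11.370666666666667)
print(line)
```

Writing the returned string to test.json, followed by a newline, would replace the entry that had an empty label.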

nithinraok commented 2 years ago

Closing due to inactivity. Feel free to reopen if you experience the issue again.