NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

torch.stft() signature has been updated for PyTorch 1.7+ Please update PyTorch to remain compatible with later versions of NeMo. #2780

Closed briebe closed 3 years ago

briebe commented 3 years ago

Describe the bug

[NeMo W 2021-09-06 11:58:47 patch_utils:50] torch.stft() signature has been updated for PyTorch 1.7+ Please update PyTorch to remain compatible with later versions of NeMo.

followed by

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
   4157         assert len(pad) == 2, "3D tensors expect 2 values for padding"
   4158         if mode == "reflect":
-> 4159             return torch._C._nn.reflection_pad1d(input, pad)
   4160         elif mode == "replicate":
   4161             return torch._C._nn.replication_pad1d(input, pad)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 2, 2]
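
For context, PyTorch 1.7 changed the torch.stft() signature (notably the return_complex argument), which is what the NeMo warning above refers to. A minimal sketch of the newer call on a dummy waveform (all parameter values here are illustrative, not taken from the notebook):

import torch

# Dummy mono waveform; a real signal should be much longer than n_fft,
# otherwise the reflection padding inside torch.stft has nothing to reflect.
signal = torch.randn(1, 16000)
window = torch.hann_window(400)
spec = torch.stft(signal, n_fft=512, hop_length=160, win_length=400,
                  window=window, return_complex=True)
print(spec.shape)  # torch.Size([1, 257, 101]), complex-valued

The reflection-pad RuntimeError itself says the input at dimension 2 only has 2 samples, which suggests the waveform reaching the STFT is essentially empty, i.e. an audio-loading problem rather than a torch.stft problem.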

Also in this notebook, in addition to the "AN4 source not available" problem:

Original Cell: restored_model.setup_finetune_model(config.model)

TypeError Traceback (most recent call last)

in ()
----> 1 restored_model.setup_finetune_model(config.model)

If I change the cell to: restored_model.setup_finetune_model(model_config=config.model)

TypeError: setup_finetune_model() missing 1 required positional argument: 'model_config'

NameError Traceback (most recent call last)
in ()
----> 1 restored_model.setup_finetune_model(self, model_config=config.model)

NameError: name 'self' is not defined

Same with this cell: restored_model.set_trainer(trainer_finetune)

TypeError Traceback (most recent call last)
in ()
----> 1 restored_model.set_trainer(trainer_finetune)
      2 log_dir_finetune = exp_manager(trainer_finetune, config.get("exp_manager", None))
      3 print(log_dir_finetune)

TypeError: set_trainer() missing 1 required positional argument: 'trainer'

Steps/Code to reproduce bug

Cell: trainer.fit(speaker_model)
in https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/speaker_recognition/Speaker_Recognition_Verification.ipynb

Expected behavior

(As expected by the people who made this notebook: Colab training should work without bugfixing :-)) Torch 1.9 is installed, and no updates seem possible...

Environment overview (please complete the following information)

torch @ https://download.pytorch.org/whl/cu102/torch-1.9.0%2Bcu102-cp37-cp37m-linux_x86_64.whl
torch-stft==0.1.4
torchaudio==0.9.0
torchmetrics==0.5.1
torchsummary==1.5.1
torchtext==0.10.0
torchvision @ https://download.pytorch.org/whl/cu102/torchvision-0.10.0%2Bcu102-cp37-cp37m-linux_x86_64.whl

Environment details

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
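
For readers who hit the same TypeErrors: "missing 1 required positional argument" on these calls is what Python raises when an instance method is invoked on the class object itself, so it is worth checking that restored_model really is a restored model instance (for example, that the earlier restore cell ran successfully) rather than the EncDecSpeakerLabelModel class. A minimal sketch, assuming config and trainer_finetune are defined as in the notebook and using a placeholder .nemo path:

import nemo.collections.asr as nemo_asr

# restore_from() loads a saved .nemo checkpoint and returns a model instance;
# "speaker_model.nemo" is a placeholder file name, not from the notebook.
restored_model = nemo_asr.models.EncDecSpeakerLabelModel.restore_from("speaker_model.nemo")

# On an instance, these calls take only the explicit arguments:
restored_model.setup_finetune_model(config.model)
restored_model.set_trainer(trainer_finetune)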
nithinraok commented 3 years ago

I can't seem to reproduce the issue; it's working fine on Colab. Could you rerun? Updated link: https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb

briebe commented 3 years ago

So you can be sure I didn't miss anything, I used "Run all" (cells). Training seems to have worked and the final checkpoint could be loaded, but:

trainer.fit(speaker_model)

[NeMo I 2021-09-16 06:26:58 label_models:240] val_loss: 32.002

Epoch 4, global step 83: val_loss was not in top 3

It now runs without problems until:

"Restoring from a PyTorch Lightning checkpoint

To restore a model using the LightningModule.load_from_checkpoint() class method."

restored_model = nemo_asr.models.EncDecSpeakerLabelModel.load_from_checkpoint(final_checkpoint)


TypeError Traceback (most recent call last)

in ()
----> 1 restored_model = nemo_asr.models.EncDecSpeakerLabelModel.load_from_checkpoint(final_checkpoint)

2 frames
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/saving.py in _load_model_state(cls, checkpoint, strict, **cls_kwargs_new)
    193     _cls_kwargs = {k: v for k, v in _cls_kwargs.items() if k in cls_init_args_name}
    194
--> 195     model = cls(**_cls_kwargs)
    196
    197     # give model a chance to load something

TypeError: __init__() missing 1 required positional argument: 'cfg'
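
A possible stopgap, grounded only in the Lightning frame shown above (_load_model_state forwards extra keyword args to cls(**...)), is to supply the missing cfg explicitly. Whether the restored weights then line up depends on config.model still matching the checkpoint, so treat this as a hypothetical workaround rather than the intended fix:

# Hypothetical workaround: pass the missing constructor argument through
# load_from_checkpoint's **kwargs; config.model is assumed to be the same
# config used for training earlier in the notebook.
restored_model = nemo_asr.models.EncDecSpeakerLabelModel.load_from_checkpoint(
    final_checkpoint,
    cfg=config.model,
)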
nithinraok commented 3 years ago

This looks to me like an issue with the latest PyTorch Lightning. Can you manually run !pip install pytorch_lightning==1.4.2 before the cell where it throws the error? Also, an import fix was provided with https://github.com/NVIDIA/NeMo/pull/2821
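
For anyone following along in Colab, a minimal sketch of that pin (the runtime may need a restart if pytorch_lightning was already imported in the session):

!pip install pytorch_lightning==1.4.2

import pytorch_lightning as pl
print(pl.__version__)  # expect 1.4.2 after a fresh (or restarted) runtime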

briebe commented 3 years ago

This fix brings us to this cell/code:

manifest_filepath = os.path.join(NEMO_ROOT, 'embeddings_manifest.json')
device = 'cuda' if torch.cuda.is_available() else 'cpu'
get_embeddings(verification_model, manifest_filepath, batch_size=64, embedding_dir='./', device=device)


[NeMo I 2021-09-16 07:11:06 audio_to_label:445] Time length considered for collate func is 20
[NeMo I 2021-09-16 07:11:06 audio_to_label:446] Shift length considered for collate func is 0.75
[NeMo I 2021-09-16 07:11:06 collections:267] Filtered duration for loading collection is 0.000000.
[NeMo I 2021-09-16 07:11:06 collections:270] # 5 files loaded accounting to # 5 labels
[NeMo I 2021-09-16 07:11:06 label_models:126] Setting up identification parameters


NameError Traceback (most recent call last)

in ()
      1 manifest_filepath = os.path.join(NEMO_ROOT, 'embeddings_manifest.json')
      2 device = 'cuda' if torch.cuda.is_available() else 'cpu'
----> 3 get_embeddings(verification_model, manifest_filepath, batch_size=64, embedding_dir='./', device=device)

in get_embeddings(speaker_model, manifest_file, batch_size, embedding_dir, device)
     18     out_embeddings = {}
     19
---> 20     for test_batch in tqdm(speaker_model.test_dataloader()):
     21         test_batch = [x.to(device) for x in test_batch]
     22         audio_signal, audio_signal_len, labels, slices = test_batch

NameError: name 'tqdm' is not defined
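
For a quick local workaround before picking up the notebook fix from the PR linked above: the NameError just means the get_embeddings helper uses tqdm without it being imported in the notebook's namespace, so adding the import in any cell that runs before it is enough. A minimal sketch:

# tqdm.auto picks a notebook-friendly progress bar; plain `from tqdm import tqdm` works too.
from tqdm.auto import tqdm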
nithinraok commented 3 years ago

Please read my comment above; the import fix for that is provided through PR https://github.com/NVIDIA/NeMo/pull/2821

briebe commented 3 years ago

OK, I got you. I used the changes you made there and now it's running without problems! Great work! I added myself to the fine-tuning and will see about the results. :-)

Related question: I was trying to use the "hi-mia" dataset yesterday, because the AN4 source has not been very stable over the last week. This is the first line of my test.json:

{"audio_filepath": "../rivaclient/NeMo/scripts/dataset_processing/data/dev/SPEECHDATA/wav/SV0280/SV0280_6_07_S3653.wav", "offset": 0, "duration": 1.488, "label": "SV0280"}

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/data/audio_to_label.py", line 364, in __getitem__
    t = torch.tensor(self.label2id[sample.label]).long()
KeyError: 'SV0280'
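
This KeyError usually means the label in the evaluation manifest ('SV0280' here) is not among the labels the dataset was set up with, i.e. the train and test manifests do not share the same label set. A small sketch for checking that, with placeholder manifest paths:

import json

def manifest_labels(path):
    # Each line of a NeMo manifest is one JSON object with a "label" field.
    with open(path) as f:
        return {json.loads(line)["label"] for line in f if line.strip()}

train_labels = manifest_labels("train.json")  # placeholder paths
test_labels = manifest_labels("test.json")
print("labels only in test.json:", sorted(test_labels - train_labels))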

Is this related to today's fix? I will try later. Thanks!!!