I can't seem to reproduce the issue; it's working fine on Colab. Could you rerun? Updated link: https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb
So you can be sure I didn't miss anything, I used "Run all" (cells). Training seems to have worked and the final checkpoint could be loaded, but:
trainer.fit(speaker_model)
[NeMo I 2021-09-16 06:26:58 label_models:240] val_loss: 32.002
Epoch 4, global step 83: val_loss was not in top 3
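As an aside, the "was not in top 3" message appears to come from PyTorch Lightning's checkpoint callback reporting that this epoch did not beat the best checkpoints already saved. A minimal sketch of the equivalent callback setup, assuming the tutorial keeps the three best checkpoints ranked by val_loss (which is what the message suggests):

import pytorch_lightning as pl

# Keep only the 3 best checkpoints ranked by val_loss; when an epoch's val_loss
# does not beat any of the saved ones, Lightning logs "val_loss was not in top 3".
checkpoint_callback = pl.callbacks.ModelCheckpoint(
    monitor="val_loss",
    save_top_k=3,
    mode="min",
)
trainer = pl.Trainer(max_epochs=5, callbacks=[checkpoint_callback])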
It now runs without problems until:
"Restoring from a PyTorch Lightning checkpoint
To restore a model using the LightningModule.load_from_checkpoint() class method."
restored_model = nemo_asr.models.EncDecSpeakerLabelModel.load_from_checkpoint(final_checkpoint)
TypeError Traceback (most recent call last)
This looks to me like an issue with the latest PyTorch Lightning. Can you manually run !pip install pytorch_lightning==1.4.2 before the cell where it throws the error? Also, there was an import fix provided with https://github.com/NVIDIA/NeMo/pull/2821
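For anyone following along, a minimal Colab cell for that workaround; it only pins the version mentioned above and prints it so you can confirm the downgrade took effect (restart the runtime if Colab asks you to):

!pip install pytorch_lightning==1.4.2

import pytorch_lightning
print(pytorch_lightning.__version__)  # should now report 1.4.2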
This fix brings us to the cell/code:
manifest_filepath = os.path.join(NEMO_ROOT, 'embeddings_manifest.json')
device = 'cuda' if torch.cuda.is_available() else 'cpu'
get_embeddings(verification_model, manifest_filepath, batch_size=64, embedding_dir='./', device=device)
[NeMo I 2021-09-16 07:11:06 audio_to_label:445] Time length considered for collate func is 20
[NeMo I 2021-09-16 07:11:06 audio_to_label:446] Shift length considered for collate func is 0.75
[NeMo I 2021-09-16 07:11:06 collections:267] Filtered duration for loading collection is 0.000000.
[NeMo I 2021-09-16 07:11:06 collections:270] # 5 files loaded accounting to # 5 labels
[NeMo I 2021-09-16 07:11:06 label_models:126] Setting up identification parameters
NameError Traceback (most recent call last)
Please read my comment above; the import fix for that is provided through PR https://github.com/NVIDIA/NeMo/pull/2821
OK, I got you. I used the changes you made there and now it's running without problems! Great work! I moved on to the finetuning and will see about the results. :-) Related question: I was trying to use the "hi-mia" dataset yesterday, because the AN4 source is/was not very stable in the last week. This is the first line of my test.json:
{"audio_filepath": "../rivaclient/NeMo/scripts/dataset_processing/data/dev/SPEECHDATA/wav/SV0280/SV0280_6_07_S3653.wav", "offset": 0, "duration": 1.488, "label": "SV0280"}
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
Is this related to today's fix? I will try later, thanks!
Describe the bug
[NeMo W 2021-09-06 11:58:47 patch_utils:50] torch.stft() signature has been updated for PyTorch 1.7+ Please update PyTorch to remain compatible with later versions of NeMo.
followed by:
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
   4157         assert len(pad) == 2, "3D tensors expect 2 values for padding"
   4158     if mode == "reflect":
-> 4159         return torch._C._nn.reflection_pad1d(input, pad)
   4160     elif mode == "replicate":
   4161         return torch._C._nn.replication_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 2, 2]
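For reference, a rough sketch to spot clips that are too short or not mono, which is what the [1, 2, 2] input shape in the padding error suggests; it assumes soundfile is installed and that the manifest paths resolve (the manifest filename below is hypothetical):

import json
import soundfile as sf

with open("train.json") as f:  # hypothetical path to the manifest that produced the error
    for line in f:
        entry = json.loads(line)
        audio, sr = sf.read(entry["audio_filepath"])
        # Reflection padding needs more samples than the pad width (256 here),
        # so flag clips that are suspiciously short or not single-channel.
        if audio.ndim != 1 or len(audio) < 512:
            print(entry["audio_filepath"], audio.shape, sr)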
Also in this notebook, besides the AN4 source not being available:
Original cell:
restored_model.setup_finetune_model(config.model)
TypeError Traceback (most recent call last)
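For completeness, a rough sketch of what that finetuning cell does as I understand it; the config path and field names here are assumptions based on a local SpeakerNet config, not necessarily what the notebook uses, so treat it as illustrative only:

from omegaconf import OmegaConf

# Assumed config path and field names; check the notebook's actual config before running.
config = OmegaConf.load("conf/SpeakerNet_recognition_3x2x512.yaml")
config.model.train_ds.manifest_filepath = "train.json"
config.model.validation_ds.manifest_filepath = "dev.json"
config.model.decoder.num_classes = 5  # number of speakers in the new dataset

# restored_model is the EncDecSpeakerLabelModel loaded earlier in the notebook.
restored_model.setup_finetune_model(config.model)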