NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
11.84k stars 2.46k forks

Contrastive Loss: Runtime error for certain audio files when reshaping out_masked_only #4589

Closed piraka9011 closed 2 years ago

piraka9011 commented 2 years ago

Describe the bug

A few steps into pretraining a SpeechEncDecSelfSupervisedModel, training fails with the following error:

File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/models/ssl_models.py", line 468, in training_step
  loss_value, loss_val_dict = self.decoder_loss_step(
File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/models/ssl_models.py", line 450, in decoder_loss_step
  current_loss_value = current_loss(
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1129, in _call_impl
  return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/nemo/core/classes/common.py", line 963, in __call__
  outputs = wrapped(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/nemo/collections/asr/losses/ssl_losses/contrastive.py", line 187, in forward
  out_masked_only = out_masked_only.reshape(bs, -1, out_masked_only.shape[-1])
RuntimeError: shape '[16, -1, 128]' is invalid for input of size 547712
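
For context, a hedged reading of the failure: reshape(bs, -1, dim) only succeeds if the flattened tensor's element count is divisible by bs * dim, i.e. if every utterance in the batch contributes the same number of masked time steps. Here 547712 elements with dim=128 correspond to 4279 masked steps in total, which is not a multiple of bs=16, so they cannot be split evenly across utterances. A minimal sketch with synthetic tensors (shapes are illustrative, not taken from the real model) reproduces the same RuntimeError:

import torch

bs, dim = 16, 128        # batch size and feature dimension from the error message
total_masked = 4279      # 547712 / 128 masked time steps in the whole batch; not a multiple of bs

# Flattened masked outputs across the whole batch, analogous to out_masked_only
out_masked_only = torch.randn(total_masked, dim)

# Mirrors line 187 of ssl_losses/contrastive.py; fails with
# RuntimeError: shape '[16, -1, 128]' is invalid for input of size 547712
out_masked_only = out_masked_only.reshape(bs, -1, out_masked_only.shape[-1])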

Steps/Code to reproduce bug

I used the speech_pre_training.py script with the default Conformer configuration linked here.

The only change I made was setting train_ds.max_duration to 25.0. I also tried the config from the Self_Supervised_Pre_Training.ipynb notebook and got the same error.

Expected behavior

Training should proceed normally, or the error should give a more detailed explanation of why the reshape failed and which parameters to change.

Environment overview (please complete the following information)

docker run --rm -it --gpus all --ipc=host --env-file .env train
piraka9011 commented 2 years ago

By the way, the current workaround is to set loss_list.contrastive.loss.sample_from_same_utterance_only=False, but ideally this should work with it set to true.
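
In case it helps others, a minimal sketch of applying that override programmatically with OmegaConf before building the model; the config file name and the exact key path (model.loss_list.contrastive.loss...) are assumptions and may differ between NeMo versions and configs:

from omegaconf import OmegaConf

# Hypothetical config file name; use whatever SSL config you pass to speech_pre_training.py
cfg = OmegaConf.load("conformer_ssl.yaml")

# Assumed key path; adjust to match your config's loss_list layout
cfg.model.loss_list.contrastive.loss.sample_from_same_utterance_only = False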

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 7 days since being marked as stale.