choclatier closed this issue 2 years ago
Reproducible with this Google Colab file: https://colab.research.google.com/drive/1Je0_9D1iWB2C_BGOg-r_kqiX4owuZuro?usp=sharing
Hello @carmocca @awaelchli @rohitgr7, I realize this is probably a PyTorch Lightning problem. If you could suggest a fix, I would be glad to help implement it. The relevant file appears to be: https://github.com/PyTorchLightning/pytorch-lightning/blob/4a4a27db05fe977af0173d00a86f4da230a9e4eb/pytorch_lightning/plugins/precision/deepspeed_precision.py
🐛 Bug
An error is thrown when trying to fit the model facebook/wav2vec2-large-robust-ft-swbd-300h using the DeepSpeedPlugin running stage 3.
Colab test file: https://colab.research.google.com/drive/1Je0_9D1iWB2C_BGOg-r_kqiX4owuZuro?usp=sharing
To Reproduce
Steps to reproduce the behavior:
Error Traceback
Expected behavior
It should start training.
Environment
How you installed PyTorch (conda, pip, source): pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
Additional context
I was able to successfully train the base model using fairseq, but I'm now trying to train the robust model with DeepSpeed stage=3. Any help would be appreciated.
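For readers who cannot open the Colab notebook, here is a rough sketch of what a ZeRO stage 3 setup looks like. It is not the code from the notebook. The helper name `zero_stage3_config` is hypothetical, and enabling fp16 is an assumption based only on the linked `deepspeed_precision.py` file. The dict uses DeepSpeed's standard config keys (`zero_optimization`, `fp16`).

```python
import json


def zero_stage3_config(fp16: bool = True) -> dict:
    """Build a minimal DeepSpeed config dict for ZeRO stage 3.

    Hypothetical helper for illustration; fp16 is an assumption,
    not something the issue states explicitly.
    """
    return {
        "zero_optimization": {"stage": 3},  # shard params, grads, optimizer state
        "fp16": {"enabled": fp16},
    }


config = zero_stage3_config()
print(json.dumps(config, indent=2))

# Assumed usage with the PyTorch Lightning release from the issue's era
# (left commented out because it needs a GPU and the libraries installed):
#
# from pytorch_lightning import Trainer
# from pytorch_lightning.plugins import DeepSpeedPlugin
#
# trainer = Trainer(gpus=1, precision=16, plugins=DeepSpeedPlugin(config=config))
# trainer.fit(model)  # `model` would be the LightningModule wrapping wav2vec2
```

Passing `DeepSpeedPlugin(stage=3)` directly should be the shorthand for the same stage setting. The explicit dict is shown only to make clear which options are in play.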