Closed: nikvaessen closed this issue 3 years ago
For anyone who ends up here after googling: I managed to solve it by reading the discussion here: https://discuss.huggingface.co/t/wav2vec-fine-tuning-with-multigpu/4894/17

Basically, the huggingface library uses gradient_checkpointing, and this needs to be disabled in order to use DDP. In the example code above this can be done with:
self.backbone.config.gradient_checkpointing = False
I'm not sure why my custom DDP worked but I might have made a mistake...
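As a minimal sketch of the workaround described in that thread (assuming an older transformers version where gradient_checkpointing is read from the model config; newer releases also expose a gradient_checkpointing_disable() method on the model):

```python
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Build a randomly initialised backbone from the default config so the
# sketch stays self-contained (no pretrained weights are downloaded).
config = Wav2Vec2Config()
backbone = Wav2Vec2Model(config)

# Gradient checkpointing re-runs parts of the forward pass during backward,
# which interferes with DDP's gradient bookkeeping. Switch it off before
# handing the model to Trainer(accelerator="ddp").
backbone.config.gradient_checkpointing = False
```

In a LightningModule this would be done on self.backbone, as in the snippet above.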
🐛 Bug
A custom model which builds on top of the Wav2Vec2 model from the transformers library cannot be trained with Trainer(accelerator="ddp"). However, the same model class with a custom DDP training loop functions properly. Using the PyTorch Lightning trainer results in the following traceback:
To Reproduce
You can run the following script (with pytorch_lightning, torch and transformers as dependencies). Setting USE_PYTORCH_LIGHTNING=True will result in the error pasted above, while USE_PYTORCH_LIGHTNING=False trains properly without any errors.

Expected behavior
Using the pytorch-lightning Trainer class should not result in a RuntimeError.

Environment
Additional context
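The original reproduction script is not included in this excerpt, but for context, a custom model of the kind described might look like the following hypothetical sketch (the class name, classification head, and mean-pooling are illustrative assumptions, not taken from the report):

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

class SpeechClassifier(torch.nn.Module):
    """Hypothetical classifier built on a Wav2Vec2 backbone."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Randomly initialised backbone; a real script would typically load
        # pretrained weights with Wav2Vec2Model.from_pretrained(...).
        self.backbone = Wav2Vec2Model(Wav2Vec2Config())
        self.head = torch.nn.Linear(self.backbone.config.hidden_size, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> (batch, frames, hidden) -> mean-pool -> logits
        hidden = self.backbone(waveform).last_hidden_state
        return self.head(hidden.mean(dim=1))
```

A model like this wraps the transformers backbone whose gradient_checkpointing setting triggers the DDP incompatibility discussed above; one second of 16 kHz audio, e.g. torch.randn(1, 16000), yields a (1, num_classes) logits tensor.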