Training stalls when it reaches 33% of the first epoch. I have tried many times, and it always stops at 33%. The GPUs still show 100% utilization from the Python processes.
My config:
trainer.accelerator=ddp
trainer.plugins=null
trainer.gradient_clip_val=400
trainer.gpus=3
trainer.amp_level=01
I trained on 3 V100 GPUs
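A hang like this, where the GPUs stay busy but progress stops, is usually easier to narrow down once you can see where each process is blocked. Below is a minimal sketch, assuming the training entry point is a Python script that can be edited. It uses the standard-library `faulthandler` module to dump every thread's stack trace at a fixed interval. The helper names `install_hang_dump` and `cancel_hang_dump` and the 600-second default are illustrative assumptions, not part of any training library.

```python
import faulthandler
import sys


def install_hang_dump(timeout_s=600, file=None):
    """Hypothetical helper: dump all thread stack traces every `timeout_s` seconds.

    `file` must be a real file object (it needs a file descriptor) and must
    stay open until `cancel_hang_dump()` is called.
    """
    if file is None:
        file = sys.stderr
    # Also dump tracebacks if the process dies from a fatal signal (e.g. SIGSEGV).
    faulthandler.enable(file=file)
    # Re-arm after every dump so a long hang keeps producing traces.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def cancel_hang_dump():
    """Stop the periodic dumps, e.g. once training has finished normally."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `install_hang_dump()` at the top of the training script in each process would print the stacks to that process's stderr. If every dump shows the same frame once progress stops at 33%, that frame is a reasonable place to start looking.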