Training Crash (Warning: Moving average ignored a value of nan/inf && /pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [155,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.)

dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.

MIT License


Training Crash (Warning: Moving average ignored a value of nan/inf && /pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [155,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.) #608

Open kprastey opened 3 years ago

kprastey commented 3 years ago

I am not able to train on a custom annotated dataset. The losses suddenly explode after a few epochs and training crashes. Please look into this error and help resolve it.

Environment info: Training on Google Colab with:

The dataset contains 1500 annotated images (1800x1600 each).

Also, let me know if you need any other information. @dbolya
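For readers trying to interpret the two messages in the title, here is a minimal sketch of one plausible connection between them. It is not taken from the yolact code and the thread does not confirm it as the cause. It relies only on the fact that every comparison against NaN is false, so a NaN value fails a range check of the form `input_val >= zero && input_val <= one`. The helper names `bce_input_ok` and `filter_finite` are hypothetical and exist only for illustration.

```python
import math


def bce_input_ok(p: float) -> bool:
    """Hypothetical mirror of the check quoted in the assertion message:
    input_val >= zero && input_val <= one."""
    return p >= 0.0 and p <= 1.0


def filter_finite(losses):
    """Keep only finite loss values, dropping nan and inf entries.

    Illustrative only: this mimics the idea of ignoring nan/inf values
    when averaging, not the repository's actual moving-average code."""
    return [v for v in losses if math.isfinite(v)]


if __name__ == "__main__":
    # Values inside [0, 1] pass; out-of-range, nan, and inf all fail.
    for p in (0.0, 0.5, 1.0, 1.5, float("nan"), float("inf")):
        print(p, bce_input_ok(p))

    # Non-finite entries are dropped before averaging.
    print(filter_finite([0.7, float("nan"), 0.9, float("inf")]))
```

If this reading applies, the warning about ignored nan/inf values would appear first, and the assertion would fire once a non-finite value reaches a loss that expects probabilities in [0, 1].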

code-wangshuyi commented 3 years ago

I have also hit this training crash. The PyTorch version is '1.8.0a0+1606899' and the CUDA version is 11.2. In this environment, I successfully trained the model with resnet50-fpn as the backbone, but when I use mobilev1-fpn as the backbone, it crashes!

kprastey commented 3 years ago

@code-wangshuyi did you find out the reason for this crash?