facebookresearch / adaptive_teacher

This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

FloatingPointError: Predicted boxes or scores contain Inf/NaN. Training has diverged. #71

Closed ouyang11111 closed 10 months ago

ouyang11111 commented 11 months ago

The Foggy Cityscapes code trains fine, but training fails when I run VOC to Clipart. My command is:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_net.py --num-gpus 4 --config configs/faster_rcnn_R101_cross_clipart_b4.yaml OUTPUT_DIR output/exp_clipart_test
The solution many others suggest is to downgrade detectron2, but my compute platform and CUDA version do not support a lower version of detectron2. Training is interrupted right at the start (the first iteration) with a very high loss. Has anybody met the same problem or solved it?

I have tried lowering the loss weights, such as BBOX_REG_LOSS_WEIGHT, with these settings:
LOSS_WEIGHT: 0.01
BBOX_REG_LOSS_WEIGHT: 0.005
CONTRASTIVE_LOSS_WEIGHT: 0.05
WEIGHT_DECAY: 0.0001
but the classification loss is still extremely high (loss_cls: 3.153e+05). How can I fix this?
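
To see which loss term blows up first, one option is to check the per-term loss values before the FloatingPointError is raised. The following is a minimal sketch in plain Python, assuming the losses are available as a dict of floats. The function name `find_bad_losses`, the `max_abs` threshold, and the `loss_box_reg` value are hypothetical and not taken from the repo; only the `loss_cls` value comes from this thread.

```python
import math


def find_bad_losses(losses, max_abs=1e4):
    """Return names of loss terms that are NaN/Inf or larger than max_abs in magnitude.

    `losses` is assumed to be a mapping from loss name to a float-like value.
    `max_abs` is an arbitrary illustrative threshold, not a value from the repo.
    """
    bad = []
    for name, value in losses.items():
        v = float(value)
        if not math.isfinite(v) or abs(v) > max_abs:
            bad.append(name)
    return bad


# loss_cls is the value reported above; loss_box_reg is a made-up placeholder.
example = {"loss_cls": 3.153e05, "loss_box_reg": 0.8}
print(find_bad_losses(example))
```

A check like this only reports which term is suspicious. It does not explain why the term diverges.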

ilhoon23 commented 11 months ago

Hello, I encountered the same problem and solved it by downgrading detectron2 from 0.4 to 0.3. That fixed the Inf/NaN error at the very beginning of training, although I still hit it after a few thousand iterations. I guess there are many possible causes for this error, but this was my case. Hope this helps.
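
Since the detectron2 version matters here, it can help to confirm which versions are actually installed in the environment before and after a downgrade. This is a small sketch using only the Python standard library (Python 3.8+). The helper name `installed_version` is hypothetical, and the sketch assumes the packages were installed under the distribution names `detectron2` and `torch`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if your install uses different names.
for pkg in ("detectron2", "torch"):
    print(pkg, installed_version(pkg))
```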

ouyang11111 commented 10 months ago

On my RTX 3090 with CUDA 12.0 I still could not solve it. I then rented a different card, an RTX 3060, and training ran successfully.