ouyang11111 closed this issue 10 months ago.
Hello, I encountered the same problem and solved it by downgrading detectron2 from 0.4 to 0.3. That fixed the Inf/NaN at the very beginning, but I still hit it again after a few thousand iterations. I guess there are many possible causes for this error; this was just my case. Hope this helps.
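For reference, a quick way to confirm which detectron2 / torch / CUDA combination is actually active before and after the downgrade (the expected values in the comments are just placeholders for your setup):

```python
# Minimal environment check before/after downgrading detectron2.
import torch
import detectron2

print("detectron2:", detectron2.__version__)   # e.g. '0.3' after the downgrade
print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```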
On my RTX 3090 with CUDA 12.0 I still could not solve it, so I rented a different card (an RTX 3060) and training then ran successfully.
The foggyspace config trains fine, but when I run VOC2clipart my command is:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_net.py --num-gpus 4 --config configs/faster_rcnn_R101_cross_clipart_b4.yaml OUTPUT_DIR output/exp_clipart_test

Many other solutions suggest downgrading detectron2, but my compute platform / CUDA does not support a lower detectron2 version. Training is interrupted right at the start (the first iteration) with a very high loss. Has anybody met the same issue or solved this problem?
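Not a guaranteed fix, but two mitigations that often stabilize the first iterations are gradient clipping and a gentler warmup. Below is a minimal sketch using standard detectron2 solver options; whether this repo's train_net.py honors all of them is an assumption:

```python
# Sketch: stabilizing early iterations in a detectron2-based trainer.
# `cfg` stands in for whatever config object this repo's setup builds.
from detectron2.config import get_cfg

cfg = get_cfg()

# Clip exploding gradients instead of letting the loss diverge to Inf/NaN.
cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True
cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "norm"   # or "value"
cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0
cfg.SOLVER.CLIP_GRADIENTS.NORM_TYPE = 2.0

# Start from a smaller LR with a longer warmup so the first iterations stay stable.
cfg.SOLVER.BASE_LR = 0.001          # example value, tune for your batch size
cfg.SOLVER.WARMUP_ITERS = 2000
cfg.SOLVER.WARMUP_FACTOR = 1.0 / 1000
```

Since train_net.py already accepts key/value overrides on the command line (you pass OUTPUT_DIR that way), the same settings can probably be appended there, e.g. SOLVER.CLIP_GRADIENTS.ENABLED True SOLVER.BASE_LR 0.001.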
I have tried lowering the loss weights, e.g.:

LOSS_WEIGHT: 0.01
BBOX_REG_LOSS_WEIGHT: 0.005
CONTRASTIVE_LOSS_WEIGHT: 0.05
WEIGHT_DECAY: 0.0001
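If scaling the weights down doesn't help, it may be faster to find which op first produces a non-finite value. A small sketch using PyTorch's anomaly detection plus a fail-fast loss check; where exactly to hook this into the repo's trainer is an assumption:

```python
import torch

# Enable once before training; backward passes get slower, but PyTorch will
# report the forward op that produced the first NaN/Inf gradient.
torch.autograd.set_detect_anomaly(True)

# Hypothetical helper for the training loop: name the offending loss term
# instead of dying with an opaque FloatingPointError from the trainer.
def check_losses(loss_dict):
    for name, value in loss_dict.items():
        if not torch.isfinite(value).all():
            raise FloatingPointError(f"{name} became non-finite: {value}")
```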
But the classification loss is still extremely high (loss_cls: 3.153e+05). How can this be fixed?
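A loss_cls this large at the very first iteration often points to a structural mismatch rather than a tuning issue, e.g. the head's class count not matching the dataset labels. A quick check; the dataset name "clipart_train" is a guess, use whatever name this repo registers:

```python
# Sketch: verify the ROI head's class count matches the registered dataset.
from detectron2.data import MetadataCatalog

meta = MetadataCatalog.get("clipart_train")   # hypothetical dataset name
print("dataset classes:", len(meta.thing_classes))
# This should equal MODEL.ROI_HEADS.NUM_CLASSES in
# configs/faster_rcnn_R101_cross_clipart_b4.yaml (background is not counted).
```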