When I train the network, the results show total_loss=nan, box_loss=nan, conf_loss=nan, category_loss=nan. Could you help me solve this problem?

ZhangAoCanada / RADDet

Range-Azimuth-Doppler Based Radar Object Detection

MIT License


When I train the network, the results show total_loss=nan, box_loss=nan, conf_loss=nan, category_loss=nan. Could you help me solve this problem? #18

Closed Linda0111 closed 1 year ago

ZhangAoCanada commented 2 years ago

Hi,

Try adding a learning-rate warm-up schedule at the beginning of training, or try a smaller learning rate. Hope it helps.

Thanks,

Linda0111 commented 2 years ago

I tried modifying the parameters as shown in the figure and reducing the learning rate. The loss only has a value at train step 1; after that it is still nan. I would be grateful for any advice.

ZhangAoCanada commented 2 years ago

Yes, that's a little tricky.

The training is quite unstable at the beginning. When I was tuning the parameters, I tried different values of learningrate_init and learningrate_end until training became stable. You can also try some smaller learning rates to see whether the network learns.

The other option is to implement a warm-up schedule at the beginning of the training. I didn't add that in my scripts. You can give it a try.
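For anyone who wants to try the warm-up idea, here is a minimal sketch of such a schedule in plain Python. It is not taken from the RADDet scripts (the author says above that they do not include one). The function name `warmup_learning_rate`, the `warmup_steps` and `total_steps` parameters, and the cosine decay after warm-up are all assumptions made for illustration; only the names `lr_init` and `lr_end` mirror the learningrate_init and learningrate_end settings mentioned above.

```python
import math


def warmup_learning_rate(step, warmup_steps, total_steps, lr_init, lr_end):
    """Return the learning rate for a 0-based training step.

    Hypothetical helper: linear warm-up up to lr_init, then cosine decay
    from lr_init down to lr_end. The decay shape is an assumption, not
    what the repository uses.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from lr_init / warmup_steps up to lr_init.
        return lr_init * (step + 1) / warmup_steps

    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)

    # Cosine decay: lr_init at progress 0, lr_end at progress 1.
    return lr_end + 0.5 * (lr_init - lr_end) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Example values only; tune them for your own run.
    for s in (0, 50, 99, 100, 500, 1000):
        print(s, warmup_learning_rate(s, 100, 1000, 1e-4, 1e-6))
```

In a custom training loop you would compute this value once per step and set the optimizer's learning rate to it before applying the gradients. The small rates during the first steps are what is meant to keep the early, unstable phase from blowing up to nan.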

YDMYy commented 12 months ago

I tried to modify the parameters as shown in the figure and reduce the learning rate, except that the loss has a value when the train step=1, and the loss value is still nan later. I would be happy if you could give some advice

Have you solved it? I have the same problem.

Gabo181 commented 10 months ago

This workflow (via Conda) did it for me:

cmd:

conda create -n tensorflow_23 python=3.8
conda activate tensorflow_23
conda install -c anaconda cudatoolkit=10.1.243
conda install -c anaconda cudnn=7.6.5

pip install tensorflow==2.3 opencv-python==4.1.2.30 numpy==1.18.5 matplotlib==3.3.1 scikit-learn==0.23.2 tqdm==4.50.2 scikit-image==0.17.2

Then run your train.py inside the conda environment.
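If you want to confirm that the environment really matches the versions pinned above before training, a small check like the following may help. It is only a sketch: the `PINNED` dictionary and the `check_versions` helper are hypothetical names, and the version numbers are simply copied from the pip command above. It uses `importlib.metadata` from the standard library (available in Python 3.8).

```python
from importlib import metadata

# Versions copied from the pip command above.
PINNED = {
    "tensorflow": "2.3",
    "opencv-python": "4.1.2.30",
    "numpy": "1.18.5",
    "matplotlib": "3.3.1",
    "scikit-learn": "0.23.2",
    "tqdm": "4.50.2",
    "scikit-image": "0.17.2",
}


def check_versions(pinned):
    """Map each package name to (wanted, installed, matches).

    installed is None when the package is not installed.
    """
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (wanted, None, False)
            continue
        # Treat "2.3" as matching "2.3.0", "2.3.1", and so on.
        matches = installed == wanted or installed.startswith(wanted + ".")
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, matches) in check_versions(PINNED).items():
        status = "ok" if matches else "MISMATCH"
        print(f"{name}: wanted {wanted}, installed {installed} [{status}]")
```

Run it inside the activated tensorflow_23 environment; any line marked MISMATCH points at a package that differs from the workflow above.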

huiwenXie commented 5 months ago

Have you solved this problem? I ran into it recently. Could I get some help or advice from you? Thanks a lot!