Linda0111 closed this issue 1 year ago.
I tried modifying the parameters as shown in the figure and lowering the learning rate, but the loss only has a valid value at training step 1; after that it is still NaN. I would be grateful for any advice.
Yes, that's a little tricky.
The training is quite unstable at the beginning. When I was tuning the parameters, I tried different values of `learningrate_init` and `learningrate_end` until training became stable. You can also try some small learning rates to see if it learns.
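One rough way to "try some small learning rates to see if it learns" is to run a few steps per candidate and keep the largest rate whose loss stays finite (no NaN or inf) and actually goes down. This is only a sketch: `run_steps(lr)` is a hypothetical callable that would wrap a few iterations of your real training step, and `toy_run` is a toy stand-in (gradient descent on `w**2`) used here so the example runs on its own.

```python
import math


def largest_stable_lr(candidate_lrs, run_steps):
    """Return the largest learning rate whose short run stays finite and learns.

    run_steps is a hypothetical callable: run_steps(lr) -> list of losses
    from a few training steps at that learning rate.
    """
    for lr in sorted(candidate_lrs, reverse=True):
        losses = run_steps(lr)
        finite = all(math.isfinite(loss) for loss in losses)
        # "Learns" here simply means the last loss is lower than the first.
        if losses and finite and losses[-1] < losses[0]:
            return lr
    return None  # every candidate diverged or failed to improve


def toy_run(lr, steps=200):
    """Toy stand-in for training: gradient descent on f(w) = w**2 from w = 1.

    Each update is w <- w * (1 - 2 * lr): it shrinks for 0 < lr < 1,
    oscillates at lr = 1 and grows without bound for lr > 1
    (with lr = 10 the loss overflows to inf within 200 steps).
    """
    w, losses = 1.0, []
    for _ in range(steps):
        w -= lr * 2.0 * w  # gradient of w**2 is 2 * w
        losses.append(w * w)
    return losses


print(largest_stable_lr([10.0, 1.0, 0.1, 0.01], toy_run))  # 0.1
```

Trying candidates from largest to smallest means you keep as much training speed as possible while skipping rates that blow up.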
Another option is to add a learning-rate warm-up schedule at the beginning of training. I didn't include one in my scripts, but you can give it a try.
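A minimal sketch of such a warm-up, assuming a linear ramp up to `learningrate_init` followed by cosine decay down to `learningrate_end`. The cosine decay, the `warmup_steps`/`total_steps` parameters and the default values are assumptions for illustration, not taken from the original scripts.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps,
                     learningrate_init=1e-3, learningrate_end=1e-6):
    """Learning rate at a 0-based global step (hypothetical helper).

    Linear warm-up up to learningrate_init over the first warmup_steps steps,
    then cosine decay from learningrate_init down to learningrate_end.
    The default rates are placeholders; substitute your own values.
    """
    if step < warmup_steps:
        # Start small but non-zero so the very first updates are gentle.
        return learningrate_init * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)  # goes 0 -> 1
    return learningrate_end + 0.5 * (learningrate_init - learningrate_end) * (
        1.0 + math.cos(math.pi * progress))


for s in (0, 50, 99, 100, 550, 1000):
    print(s, warmup_cosine_lr(s, total_steps=1000, warmup_steps=100))
```

In a custom training loop you would compute this value every step and assign it to the optimizer's learning rate before applying the gradients, so the first few hundred updates use a much smaller rate than `learningrate_init`.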
Have you solved it? I have the same problem.
This workflow (via Conda) fixed it for me:
cmd:
conda create -n tensorflow_23 python=3.8
conda activate tensorflow_23
conda install -c anaconda cudatoolkit=10.1.243
conda install -c anaconda cudnn=7.6.5
pip install tensorflow==2.3 opencv-python==4.1.2.30 numpy==1.18.5 matplotlib==3.3.1 scikit-learn==0.23.2 tqdm==4.50.2 scikit-image==0.17.2
Then run your train.py inside the conda environment.
Have you solved this problem? I've been running into it recently. Could you give me some help or advice? Thanks a lot!
Hi,
Try adding a learning-rate warm-up schedule at the beginning of training, or try a smaller learning rate. Hope it helps.
Thanks,