Closed interactivetech closed 6 years ago
I studied the loss during training, and from steps 4-9 it increased rapidly to a large number until reaching nan. I suspected this was due to a bad learning rate, so I investigated what learning rate other repos use to train FCN_Vgg16_32s. It seems that people use 1e-10.
Now the learning rate is behaving appropriately.
Here are some links with additional details where people set the learning rate to 1e-10:
Update: 1e-10 as a learning rate was not working as well as I hoped. I am getting the best training results with base_lr=1e-5 and weight_decay=1e-5.
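For anyone hitting the same blow-up: the reasoning behind lowering the learning rate can be seen even on a toy problem. This is just an illustrative sketch (plain gradient descent on a quadratic, not this repo's training loop): once the step size exceeds the stability threshold for the sharpest loss direction, each update overshoots and the loss explodes toward inf/nan, exactly the pattern in my logs.

```python
def sgd_loss_curve(lr, steps=50):
    """Minimize f(w) = 0.5 * a * w^2 with plain gradient descent.

    The gradient is a*w, so the update is w <- (1 - lr*a) * w.
    When lr*a > 2, |1 - lr*a| > 1 and the iterates diverge -- the
    same explode-then-nan behavior seen in the training output.
    """
    a = 100.0  # curvature; stands in for the sharpest direction of the real loss
    w = 1.0
    losses = []
    for _ in range(steps):
        grad = a * w
        w = w - lr * grad
        losses.append(0.5 * a * w * w)
    return losses

diverged = sgd_loss_curve(lr=0.1)    # lr*a = 10 > 2: loss explodes
stable = sgd_loss_curve(lr=0.005)    # lr*a = 0.5 < 2: loss shrinks
print(diverged[-1] > 1e10, stable[-1] < 1e-3)  # -> True True
```

In the real model the "curvature" varies per layer, which is why a rate like 1e-2 that works elsewhere can diverge here while 1e-5 trains fine.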
Hi there.
Nice job on the repo! I am trying to train the FCN_Vgg16_32s model, and I am having an issue with the loss becoming nan.
I followed the steps to set up the VOC dataset and transfer VGG weights for the FCN model.
The only code change I made in train.py was setting model_name='FCN_Vgg16_32s'; after a few steps, the loss becomes nan.
Here is a screenshot of my terminal output.
Please advise on how I should go about debugging the loss, and let me know if you need any additional information.
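One simple debugging step that helped me: log the loss every step and locate exactly where it first stops being finite, then inspect the batch and learning rate at that point. A minimal sketch (the helper name first_bad_step is hypothetical, not part of this repo):

```python
import math

def first_bad_step(loss_history):
    """Return the index of the first non-finite loss value, or None.

    Useful when a run explodes: the loss typically grows for a few
    steps, overflows to inf, then turns nan, so finding the first bad
    index tells you which step (and batch) to examine.
    """
    for i, loss in enumerate(loss_history):
        if not math.isfinite(loss):
            return i
    return None

# Example history: loss grows, overflows to inf, then becomes nan --
# the same explode-then-nan pattern described in this issue.
history = [0.9, 1.2, 8.0, 4.1e3, 9.9e12, float('inf'), float('nan')]
print(first_bad_step(history))  # -> 5
```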