Nan loss while training face detection model on custom dataset

aiswaryasukumar4 commented 3 years ago

First of all, thank you for this amazing work. But the training procedure and requirements of face detection lack some clarity. I will list out the errors I faced during the training.

I have tried to do transfer learning using the pretrained v1 model, but it gave a nan loss.
Mxnet used: mxnet-cu100==1.5.0
As the training with the pretrained model failed, I decided to train from scratch.
I have verified that the data do not contain negative bounding boxes.
After around 1 lakh (100,000) iterations, the warning "RuntimeWarning: invalid value encountered in multiply" was printed for the line loss_score = numpy.sum(pred_score * mask_score), and training started producing NaN loss again.
During training, both losses had values above 1000.
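
One common way NumPy emits "invalid value encountered in multiply" is an infinite value being multiplied by zero, which yields NaN. The sketch below is only an illustration of how that could be caught early. The function name masked_loss_sum is hypothetical and is not part of the LFFD code; only the expression numpy.sum(pred_score * mask_score) comes from the warning above.

```python
import numpy as np


def masked_loss_sum(pred_score, mask_score):
    """Sum pred_score * mask_score, failing fast on non-finite predictions.

    Hypothetical helper for debugging, not the LFFD implementation.
    """
    pred_score = np.asarray(pred_score, dtype=np.float64)
    mask_score = np.asarray(mask_score, dtype=np.float64)

    # inf * 0 evaluates to NaN, so an overflowed prediction poisons the sum
    # even when the mask zeroes that position out.
    finite = np.isfinite(pred_score)
    if not finite.all():
        bad = int((~finite).sum())
        raise FloatingPointError(f"{bad} non-finite value(s) in pred_score")

    return float(np.sum(pred_score * mask_score))


# Demonstrates the failure mode itself: a masked-out inf still becomes NaN.
with np.errstate(invalid="ignore"):
    poisoned = np.sum(np.array([np.inf, 1.0]) * np.array([0.0, 1.0]))
```

Raising at the first non-finite value gives the iteration where the predictions diverged, which is usually easier to debug than a NaN loss that shows up later.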

@YonghaoHe Any idea on this issue? Can anyone list out the correct procedure to follow while training on the custom dataset?

YonghaoHe commented 3 years ago

@aiswaryasukumar4 Thank you for your interest in our method. If you use the pre-trained model, you have to start with a small learning rate, say 0.01 or 0.001. The errors you encountered may be caused by invalid data values. You can prepare your own data using our code; read it carefully to find where your data differs. By the way, I will release a new repo, LFD, the successor of LFFD. It is much better and is implemented in PyTorch (which is the most popular framework now).
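
As a rough illustration of checking for invalid data values, the sketch below scans bounding boxes for a few common problems. It assumes boxes are stored as (x, y, width, height) in pixels, which may not match the format the LFFD data preparation code expects, and find_invalid_boxes is a hypothetical name.

```python
import math


def find_invalid_boxes(boxes, image_width, image_height):
    """Return (index, reason) pairs for suspicious boxes.

    Assumes each box is (x, y, width, height) in pixels; adapt this to the
    format your annotation files actually use.
    """
    problems = []
    for i, (x, y, w, h) in enumerate(boxes):
        if not all(math.isfinite(v) for v in (x, y, w, h)):
            problems.append((i, "non-finite coordinate"))
        elif x < 0 or y < 0:
            problems.append((i, "negative origin"))
        elif w <= 0 or h <= 0:
            problems.append((i, "non-positive size"))
        elif x + w > image_width or y + h > image_height:
            problems.append((i, "extends outside image"))
    return problems


example_boxes = [
    (10, 10, 20, 20),        # valid
    (-1, 0, 5, 5),           # negative origin
    (0, 0, 0, 5),            # zero width
    (90, 90, 20, 20),        # crosses the image border
    (float("nan"), 0, 1, 1),  # corrupted annotation
]
report = find_invalid_boxes(example_boxes, image_width=100, image_height=100)
```

Checking only for negative coordinates can miss zero-sized or out-of-image boxes, so a broader scan like this may be worth running once over the whole dataset before training.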

aiswaryasukumar4 commented 3 years ago


Thank you @YonghaoHe for the response. Setting the learning rate to a smaller value solved the nan loss issue while using the pretrained model.

YonghaoHe / LFFD-A-Light-and-Fast-Face-Detector-for-Edge-Devices

Nan loss while training face detection model on custom dataset #103