Nestarneal opened this issue 6 years ago
Hi, I'm not sure, but I have a guess about your problem: it could be a bug in the PAF calculation.
I've written my own Python augmentation code (btw, it's about 5 times faster than the C++ version and has significantly less code), and I ran into a bug like yours: the loss became NaN.
I found and fixed the bug in this commit: https://github.com/anatolix/keras_Realtime_Multi-Person_Pose_Estimation/commit/9e5adc4d4af64b642562882cedf9e30cbf00ed05

The cause of the bug was that sometimes the limb vector has zero length (the body parts for the PAF start and end are at the same place, i.e. the limb is perpendicular to the image plane, for example pointing at the camera). Dividing by that zero-length norm produces NaN, which kills the whole neural network instantly.

Afterwards I noticed the original code probably has the same bug too: https://github.com/CMU-Perceptual-Computing-Lab/caffe_train/blob/master/src/caffe/cpm_data_transformer.cpp
```cpp
float norm_bc = sqrt(bc.x*bc.x + bc.y*bc.y);
bc.x = bc.x / norm_bc;  // NaN when norm_bc == 0
bc.y = bc.y / norm_bc;
```
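A guarded version of that normalization might look like the following sketch (in Python/NumPy, matching the Keras port; the function name and epsilon threshold are illustrative, not from either repo). The idea is simply to skip degenerate limbs instead of dividing by zero:

```python
import numpy as np

def limb_unit_vector(joint_b, joint_c, eps=1e-8):
    """Return the unit vector from joint_b to joint_c, or None when the
    limb has (near-)zero length, so the caller can skip this limb
    instead of writing NaNs into the PAF maps."""
    bc = np.asarray(joint_c, dtype=np.float64) - np.asarray(joint_b, dtype=np.float64)
    norm_bc = np.sqrt(bc[0] * bc[0] + bc[1] * bc[1])
    if norm_bc < eps:
        return None  # degenerate limb: both keypoints coincide
    return bc / norm_bc

# Coincident keypoints no longer produce NaN:
print(limb_unit_vector((10.0, 20.0), (10.0, 20.0)))  # None
print(limb_unit_vector((0.0, 0.0), (3.0, 4.0)))      # [0.6 0.8]
```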
The solution for your problem: just check that the network input doesn't contain NaNs, and if it does, remove that picture from training.
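That check can be a one-liner per sample; here is a minimal sketch (variable and function names are illustrative) that filters out any training sample whose arrays contain NaN or inf:

```python
import numpy as np

def is_clean(sample):
    """True if none of the arrays in the sample contain NaN or inf."""
    return all(np.isfinite(np.asarray(arr)).all() for arr in sample)

samples = [
    (np.ones((2, 2)), np.zeros((2, 2))),                         # clean
    (np.array([[np.nan, 1.0], [0.0, 2.0]]), np.zeros((2, 2))),   # bad
]
clean_samples = [s for s in samples if is_clean(s)]
print(len(clean_samples))  # 1
```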
I got the same issue; the loss suddenly goes to NaN at some iteration. Did you guys solve this problem by fixing this PAF bug?
Hi, I tried to reproduce the result, but the training loss goes to NaN after iteration 15.
The following steps are what I've done:
Did any step go wrong, or should I adjust the parameters to avoid the NaN? Many thanks.