edvardHua / PoseEstimationForMobile

:dancer: Real-time single person pose estimation for Android and iOS.
Apache License 2.0

Pretrained CPM model output resolution is only 24x24 #61

Open sfktrifork opened 5 years ago

sfktrifork commented 5 years ago

(Debugged using the iOS demo app at https://github.com/tucan9389/PoseEstimation-CoreML)

The output dimension of the pretrained CPM model, when converted to CoreML, is 96x96; however, the effective resolution of its predictions is only 1/4 of that. All predicted keypoint coordinates are multiples of 4, e.g. (4, 56), (24, 28), etc. In effect, the predictions are 4 times coarser than expected.

In debugging, I looked at the predicted position of the top point. Notice how the position is always a multiple of 4 for both x and y.

Max top point is: (44,24) with confidence 0.68310546875
w/h: 96/96
Max top point is: (44,24) with confidence 0.6572265625
w/h: 96/96
Max top point is: (44,24) with confidence 0.677734375
w/h: 96/96
Max top point is: (44,24) with confidence 0.72900390625
w/h: 96/96
Max top point is: (48,20) with confidence 0.13720703125
w/h: 96/96
Max top point is: (84,84) with confidence 0.021026611328125

Why does this occur? Does this have to do with the pretrained model itself? Should I train the network myself to yield a higher resolution?
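A minimal sketch of what the logs above are consistent with (an assumption, not confirmed by the model graph): the network's final heatmap is actually 24x24, and the converted model nearest-neighbor-upsamples it by 4x to 96x96, so the argmax can only land on multiples of 4:

```python
import numpy as np

# Hypothetical 24x24 heatmap with a single peak, standing in for the
# network's real output. Values and peak location are made up to
# reproduce the (44, 24) coordinate seen in the logs.
heat = np.zeros((24, 24), dtype=np.float32)
heat[6, 11] = 1.0  # peak at row 6, col 11 in heatmap space

# Nearest-neighbor x4 upsampling to 96x96, mimicking what the
# converted CoreML model appears to do.
up = np.kron(heat, np.ones((4, 4), dtype=np.float32))

y, x = np.unravel_index(np.argmax(up), up.shape)
print(up.shape, (x, y))  # (96, 96) with argmax at (44, 24), both multiples of 4
```

Each low-resolution cell becomes a 4x4 block of identical values, so no coordinate between multiples of 4 can ever win the argmax.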

tucan9389 commented 5 years ago

I have the same problem here. @edvardHua Do you have any idea?

[screenshot: img_1254]

edvardHua commented 5 years ago

Indeed, the network architecture has a large margin for improvement. We could follow the tips from the paper Convolutional Neural Networks at Constrained Time Cost to optimize it... but that takes time.
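Short of retraining at a higher resolution, a common post-processing workaround (not from this thread, and hypothetical for this model) is sub-pixel refinement on the raw low-resolution heatmap: shift the argmax a quarter cell toward the larger neighboring activation before scaling coordinates up. A sketch:

```python
import numpy as np

def refine_peak(heat):
    """Sub-pixel peak estimate from a raw low-resolution heatmap.

    Takes the integer argmax, then nudges it 0.25 cells toward the
    larger horizontal/vertical neighbor -- a standard trick in
    heatmap-based pose estimation. Returns (x, y) as floats in
    heatmap coordinates.
    """
    y, x = np.unravel_index(np.argmax(heat), heat.shape)
    fx, fy = float(x), float(y)
    if 0 < x < heat.shape[1] - 1:
        fx += 0.25 * np.sign(heat[y, x + 1] - heat[y, x - 1])
    if 0 < y < heat.shape[0] - 1:
        fy += 0.25 * np.sign(heat[y + 1, x] - heat[y - 1, x])
    return fx, fy

# Made-up 24x24 heatmap whose true peak lies between two cells.
heat = np.zeros((24, 24), dtype=np.float32)
heat[6, 11] = 1.0
heat[6, 12] = 0.6  # secondary mass pulls the estimate right

x, y = refine_peak(heat)
print(x * 4, y * 4)  # scaling x4 gives 45.0, 24.0 -- no longer locked to multiples of 4
```

This recovers fractional positions from the existing model without any retraining, at negligible cost.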