sfktrifork opened this issue 5 years ago
I have the same problem here. @edvardHua, do you have any idea?
Indeed, the network architecture has a large margin for improvement. We could follow the tips from the paper "Convolutional Neural Networks at Constrained Time Cost" to optimize it... but that takes time.
(Debugged using the iOS demo app at https://github.com/tucan9389/PoseEstimation-CoreML)
The output dimension of the pretrained CPM model, when converted to CoreML, is 96x96; however, the effective resolution of its predictions is apparently only a quarter of that. All predicted keypoint coordinates are multiples of 4, e.g. (4, 56), (24, 28), etc. This effectively makes the predictions 4x coarser than the nominal output resolution.
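My best guess at the mechanism, as a minimal Swift sketch (my own illustration, not code from this repo or the demo app; the 24x24 raw heatmap size and the `outputStride` constant are assumptions): if the network's raw heatmap is produced at stride 4 (24x24 for a 96x96 input) and decoding takes the argmax cell and multiplies it back up by the stride, every coordinate necessarily lands on a 4-pixel grid.

```swift
// Sketch of stride-4 argmax decoding (assumed, not the demo app's actual code).
let outputStride = 4

func decodeKeypoint(heatmap: [[Double]]) -> (x: Int, y: Int) {
    // Find the cell with the highest score (plain argmax over the heatmap).
    var best = (col: 0, row: 0, score: -Double.infinity)
    for row in 0..<heatmap.count {
        for col in 0..<heatmap[row].count where heatmap[row][col] > best.score {
            best = (col: col, row: row, score: heatmap[row][col])
        }
    }
    // Mapping the 24x24 cell index back to 96x96 input coordinates multiplies
    // by the stride, so every prediction snaps to a multiple of 4.
    return (x: best.col * outputStride, y: best.row * outputStride)
}

// Example: a peak at cell (col: 6, row: 14) decodes to (24, 56).
var heatmap = Array(repeating: Array(repeating: 0.0, count: 24), count: 24)
heatmap[14][6] = 1.0
print(decodeKeypoint(heatmap: heatmap))  // (x: 24, y: 56)
```

The same snapping would occur even if the exported 96x96 output is just a nearest-neighbor upsampling of a 24x24 heatmap, since the argmax would land on the corner of each 4x4 block.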
While debugging, I looked at the predicted position of the `top` keypoint. Notice how the position is always a multiple of 4 in both x and y. Why does this occur? Does it come from the pretrained model itself? Should I train the network myself to get a higher-resolution output?
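For reference, one decoding-side workaround I'm considering (a sketch under the assumption that the heatmap carries a smooth peak at its native resolution; `decodeSubpixel` is my own name, not an API from this repo): replace the bare argmax with a score-weighted centroid over the 3x3 window around the peak, which yields fractional cell coordinates before scaling by the stride.

```swift
// Sub-pixel decoding sketch (assumed technique, not code from this repo):
// weighted centroid around the argmax instead of the raw argmax cell.
func decodeSubpixel(heatmap: [[Double]], outputStride: Double = 4.0) -> (x: Double, y: Double) {
    // Locate the argmax cell first.
    var peak = (col: 0, row: 0, score: -Double.infinity)
    for row in 0..<heatmap.count {
        for col in 0..<heatmap[row].count where heatmap[row][col] > peak.score {
            peak = (col: col, row: row, score: heatmap[row][col])
        }
    }
    // Score-weighted centroid over the peak's 3x3 neighborhood (clamped at borders).
    var total = 0.0, sumX = 0.0, sumY = 0.0
    for row in max(0, peak.row - 1)...min(heatmap.count - 1, peak.row + 1) {
        for col in max(0, peak.col - 1)...min(heatmap[row].count - 1, peak.col + 1) {
            let w = max(heatmap[row][col], 0)  // ignore negative scores as weights
            total += w
            sumX += w * Double(col)
            sumY += w * Double(row)
        }
    }
    guard total > 0 else {
        return (x: Double(peak.col) * outputStride, y: Double(peak.row) * outputStride)
    }
    return (x: sumX / total * outputStride, y: sumY / total * outputStride)
}
```

This only helps if the heatmap values around the peak are informative; if the converted model's output is already hard-quantized, then retraining with a higher-resolution output head would presumably be the real fix.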