Closed: Xiangyu-CAS closed 6 years ago
Your question is great, and I have been thinking about it these days, too! I initially chose the regression method because inference speed is what I care about most. Personally, I guess a one-hot mask (e.g. Mask R-CNN) or a heatmap (e.g. Stacked Hourglass Network) may give better mAP but is probably slower than direct regression (just a guess, I have not verified it myself). I plan to run some experiments to compare these three choices and figure out which one best balances the accuracy-speed tradeoff. What do you think of these three choices? I'd like to know your opinion on this problem. (I noticed you have reimplemented several popular pose estimators.)
I would appreciate it if you could share some progress on this problem after your experiments :)
As for me, I have not carried out many experiments, so I can only guess based on conference papers. One paper [1] says heatmaps are better than regression, and I have noticed that nearly all papers have chosen heatmaps since the idea was first proposed at ECCV 2014. So it may be true that heatmaps are better than regression. However, it is really hard to tell which is better between heatmap and one-hot mask, because both have achieved decent performance on the MSCOCO dataset. In my view, the one-hot mask is promising, since softmax loss seems to perform better in classification. But it may be harder to train. For example, suppose a network predicts a keypoint at position (1, 1) and the GT is actually (0, 1). With heatmap or regression the loss is small, but with a one-hot mask the loss is large (even larger than for a blank prediction).
BTW, my classmate used deeplab_v3 as a baseline to predict keypoints in the FashionAI challenge and got 13% without any augmentation. My heatmap baseline got 12% with augmentation.
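To make the (1, 1) vs. (0, 1) example concrete, here is a minimal sketch that computes all three losses for that off-by-one prediction. The 8x8 resolution, the Gaussian width, the coordinate normalization, and the "confident peak" logits are assumptions chosen for illustration, not settings from any of the papers discussed.

```python
import math

H, W = 8, 8          # assumed output resolution
GT = (0, 1)          # ground-truth keypoint (row, col), from the example above
PRED = (1, 1)        # prediction that is one pixel off
SIGMA = 1.0          # assumed Gaussian width for the heatmap target

def gaussian_heatmap(center, sigma=SIGMA):
    """Dense target: a 2D Gaussian bump centered on the keypoint."""
    return [[math.exp(-((y - center[0]) ** 2 + (x - center[1]) ** 2) / (2 * sigma ** 2))
             for x in range(W)] for y in range(H)]

def mse(a, b):
    """Mean squared error between two H x W maps."""
    return sum((a[y][x] - b[y][x]) ** 2 for y in range(H) for x in range(W)) / (H * W)

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over all H*W locations against a one-hot target."""
    flat = [v for row in logits for v in row]
    m = max(flat)
    log_z = m + math.log(sum(math.exp(v - m) for v in flat))
    return log_z - logits[target[0]][target[1]]

def peaked_logits(center, confidence):
    """Hypothetical network output: one confident peak, zeros elsewhere."""
    logits = [[0.0] * W for _ in range(H)]
    logits[center[0]][center[1]] = confidence
    return logits

blank_map = [[0.0] * W for _ in range(H)]

# Direct regression on (row, col) coordinates normalized by the map size.
regression_loss = sum(((p - g) / H) ** 2 for p, g in zip(PRED, GT)) / 2

# Heatmap regression: off-by-one prediction vs. an all-zero (blank) prediction.
heatmap_off_by_one = mse(gaussian_heatmap(PRED), gaussian_heatmap(GT))
heatmap_blank = mse(blank_map, gaussian_heatmap(GT))

# One-hot mask: confident off-by-one peak vs. uniform (blank) logits.
mask_off_by_one = softmax_cross_entropy(peaked_logits(PRED, 10.0), GT)
mask_blank = softmax_cross_entropy(blank_map, GT)

print(regression_loss, heatmap_off_by_one, heatmap_blank, mask_off_by_one, mask_blank)
```

Under these assumptions the heatmap loss rewards the near miss (it is lower than for a blank map), while the softmax cross-entropy penalizes it more than uniform logits, and the gap grows with the confidence of the wrong peak.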
Please don't hesitate to point out my mistakes : )
[1] Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification, CVPR 2018
There is one paper [1] that compares regression (R1), heatmap (H1), and one-hot mask (H2, H3) on the 3D pose estimation track. Performance varies slightly depending on the task or metric, and there is no discussion of speed. I will run some related experiments and report the final results to you when finished. [1] Integral Human Pose Regression
Same question here! Thanks for the discussion!
Hi~ Recently, I have read a lot of papers related to keypoint localization and found that some of them can be categorized into three kinds based on the loss (direct regression, dense prediction by heatmap, and one-hot semantic segmentation). I am curious about which way is better.
It seems your work aims to regress 16 points directly. Is it possible to get better performance by predicting heatmaps densely?
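For readers comparing the two output formats in this question, the sketch below shows how 16 keypoints would be read out of a direct-regression output versus a stack of dense heatmaps. The 4x4 resolution, the function names, and the toy data are hypothetical and only meant to show the difference in output size and decoding, not how this repository is implemented.

```python
NUM_KEYPOINTS = 16   # number of points mentioned in the question
H, W = 4, 4          # assumed (tiny) heatmap resolution for illustration

def decode_regression(output):
    """Direct regression: the network emits 2 * K numbers, read as (x, y) pairs."""
    return [(output[2 * k], output[2 * k + 1]) for k in range(NUM_KEYPOINTS)]

def decode_heatmaps(heatmaps):
    """Dense prediction: one H x W map per keypoint; take the highest-scoring cell."""
    points = []
    for hm in heatmaps:
        cells = ((y, x) for y in range(H) for x in range(W))
        best_y, best_x = max(cells, key=lambda c: hm[c[0]][c[1]])
        points.append((best_x, best_y))  # return as (x, y)
    return points

# Toy ground truth: keypoint k sits at cell (x = k % W, y = k // W).
expected = [(k % W, k // W) for k in range(NUM_KEYPOINTS)]

# Build a perfect output for each formulation.
regression_output = [float(c) for point in expected for c in point]
heatmaps = []
for x, y in expected:
    hm = [[0.0] * W for _ in range(H)]
    hm[y][x] = 1.0
    heatmaps.append(hm)

print(len(regression_output))     # 2 * K values per image
print(len(heatmaps) * H * W)      # K * H * W values per image
print(decode_regression(regression_output) == decode_heatmaps(heatmaps))
```

The dense output grows with the map resolution while the regression output does not, which is one plausible reason for the speed difference guessed at in this thread.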