bmartacho / UniPose

We propose UniPose, a unified framework for human pose estimation based on our "Waterfall" Atrous Spatial Pooling architecture, which achieves state-of-the-art results on several pose estimation metrics. Current pose estimation methods utilizing standard CNN architectures rely heavily on statistical postprocessing or predefined anchor poses for joint localization. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.
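For readers new to the architecture, the sketch below illustrates the general idea of a waterfall atrous module: atrous convolutions chained in a cascade whose intermediate outputs are all kept and concatenated, alongside a global-pooling branch, giving multi-scale fields-of-view like a spatial pyramid. The channel counts and dilation rates here are illustrative assumptions, not the exact configuration from this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaterfallModule(nn.Module):
    """Illustrative sketch of a waterfall atrous pooling block:
    atrous convolutions are chained in a cascade (each branch feeds
    the next), while every branch's output is also preserved and
    concatenated, combining progressive filtering with multi-scale
    fields-of-view."""

    def __init__(self, in_ch=256, branch_ch=256, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(ch, branch_ch, 3, padding=r, dilation=r),
                nn.ReLU(inplace=True),
            ))
            ch = branch_ch  # the cascade: the next branch reads this output
        # global average-pooling branch, as in spatial pyramid designs
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, branch_ch, 1),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(branch_ch * (len(rates) + 1), branch_ch, 1)

    def forward(self, x):
        outs, y = [], x
        for branch in self.branches:
            y = branch(y)   # progressive filtering through the cascade
            outs.append(y)  # but every scale's output is kept
        p = F.interpolate(self.pool(x), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        outs.append(p)
        return self.project(torch.cat(outs, dim=1))
```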

Person detection branch? #2

Closed: minhhoangbui closed this issue 3 years ago

minhhoangbui commented 3 years ago

Dear authors, thank you a lot for your work. I have some questions:

1) I don't see you train the person detection branch. How can you predict the bounding box and the pose at the same time, as you claim?

2) For MPII, you use the segmented folder. Where can I find that annotation?

I'm looking forward to hearing from you.

bmartacho commented 3 years ago

Dear minhhoangbui,

Answering your questions:

1) The bounding box detection is obtained by adding heatmap layers for the center and corner coordinates of the bounding box in the same head as the pose estimation.
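As a minimal sketch of that idea (the channel layout and function names are assumptions for illustration, not this repository's code), a box can be decoded from such heatmaps by taking the peak of each map:

```python
import numpy as np

def decode_bbox(heatmaps):
    """Hypothetical decoding of a bounding box from dedicated heatmap
    channels predicted by the same head as the joint heatmaps.
    `heatmaps` is assumed to be (C, H, W) with channels ordered as
    [top-left, bottom-right, center] -- an illustrative layout only."""
    def peak(hm):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        return x, y
    x1, y1 = peak(heatmaps[0])  # top-left corner
    x2, y2 = peak(heatmaps[1])  # bottom-right corner
    cx, cy = peak(heatmaps[2])  # center
    return (x1, y1, x2, y2), (cx, cy)
```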

2) Since UniPose is intended for single-person pose estimation, and MPII is a multi-person dataset, we initially identified the target individual using the provided center location. We also performed segmentation with our WASP method (https://www.mdpi.com/1424-8220/19/24/5361), but those results are not included in this paper.
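A minimal sketch of that selection step, assuming MPII-style per-person annotations with an `objpos` center and a `scale` relative to a 200 px person height (the margin factor is an assumption):

```python
import numpy as np

def crop_target_person(image, objpos, scale, margin=1.25):
    """Crop a square window around the annotated center (`objpos`) so the
    multi-person MPII image is reduced to the single target individual.
    MPII's `scale` is relative to a 200 px person height."""
    cx, cy = objpos
    half = int(100 * scale * margin)  # margin adds context around the person
    h, w = image.shape[:2]
    x1, y1 = max(0, int(cx) - half), max(0, int(cy) - half)
    x2, y2 = min(w, int(cx) + half), min(h, int(cy) + half)
    # resizing the crop to the network input size is omitted for brevity
    return image[y1:y2, x1:x2], (x1, y1)
```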

minhhoangbui commented 3 years ago

@bmartacho Thank you a lot for this answer. I'm a bit curious: is it necessary to predict a background heatmap? In my understanding of top-down pose architectures, not many do this. Furthermore, I think the top-left and bottom-right points are enough to define a bounding box. Why do you need 5 points in this case? One more thing: the way you name the variable `limbs` in `heat, limbs = self.model(input_var)` may be a little misleading.
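To make the point about redundancy concrete (a trivial illustration, not code from this repository): the center is fully determined by the two corners, so a center heatmap carries no extra geometric information:

```python
def bbox_center(x1, y1, x2, y2):
    # The center follows directly from the two corners, so predicting it
    # separately is redundant (though it may still help stabilize training).
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0
```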

YHDang commented 3 years ago

Hello, thanks for your discussion. I have a question about the evaluation on the Penn Action dataset. The paper says the evaluation follows [27], i.e. LSTM Pose Machines, so the torso is represented by the size of the bounding box, right? But in the file evaluate.py, the torso is calculated as the L2 norm of the difference between the neck and the pelvis. I'm a little confused about this. Could you give me some suggestions, please? @minhhoangbui @bmartacho Thanks very much!
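To make the two normalizations being compared concrete, here is a hedged sketch of PCK under both conventions; the threshold, array shapes, and function names are assumptions, not this repository's evaluate.py:

```python
import numpy as np

def pck(pred, gt, torso_size, alpha=0.2):
    """Fraction of joints whose prediction falls within alpha * torso_size
    of the ground truth. `pred` and `gt` are (J, 2) arrays of coordinates."""
    dists = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dists <= alpha * torso_size))

# Normalization (a): reference size taken from the person bounding box,
# as in LSTM Pose Machines [27] for Penn Action.
def torso_from_bbox(bbox):
    x1, y1, x2, y2 = bbox
    return max(x2 - x1, y2 - y1)

# Normalization (b): L2 distance between the neck and pelvis keypoints,
# which is what the questioner observed in evaluate.py.
def torso_from_joints(neck, pelvis):
    return float(np.linalg.norm(np.asarray(neck, float) -
                                np.asarray(pelvis, float)))
```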