HRNet / HigherHRNet-Human-Pose-Estimation

This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)

the outcome seems not good #17

Open YeHaijia opened 4 years ago

YeHaijia commented 4 years ago

I used pose_higher_hrnet_w32_512.pth to test some of my own images, but the outcome does not look good. What's wrong? (images attached)

Daniil-Osokin commented 4 years ago

Hi, I believe this is due to drawing all found keypoints, even those with a low score. Try to draw only the keypoints in which the network is more or less confident, e.g. those with a value >= 0.1 in the heatmap.

P.S. This mode is useful for results submission, because the COCO dataset expects 17 keypoints for every person, even if the person is truncated as in the 3rd image. Some keypoints with low confidence may not be so far from the target, so they still contribute positively to the accuracy metric, thus improving the final score.

plmsmile commented 4 years ago

> Hi, I believe this is due to drawing all found keypoints, even those with a low score. Try to draw only the keypoints in which the network is more or less confident, e.g. those with a value >= 0.1 in the heatmap.
>
> P.S. This mode is useful for results submission, because the COCO dataset expects 17 keypoints for every person, even if the person is truncated as in the 3rd image. Some keypoints with low confidence may not be so far from the target, so they still contribute positively to the accuracy metric, thus improving the final score.

I met the same problem and the result looks very bad.

(images attached)

Daniil-Osokin commented 4 years ago

I cannot find a threshold for keypoints in the config file. Actually, the keypoint confidence is the 3rd value in final_results (along the 3rd dim). Suppose you use save_valid_image for visualization; here is the diff:

diff --git a/lib/utils/vis.py b/lib/utils/vis.py
index f351a0a..f4b169e 100755
--- a/lib/utils/vis.py
+++ b/lib/utils/vis.py
@@ -20,12 +20,13 @@ from dataset import VIS_CONFIG
 def add_joints(image, joints, color, dataset='COCO'):
     part_idx = VIS_CONFIG[dataset]['part_idx']
     part_orders = VIS_CONFIG[dataset]['part_orders']
+    kpt_threshold = 0.1

     def link(a, b, color):
         if part_idx[a] < joints.shape[0] and part_idx[b] < joints.shape[0]:
             jointa = joints[part_idx[a]]
             jointb = joints[part_idx[b]]
-            if jointa[2] > 0 and jointb[2] > 0:
+            if jointa[2] > kpt_threshold and jointb[2] > kpt_threshold:
                 cv2.line(
                     image,
                     (int(jointa[0]), int(jointa[1])),
@@ -36,7 +37,7 @@ def add_joints(image, joints, color, dataset='COCO'):

     # add joints
     for joint in joints:
-        if joint[2] > 0:
+        if joint[2] > kpt_threshold:
             cv2.circle(image, (int(joint[0]), int(joint[1])), 1, color, 2)

     # add link

And the result is attached (result_valid_11).

leoxiaobin commented 4 years ago

As @Daniil-Osokin said, you should add a threshold to filter out the low-score points.

hothothot2000 commented 4 years ago

The strange links mentioned in my post are very long lines connecting to joints outside the image. I have tried kpt_threshold, and it removed some of the strange links, but a few still appear even when I set kpt_threshold=0.4. What's more, a single person is sometimes drawn with links in several colors.

I think that means this person has been counted repeatedly as different persons. As evidence for my guess, there are sometimes 80+ persons detected in final_results for a frame of my video that contains only 3 ~ 7 persons.

Maybe we need to set a person_threshold? And where would it make sense to set it?

Daniil-Osokin commented 4 years ago

You can play with the parameters for your specific use case. Adding a person_threshold is an option: e.g. you can sum the confidences of all keypoints (joint[2] in the code above) and draw a person only if this sum is higher than some threshold. Or you can count the number of keypoints with high confidence and draw a pose only if it has a significant number of such keypoints (e.g. more than half of the total keypoint count).
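A minimal sketch of this kind of pose-level filtering might look as follows (illustrative only, not part of the repository; the function name, argument names, and threshold values are assumptions):

```python
import numpy as np

# Hypothetical helper: keep only poses the network is reasonably confident about.
# `final_results` is assumed to be a list of (num_joints, 3) arrays with (x, y, score) rows.
def filter_poses(final_results, kpt_threshold=0.1, min_score_sum=2.0, min_valid_joints=9):
    kept = []
    for joints in final_results:
        scores = np.asarray(joints)[:, 2]
        # Option 1: total confidence over all joints must exceed a threshold.
        if scores.sum() < min_score_sum:
            continue
        # Option 2: require enough individually confident joints
        # (e.g. more than half of the 17 COCO keypoints).
        if (scores > kpt_threshold).sum() < min_valid_joints:
            continue
        kept.append(joints)
    return kept
```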

hothothot2000 commented 4 years ago

> You can play with the parameters for your specific use case. Adding a person_threshold is an option: e.g. you can sum the confidences of all keypoints (joint[2] in the code above) and draw a person only if this sum is higher than some threshold. Or you can count the number of keypoints with high confidence and draw a pose only if it has a significant number of such keypoints (e.g. more than half of the total keypoint count).

Thanks for your advice, I think your method will probably work. But this is only one part of the overall problem.

The other part is that the 80+ detected persons and their joints, in a frame that actually contains 7 persons, seem to be over-computed and cost too much time. The "Inf_time" of these frames is many times, around 10x, that of frames which are not over-computed, although from a viewer's point of view there is only a small difference between the two kinds of frames: the same few persons walking a short distance in exactly the same background.

Is there any way to save the time spent on this over-computation?

Daniil-Osokin commented 4 years ago

The tricks above can deal with this. You just need to apply them at the time of pose construction, not at the time of result visualization. Just set a higher threshold on the keypoint values in the heatmaps before grouping them into poses, and the time will be spent only on poses in which the network is strongly confident.
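For illustration, a minimal sketch of such pre-grouping thresholding could look like this (this is not the repository's actual grouping code; the heatmap shape, function name, and threshold are assumptions):

```python
import numpy as np

# Hypothetical helper: collect keypoint candidates from per-joint heatmaps,
# dropping low-confidence responses before they are grouped into poses.
# `heatmaps` is assumed to be a (num_joints, H, W) array of confidence maps.
def candidate_keypoints(heatmaps, kpt_threshold=0.3):
    candidates = []
    for joint_id, hm in enumerate(heatmaps):
        ys, xs = np.where(hm >= kpt_threshold)  # keep only confident locations
        for y, x in zip(ys, xs):
            candidates.append((joint_id, int(x), int(y), float(hm[y, x])))
    return candidates
```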

hothothot2000 commented 4 years ago

(images attached)

> The tricks above can deal with this. You just need to apply them at the time of pose construction, not at the time of result visualization. Just set a higher threshold on the keypoint values in the heatmaps before grouping them into poses, and the time will be spent only on poses in which the network is strongly confident.

Thanks a lot. This trick did help remove the majority of the unreasonable joint locations (drawn where there is no person, or even in pure-black areas, mostly near the corners or borders) when we set kpt_threshold=0.3 before grouping. But a minority still remained with confidence > 0.3, and a few even with confidence > 0.4. A side effect is that some reasonable joints were also discarded when kpt_threshold was set to 0.4 or higher.

Apart from that, we occasionally find some joint-pair links that are incomprehensible. For example, A's knee was linked to B's ankle even though they are not adjacent, or are even quite far apart. We are really confused by these mismatched links.

Could you give more hints about these, please?

Daniil-Osokin commented 4 years ago

> A's knee was linked to B's ankle

This behavior looks buggy to me (but possibly it is a feature to maximize the leaderboard score). In general, parts of different persons should not be linked together. To get rid of far links, you may try to estimate the mean length of links between keypoints of the same person and filter out outliers (ones that are longer than twice the mean length). @hothothot2000 BTW, this is just a repository of research work, with its own pros and cons. To apply its results in a production solution, consider consulting specialists; it can really save time and give the necessary quality level.
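A minimal sketch of that outlier-link filtering, written against the add_joints helpers from lib/utils/vis.py shown in the diff above, might look like this (the function name, the 2x cutoff, and the exact data layout are assumptions):

```python
import numpy as np

# Hypothetical helper: keep only links whose length is not far above the
# person's mean link length. `joints` is a (num_joints, 3) array of
# (x, y, score); `part_orders` and `part_idx` are the same structures used
# by add_joints in lib/utils/vis.py.
def plausible_links(joints, part_orders, part_idx, max_ratio=2.0):
    lengths, pairs = [], []
    for a, b in part_orders:
        ja, jb = joints[part_idx[a]], joints[part_idx[b]]
        if ja[2] > 0 and jb[2] > 0:
            lengths.append(np.hypot(ja[0] - jb[0], ja[1] - jb[1]))
            pairs.append((a, b))
    if not lengths:
        return []
    mean_len = np.mean(lengths)
    # drop links that are much longer than this person's average link
    return [p for p, l in zip(pairs, lengths) if l <= max_ratio * mean_len]
```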

YinXiaojieCSDN commented 4 years ago

> I used pose_higher_hrnet_w32_512.pth to test some of my own images, but the outcome does not look good. What's wrong? (images attached)

Can you give me your environment list (or pip list)? I get stuck when running the validation.

Daniil-Osokin commented 4 years ago

Will requirements.txt work for you?

joaqo commented 4 years ago

Hi @Daniil-Osokin, do you mind posting the diff of the code you are using to run inference on a custom image and draw/filter the keypoints?

Daniil-Osokin commented 4 years ago

Hi, I've run valid.py on COCO validation data.

joaqo commented 4 years ago

Thanks, I tried it on a simple 6-person image and got perfect results, but then I tried it on a video, in a much more complicated scene, and these are the results I'm getting:

(image attached)

I used the following command to run it:
python tools/valid.py --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml TEST.MODEL_FILE models/pytorch/pose_coco/pose_higher_hrnet_w32_512.pth TEST.FLIP_TEST False

So I can confirm that there is some issue with the pose decoding on hard examples.

Daniil-Osokin commented 4 years ago

Actually, there are some nice poses in the output. To get rid of the mispredicted results, the post-processing discussed above is required.

joaqo commented 4 years ago

Gotcha, I tried filtering according to the diff you posted for vis.py, but using a 0.3 threshold; the results were less noisy, but not ideal.

(image attached)

I understand that filtering at the heatmap tensor level will probably remove the erroneous associations between different people, but still, most of the people in the image aren't detected at all. Am I using a model version that is too small? Maybe a model trained on CrowdPose would better fit this particular sample?

Thanks for the help, and for open sourcing the model!

Daniil-Osokin commented 4 years ago

I would say the first thing to do for this image to get better results is to increase the network input resolution (possibly set scale 2, or even 4, here).

Thanks, however this work is not mine, I'm just answering some questions. I got the result below (just ran on the image above) with another method, so this one should work as well. (attached image: lightweight_openpose)

joaqo commented 3 years ago

Hi @Daniil-Osokin, one question: the repo you posted reports 40 mAP, which I think would be much worse accuracy-wise than the image you posted, which looks really good. Is it possible you actually used one of the newer neighbouring projects from that group? I would try them, but they seem to require OpenVINO. Thanks!

Daniil-Osokin commented 3 years ago

You can reproduce this result using --height-size=1024.

DearPerpetual commented 1 year ago

> You can reproduce this result using --height-size=1024.

What could be the reason that, when I used this network to train on my own dataset with only 5 keypoints, the training phase was normal but the test results were so poor I could hardly look at them?