HowieMa / NSRMhand

[WACV 2020] "Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation"

accuracy in Panoptic #2

Closed. www516717402 closed this issue 4 years ago.

www516717402 commented 4 years ago

Thanks for this nice project. I have a question about PCK accuracy. Using your project on the Panoptic data, the training step reports "Current Best EPOCH is : 32, PCK is : 0.9577505546445453", but the test step gives
"0.04": 0.5353691038114515, "0.06": 0.5957357993770676, "0.08": 0.6186944096586716, "0.1": 0.6288090421603569, "0.12": 0.6367080884950065
I also tested some pictures, and the results look quite bad. I hope for your answer. Thanks.

HowieMa commented 4 years ago

In the training process, the current best epoch is selected based on PCK at the 0.1 * BBOX threshold on the validation dataset, and the inference code and the PCK-calculation code are the same for the validation and test datasets. Your log looks a little strange, because in my experiments the validation PCK and the test PCK are quite close. I am not sure how you cropped the Panoptic dataset or how you split your training/dev/test sets, so I have now released the preprocessed data from my experiments; you can download it from here to check whether it solves your problem. But please DO NOT duplicate it for any commercial purpose; the copyright still belongs to Panoptic.
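
For reference, here is a minimal sketch of how bounding-box-normalized PCK is typically computed. The function and argument names are my own, not the repository's actual evaluation code:

    import numpy as np

    def pck(pred, gt, bbox_size, thresholds=(0.04, 0.06, 0.08, 0.1, 0.12)):
        # pred, gt: (N, 21, 2) predicted / ground-truth keypoints in pixels
        # bbox_size: (N,) hand bounding-box sizes in pixels
        dist = np.linalg.norm(pred - gt, axis=-1)      # (N, 21) pixel errors
        norm_dist = dist / bbox_size[:, None]          # normalize by box size
        # a keypoint counts as correct if its error <= threshold * bbox size
        return {t: float((norm_dist <= t).mean()) for t in thresholds}

PCK@0.1 is then the fraction of keypoints that land within 0.1 * BBOX of the ground truth.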

www516717402 commented 4 years ago

Thank you for your reply. I tested your project on OneHand10K and got the same accuracy as in the paper. However, on actual 1280*720 video the keypoints jitter severely and the confidences are low. I have already modified your model with an Hourglass backbone and added center-circle detection and other tricks, which gives a more stable model. Regarding the limb-mask trick, the idea is similar to this paper: Multi-Scale Structure-Aware Network for Human Pose Estimation.

HowieMa commented 4 years ago

Since our model is trained only on images of size 368 * 368, it may not work very well on high-resolution images. Thank you for adapting our model with Hourglass; I hope you can share your results. Also, thank you for sharing this paper; I will read it soon. Actually, our limb-mask idea originates from the Part Affinity Fields of OpenPose (PAF). The idea of limb representation is very common in pose estimation, and there are many papers discussing it.

www516717402 commented 4 years ago

I am glad to share my code. Currently, we are annotating data to train our model and modifying part of the finger detection. After publishing the paper, I will discuss open-sourcing it with my mentor. Thank you again for sharing the project.

HowieMa commented 4 years ago

> I am glad to share my code. Currently, we are annotating data to train our model and modifying part of the finger detection. After publishing the paper, I will discuss open-sourcing it with my mentor. Thank you again for sharing the project.

I look forward to your paper and code, and I would really appreciate it if you could cite my paper. Thank you!

www516717402 commented 4 years ago

Of course, this project helped me a lot.

aqsc commented 4 years ago

During training the hand is padded to 2.2B (2.2 times the bounding-box size B) on the Panoptic data, and this gets good results on the Panoptic dataset, but worse results when testing on other image sets such as OneHand10K or self-taken pictures. I think the padding size matters a lot. Another question: if we train the hand keypoint model on a merged dataset of Panoptic and OneHand10K, or other hands with different padding sizes, can we get better results when testing hands with different padding sizes?

HowieMa commented 4 years ago

> During training the hand is padded to 2.2B (2.2 times the bounding-box size B) on the Panoptic data, and this gets good results on the Panoptic dataset, but worse results when testing on other image sets such as OneHand10K or self-taken pictures. I think the padding size matters a lot. Another question: if we train the hand keypoint model on a merged dataset of Panoptic and OneHand10K, or other hands with different padding sizes, can we get better results when testing hands with different padding sizes?

Yes, you are correct: the padding size may matter a lot. The model I released was trained only on the preprocessed Panoptic dataset, so it may only work well on the fixed 2.2B bounding box. Besides, the Panoptic (P) dataset has a totally different distribution from OneHand10K (O): the background of P is just a lab, while the background of O is in the wild. So I think it is unfair to test a model trained on P against O.
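
For context, here is a minimal sketch of what padding a hand crop to 2.2B might look like: a square crop around the tight keypoint bounding box, scaled by the padding factor. The function name and details are my assumptions, not the repository's exact preprocessing:

    import cv2
    import numpy as np

    def crop_hand(image, keypoints, pad=2.2, out_size=368):
        # keypoints: (21, 2) array of (x, y) hand joints in pixels
        x_min, y_min = keypoints.min(axis=0)
        x_max, y_max = keypoints.max(axis=0)
        cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
        # half-width of a square crop covering pad * the tight box
        half = pad * max(x_max - x_min, y_max - y_min) / 2.0
        h, w = image.shape[:2]
        x0, y0 = max(int(cx - half), 0), max(int(cy - half), 0)
        x1, y1 = min(int(cx + half), w), min(int(cy + half), h)
        # clip to the image borders; a real pipeline might zero-pad instead
        return cv2.resize(image[y0:y1, x0:x1], (out_size, out_size))

A model trained only on one padding factor sees hands at a narrow range of scales, which is why changing the factor at test time can hurt accuracy.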

For the second question, it may work well, since the hands in O can occupy any fraction of the image area. For this, you may need to adjust the hyperparameters, namely the sigma of the LPM and the width of the LDM, to keep them consistent with the hand size.

By the way, the goal of this paper is just to improve performance algorithmically, not to build a general hand pose estimation system that works in all scenes :)

aqsc commented 4 years ago

We can see that invisible keypoints are unannotated in dataset O, with values of -1 in the labels. If we want to train on O, should we modify the HandDataset_LPM class in hand_lpm.py? And are the keypoint ground-truth values still written as -1 in labels.json?

HowieMa commented 4 years ago

> We can see that invisible keypoints are unannotated in dataset O, with values of -1 in the labels. If we want to train on O, should we modify the HandDataset_LPM class in hand_lpm.py? And are the keypoint ground-truth values still written as -1 in labels.json?

For this, you can make the heatmaps all zeros for those keypoints during training. When evaluating PCK, if a label is -1, you can just ignore that keypoint. So you will need to modify the data loader function.

HowieMa commented 4 years ago

> Can you share with us the data loader function or other modified functions you used when training on the O dataset?

There are no tricks in this function, and I already said very clearly that you just need to set the heatmaps to zero for invisible keypoints. It is really simple to code yourself, just two lines ... I believe you can do it within a few seconds :)

For the code in the data loader:

    def gen_label_heatmap(self, label):
        # label: (21, 2) keypoint coordinates at heatmap scale; assumes torch is imported
        label = torch.Tensor(label)                                      # (21, 2)
        grid = torch.zeros((self.label_size, self.label_size, 2))       # (46, 46, 2)
        grid[..., 0] = torch.Tensor(range(self.label_size)).unsqueeze(0)  # x coordinates
        grid[..., 1] = torch.Tensor(range(self.label_size)).unsqueeze(1)  # y coordinates
        grid = grid.unsqueeze(0)                                         # (1, 46, 46, 2)
        labels = label.unsqueeze(-2).unsqueeze(-2)                       # (21, 1, 1, 2)
        exponent = torch.sum((grid - labels)**2, dim=-1)                 # (21, 46, 46)
        heatmaps = torch.exp(-exponent / 2.0 / self.sigma / self.sigma)  # (21, 46, 46)

        # Here is the only difference  *******************************
        invisible = (label[:, 0] == -1)     # mask of unannotated (invisible) keypoints
        heatmaps[invisible, ...] = 0        # zero out their heatmaps
        # ************************************************************
        return heatmaps

For the sigma in the LPM: since hand sizes vary in OneHand10K, I set it to 0.03 of the bounding-box size at the input image scale (368 * 368). You can adjust it yourself to get better results; I just set it casually ... :)
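
As a concrete reading of that heuristic (a sketch under my own assumptions about how the input and heatmap scales relate, not necessarily the repository's exact code):

    INPUT_SIZE = 368      # network input resolution
    LABEL_SIZE = 46       # LPM heatmap resolution

    def lpm_sigma(bbox_size_in_input):
        # sigma = 0.03 * hand bounding-box size at the 368 * 368 input
        # scale, rescaled to the 46 * 46 heatmap where it is applied
        return 0.03 * bbox_size_in_input * LABEL_SIZE / INPUT_SIZE

For a hand filling the whole 368-pixel crop this gives roughly 11 pixels at input scale, i.e. about 1.4 pixels on the 46 * 46 heatmap.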

HowieMa commented 4 years ago

> Can you share with us the data loader function or other modified functions you used when training on the O dataset?

By the way, it is better to start a new issue if you still have questions, rather than discussing at length in another person's issue that is not relevant to your question. I noticed that the owner of this issue just closed it; I hope our discussion does not bother him or her :)