Gengzigang / PCT

This is an official implementation of our CVPR 2023 paper "Human Pose as Compositional Tokens" (https://arxiv.org/pdf/2303.11638.pdf)
MIT License

About Ablation #5

Closed by imabackstabber 1 year ago

imabackstabber commented 1 year ago

Hi, I've been reading your paper, good work! However, there are two terms I don't quite understand. In Section 4.5 (ablation study), you mention Image Guidance and an auxiliary Pose Reconstruction Loss. I don't know what they refer to since I'm new to this field. Could you explain?

Gengzigang commented 1 year ago

These two components are designed to improve the performance of the method. Image guidance is explained in Section 3.1 of our paper: during the tokenizer training stage, we concatenate the image features around the joints with the positional features and pass them through the encoder to enhance the tokenizer's discrimination ability. In the code, the parameter "model.keypoint_head.tokenizer.guide_ratio" controls whether it is turned on (0 means not used, 0.5 means it is used). The pose reconstruction loss is described in the training part of Section 3.2: during the classifier training stage, we introduce a pose reconstruction loss to improve classification performance. In the code, you can turn it off by setting "model.keypoint_head.loss_keypoint.joint_loss" to 0.0.
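
For readers who want to try the two ablations, here is a minimal sketch of how the two switches could be flipped on a nested config. It is not taken from the repository: the dict layout is only inferred from the dotted parameter names above, the helper names `set_by_dotted_path` and `make_ablation` are hypothetical, and the baseline `joint_loss` value of 1.0 is a placeholder, not the repository's default.

```python
from copy import deepcopy


def set_by_dotted_path(cfg, dotted_key, value):
    """Set cfg['a']['b']['c'] = value for dotted_key 'a.b.c'.

    Raises KeyError if any part of the path is missing, so a typo in the
    key cannot silently create a new, unused option.
    """
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for name in parents:
        node = node[name]
    if leaf not in node:
        raise KeyError(dotted_key)
    node[leaf] = value


def make_ablation(cfg, overrides):
    """Return a copy of cfg with the dotted-key overrides applied."""
    out = deepcopy(cfg)
    for key, value in overrides.items():
        set_by_dotted_path(out, key, value)
    return out


# Hypothetical baseline: only the two keys named in this thread are real;
# the joint_loss value is a placeholder chosen for illustration.
base_cfg = {
    "model": {
        "keypoint_head": {
            "tokenizer": {"guide_ratio": 0.5},      # 0.5 = image guidance on
            "loss_keypoint": {"joint_loss": 1.0},   # placeholder weight
        }
    }
}

# Ablation 1: disable image guidance during tokenizer training.
no_image_guidance = make_ablation(
    base_cfg, {"model.keypoint_head.tokenizer.guide_ratio": 0.0}
)

# Ablation 2: disable the auxiliary pose reconstruction loss
# during classifier training.
no_recon_loss = make_ablation(
    base_cfg, {"model.keypoint_head.loss_keypoint.joint_loss": 0.0}
)
```

Working on a deep copy keeps the baseline untouched, so both ablation variants can be derived from the same starting config.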

imabackstabber commented 1 year ago

Now it's clear to me. Thanks a lot.