chenyilun95 / tf-cpn

Cascaded Pyramid Network for Multi-Person Pose Estimation (CVPR 2018)
MIT License

about heatmap size #12

Closed akziq closed 6 years ago

akziq commented 6 years ago

Hi @chenyilun95, great work! What about generating a heatmap the same size as the input image (image: 256x192, heatmap: 256x192)? Would it increase AP thanks to the pixel-to-pixel match? Thanks.

chenyilun95 commented 6 years ago

The first question is how to upsample to generate the final output heatmap. The options I can think of:

  1. Bilinear upsampling gives more accurate gradient back-propagation for each pixel. But at test time, directly upsampling the output cannot produce a genuinely higher-resolution heatmap, which probably reduces the gain. A similar experiment was done in https://github.com/chenyilun95/tf-cpn/issues/4, which may indicate that better gradients at high resolution do not help.
  2. Skip connections to the lower-level feature maps, but their semantics probably aren't clear.
  3. Deconvolution: recent work (Simple Baselines for Human Pose Estimation and Tracking) reports that deconvolution layers work fine. But they still only upsample the output to 64x48. If that works, it might work as well for a higher-resolution output.

Nevertheless, that's only my view. Experimental results speak louder!
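One way to read the point about test-time bilinear upsampling is sketched below. It is a toy pure-Python illustration, not code from tf-cpn: the function names, the 4x4 heatmap values, and the half-pixel interpolation convention are all assumptions made for the example. Every upsampled value is a weighted average of its low-resolution neighbours, so the upsampled peak stays inside the cell of the low-resolution peak and never exceeds it.

```python
def bilinear_upsample(heatmap, scale):
    """Upsample a 2-D list of floats by an integer factor (bilinear)."""
    h, w = len(heatmap), len(heatmap[0])
    out = []
    for r in range(h * scale):
        # Map the output pixel centre back to source coordinates
        # (half-pixel convention, assumed here), clamped to the grid.
        y = min(max((r + 0.5) / scale - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(w * scale):
            x = min(max((c + 0.5) / scale - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Interpolate along x on the two source rows, then along y.
            top = heatmap[y0][x0] * (1 - fx) + heatmap[y0][x1] * fx
            bottom = heatmap[y1][x0] * (1 - fx) + heatmap[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def argmax2d(grid):
    """Return (row, col) of the largest value in a 2-D list."""
    best = max((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    return best[1], best[2]


SCALE = 4
# Hypothetical low-resolution heatmap with a single peak at (row 2, col 2).
low = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.6, 0.3, 0.0],
    [0.0, 0.3, 1.0, 0.2],
    [0.0, 0.0, 0.2, 0.0],
]
up = bilinear_upsample(low, SCALE)
peak_low = argmax2d(low)
peak_up = argmax2d(up)

# The upsampled peak falls inside the low-resolution peak cell.
print(peak_low, (peak_up[0] // SCALE, peak_up[1] // SCALE))
```

This only shows that interpolation after the fact is fully determined by the low-resolution prediction; it says nothing about what a network trained with a high-resolution loss would learn.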

akziq commented 6 years ago

I apologize for being ambiguous. My question is this: the network's last layer outputs 64x48, which is (W/4, H/4). What about changing the last layer's output to 256x192, which is (W, H)? The pipeline would then be orig-img (W, H) -> (W/2, H/2) -> (W/4, H/4) -> ... -> (W/4, H/4) -> (W/2, H/2) -> (W, H) (pre-heatmap).

Would a pixel-to-pixel match between orig-img and pre-heatmap increase AP?
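To put numbers on the "pixel-to-pixel" idea, here is a minimal sketch of how a heatmap cell relates to image pixels at a given output stride. The helper names and the cell-centre mapping are assumptions for illustration and are not necessarily the convention tf-cpn uses when decoding keypoints.

```python
def output_size(height, width, stride):
    """Heatmap size for an input of (height, width) at the given stride."""
    return height // stride, width // stride


def heatmap_to_image_xy(col, row, stride):
    """Map a heatmap cell to the centre of the image patch it covers."""
    return (col + 0.5) * stride - 0.5, (row + 0.5) * stride - 0.5


def max_quantization_error(stride):
    """Largest per-axis distance (in pixels) from a patch centre to a
    pixel centre inside that patch, if only the integer argmax is used."""
    return (stride - 1) / 2


for stride in (4, 1):
    print(stride, output_size(256, 192, stride), max_quantization_error(stride))
```

At stride 4 each heatmap cell covers a 4x4 image patch, so a plain argmax can be off by up to 1.5 pixels per axis; at stride 1 the cells and pixels coincide. Whether removing that quantization actually improves AP is exactly the open question here.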

Thanks for your response.

chenyilun95 commented 6 years ago

Excuse me... I'm confused now... How would you change the last layer's output to 256x192?

akziq commented 6 years ago

For example: (1) add an intermediate layer at (W/2, H/2) using bilinear upsampling, deconvolution, or a skip connection; (2) then bring it up to (W, H) in the same way (bilinear upsampling, deconvolution, or a skip connection).

chenyilun95 commented 6 years ago

Emmmm... then I think my comments above are my response... Generally, I tend to think it won't work well once you consider both efficiency and effectiveness.
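As a rough sense of the efficiency side, the arithmetic below counts heatmap elements per person at the two output resolutions. It assumes 17 keypoints (the COCO convention) and counts only the output tensor, not the extra upsampling layers or their activations, so it is a lower bound on the added cost rather than a measurement.

```python
NUM_KEYPOINTS = 17  # assumption: COCO keypoint count


def heatmap_elements(height, width, stride, keypoints=NUM_KEYPOINTS):
    """Number of values in the output heatmaps for one person crop."""
    return keypoints * (height // stride) * (width // stride)


quarter = heatmap_elements(256, 192, 4)  # 64x48 output
full = heatmap_elements(256, 192, 1)     # 256x192 output
ratio = full // quarter

print(quarter, full, ratio)  # 52224 835584 16
```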

akziq commented 6 years ago

@chenyilun95, thank you, I get it. I notice that most people set the last layer's output to 64x64 (Hourglass Net etc.) or 64x48 (yours). So is (W/4, H/4) the best practice for the last layer's output? Thanks for your response; I will close this issue.