gathierry / FashionAI-KeyPointsDetectionOfApparel

FashionAI Key Points Detection using CPN model in Pytorch
Apache License 2.0

Questions on the keypoint encoder #2

Closed alwc closed 6 years ago

alwc commented 6 years ago

Hi Shiyu, great repo! I have two questions for you:

  1. In config.py, what's your rationale behind using 16 for self.hm_sigma = self.img_max_size / self.hm_stride / 16.? How do you pick that number?

  2. In stage2/keypoint_encoder.py, why do you need to reduce the input image size when calculating the Gaussian key point? In addition, why do you have to subtract 1.0 in kpts[:,:2] = (kpts[:,:2] - 1.0) / stride?

Thanks!

gathierry commented 6 years ago

@alwc Thanks.

  1. If sigma is too large, it hurts localization accuracy. If sigma is too small, I found that the model cannot distinguish left from right. For example, both the left and right armpits are highlighted on the heat map that represents the left armpit. So I tried several sigma values to find a suitable one.

  2. I reduce the target size because the output of CPN is 4x downsampled. The '-1' is a bit tricky, and I'm not very sure about it. My guess was that the given coordinates might start from 1 instead of 0, but I may be wrong. It's not a big deal either way.
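
For readers following along, here is a minimal sketch of the encoding discussed above. It is not the repo's actual `keypoint_encoder.py`. The sigma formula and the `(kpts - 1.0) / stride` step are taken from the question, and the stride of 4 from the answer. `img_max_size=512`, the unnormalized Gaussian with peak value 1, and the function name `encode_keypoint` are assumptions made for illustration.

```python
import numpy as np

def encode_keypoint(x, y, img_max_size=512, hm_stride=4, subtract_one=True):
    """Render one keypoint as a Gaussian heat map at 1/hm_stride resolution.

    img_max_size=512 is an assumed value; hm_stride=4 matches the 4x
    downsampled CPN output mentioned in the thread.
    """
    hm_size = img_max_size // hm_stride
    # Sigma formula quoted in the question (config.py).
    hm_sigma = img_max_size / hm_stride / 16.0
    # Optional shift from 1-based to 0-based coordinates (the disputed '-1').
    offset = 1.0 if subtract_one else 0.0
    # Map image coordinates to heat-map coordinates.
    cx = (x - offset) / hm_stride
    cy = (y - offset) / hm_stride
    xs = np.arange(hm_size, dtype=np.float64)            # shape (W,)
    ys = np.arange(hm_size, dtype=np.float64)[:, None]   # shape (H, 1)
    # Unnormalized Gaussian (assumption): peak value is 1 at (cy, cx).
    heatmap = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * hm_sigma ** 2))
    return heatmap, hm_sigma

heatmap, sigma = encode_keypoint(x=201, y=101)
peak_row, peak_col = np.unravel_index(heatmap.argmax(), heatmap.shape)
print(heatmap.shape, sigma, (int(peak_row), int(peak_col)))
```

With these assumed values the heat map is 128x128 and sigma is 8 heat-map pixels, so a larger divisor than 16 would give a sharper peak and a smaller one a broader peak.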

alwc commented 6 years ago

Thanks for the reply!

  1. That makes sense to me.

  2. I see. Is it possible that you are getting a boost from np.rint(x+2), np.rint(y+2) in decode_np because you are subtracting an extra 1 in encode?

alwc commented 6 years ago

Master Shiyu, I have a few follow-up questions.

  1. When I run bash stage2/autorun.sh with Python3, I'm getting:
python3: can't open file 'trainval.py': [Errno 2] No such file or directory
python3: can't open file 'trainval.py': [Errno 2] No such file or directory
python3: can't open file 'trainval.py': [Errno 2] No such file or directory
python3: can't open file 'trainval.py': [Errno 2] No such file or directory
python3: can't open file 'trainval.py': [Errno 2] No such file or directory

  2. In your stage2/trainval.py, you are using CascadePyramidNetV8. I assume your README.md (i.e. use CascadePyramidNet and CascadePyramidNetV9) is more up to date, right?

  3. For your compute_l1_weighted_loss in stage2/viserrloss, I walked through the code and I understand most of the ideas behind it. I notice that the paper Cascaded Pyramid Network for Multi-Person Pose Estimation uses L2 loss with OHKM (online hard keypoints mining), while you use L1 loss with a different flavor of OHKM. What is the rationale behind using L1 loss? Also, how did you come up with your OHKM implementation (e.g. setting the threshold to amplitude / 10)?

Sorry for asking so many questions. I'm new to keypoints detection and I've learned a lot so far from your implementations!

gathierry commented 6 years ago

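@alwc, I don't deserve to be called a master. My code hasn't been cleaned up, so it may cause you a lot of confusion. Let me apologize for that first.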

I see. Is it possible that you are getting a boost from np.rint(x+2), np.rint(y+2) in decode_np because you are subtracting an extra 1 in encode?

That's not very likely. When I encoded without the -1 but kept the +2, the result was almost the same. But if the +2 is removed, the accuracy decreases.
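
A hedged sketch of what such a decode step could look like, for illustration only. The thread confirms only that `decode_np` applies `np.rint(x+2)` and `np.rint(y+2)`. Taking the arg-max of the heat map and scaling by the stride of 4 before adding the offset is an assumption, and `decode_peak` is a hypothetical name. One possible reading, not confirmed by the author, is that +2 is half the stride and moves the estimate toward the center of the 4x4 image cell that a heat-map pixel covers.

```python
import numpy as np

def decode_peak(heatmap, hm_stride=4, offset=2):
    """Return the image-space (x, y) of the heat-map maximum.

    Assumed pipeline: arg-max -> scale by stride -> add offset -> round.
    """
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    x = np.rint(col * hm_stride + offset)
    y = np.rint(row * hm_stride + offset)
    return int(x), int(y)

# Toy heat map with a single peak at row 25, column 50.
toy = np.zeros((128, 128))
toy[25, 50] = 1.0

with_offset = decode_peak(toy)               # uses the +2 from the thread
without_offset = decode_peak(toy, offset=0)  # same peak, no offset
print(with_offset, without_offset)
```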

When I run bash stage2/autorun.sh with Python3, I'm getting ...

Try changing trainval.py to an absolute path. You will then hit other errors, though, because I hard-coded paths for my own environment. Sorry for the inconvenience. I'll try to improve this later.

In your stage2/trainval.py, you are using CascadePyramidNetV8. I assume your README.md (i.e. use CascadePyramidNet and CascadePyramidNetV9) is more up-to-date right?

Again, sorry for the confusion. The different versions are different designs. After experiments, I found that CascadePyramidNet and CascadePyramidNetV9 gave me the best results. For now, you can ignore all the other versions.

For your compute_l1_weighted_loss in stage2/viserrloss ...

I tested both L1 and L2. L1 was slightly better than L2, and I am not sure why.

I used amplitude / 10 to separate the positive part from the negative part. On a heat map, only a small region is highlighted (the positive part) and the rest is nearly zero (the negative part). If the prediction is all zero, the loss may still look acceptable, because most pixels are negative. So I want to give the positive part a higher weight and the negative part a smaller one. That's why there's 0.5*loss(pos)+0.5*loss(neg).

For the OHKM, I think I used the same method as the paper: choosing the half of the key points that cause the higher loss. Correct me if I am wrong.
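
The weighting and mining described above can be sketched roughly as follows. This is an illustration in NumPy, not the repo's `compute_l1_weighted_loss`: the function name `l1_weighted_ohkm_loss`, the array shapes, `amplitude=1.0`, and the use of a per-region mean are assumptions, and the visibility handling of the real code is left out.

```python
import numpy as np

def l1_weighted_ohkm_loss(pred, target, amplitude=1.0, keep_ratio=0.5):
    """pred, target: arrays of shape (num_keypoints, H, W) (assumed layout)."""
    threshold = amplitude / 10.0           # split described in the thread
    abs_err = np.abs(pred - target)        # element-wise L1 error
    pos = target > threshold               # highlighted region of each map
    losses = []
    for k in range(target.shape[0]):
        pos_k, neg_k = pos[k], ~pos[k]
        # Average each region separately so the small positive region
        # is not drowned out by the many near-zero background pixels.
        pos_loss = abs_err[k][pos_k].mean() if pos_k.any() else 0.0
        neg_loss = abs_err[k][neg_k].mean() if neg_k.any() else 0.0
        losses.append(0.5 * pos_loss + 0.5 * neg_loss)
    losses = np.array(losses)
    # OHKM: keep only the hardest half of the keypoints.
    num_keep = max(1, int(len(losses) * keep_ratio))
    hardest = np.sort(losses)[-num_keep:]
    return hardest.mean(), losses

# Four toy heat maps, each with a single highlighted pixel.
target = np.zeros((4, 8, 8))
for k in range(4):
    target[k, k + 1, k + 2] = 1.0
pred = np.zeros_like(target)
pred[:2] = target[:2]                      # keypoints 0 and 1 are perfect

total, per_kpt = l1_weighted_ohkm_loss(pred, target)
plain_l1 = np.abs(pred[2] - target[2]).mean()  # unweighted L1 for keypoint 2
print(per_kpt, total, plain_l1)
```

In this toy case an all-zero prediction for keypoint 2 gets a weighted loss of 0.5 but a plain mean L1 of only 1/64, which is the effect the pos/neg split is meant to counter.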

alwc commented 6 years ago

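Off topic: where do Tianchi users usually go for discussion? The official "Tech Circle" (技術圈) is basically empty (it's rare to see a post like yours!).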

gathierry commented 6 years ago

I don't know either. But it's good to discuss this with you, and I am looking forward to your notes as well. For now, I'm going to close this issue, but it can be reopened at any time.