@alwc Thanks.
If sigma is large, it undermines accuracy, obviously. And if sigma is too small, I found that the model cannot distinguish left from right. For example, both the left and right armpits are highlighted on the heatmap representing the left armpit. So I tried several sigmas to find a proper one.
I was reducing the target size because the output of CPN is 4x downsampled. As for the '-1', it's a little bit tricky here; I'm not very sure about it. I guess the given coordinates might be 1-indexed instead of 0-indexed, but maybe I'm wrong. It's not a very big deal anyway.
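To make the encode step above concrete, here is a minimal sketch, assuming a 1-indexed annotation, a 4x output stride, and an illustrative sigma; the function name and sizes are my own, not the repo's exact code:

```python
import numpy as np

def encode_keypoint(x, y, hm_w, hm_h, stride=4, sigma=2.0):
    # Hypothetical CPN-style encoding: shift assumed 1-indexed image
    # coordinates to 0-indexed, downsample by the output stride, then
    # draw a Gaussian peak around the downsampled location.
    hx = (x - 1.0) / stride
    hy = (y - 1.0) / stride
    xs, ys = np.meshgrid(np.arange(hm_w), np.arange(hm_h))
    return np.exp(-((xs - hx) ** 2 + (ys - hy) ** 2) / (2 * sigma ** 2))

hm = encode_keypoint(41.0, 25.0, hm_w=64, hm_h=64)
peak_y, peak_x = np.unravel_index(hm.argmax(), hm.shape)
print(peak_x, peak_y)  # 10 6: (41 - 1) / 4 = 10, (25 - 1) / 4 = 6
```

A larger sigma spreads each Gaussian wider, which is exactly how the left/right armpit blobs can start to overlap on one heatmap.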
Thanks for the reply!
That makes sense to me.
I see. Is it possible you are getting a boost from np.rint(x+2), np.rint(y+2) in decode_np because you are subtracting an extra 1 in encode?
Master Shiyu, I have a few follow-up questions.
When I run bash stage2/autorun.sh with Python3, I'm getting:
python3: can't open file 'trainval.py': [Errno 2] No such file or directory
In your stage2/trainval.py, you are using CascadePyramidNetV8. I assume your README.md (i.e. use CascadePyramidNet and CascadePyramidNetV9) is more up-to-date, right?
For your compute_l1_weighted_loss in stage2/viserrloss, I walked through the code and I understand most of the ideas behind it. I notice that the paper Cascaded Pyramid Network for Multi-Person Pose Estimation uses L2 loss with OHKM (online hard keypoint mining), while you used L1 loss with a different flavor of OHKM. I'm curious: what's the rationale behind using L1 loss? Also, how did you come up with your OHKM implementation (e.g. setting the threshold with amplitude / 10)?
Sorry for asking so many questions. I'm new to keypoint detection and I've learned a lot so far from your implementation!
@alwc, I'm hardly a master. My code isn't well organized, so it may have caused you a lot of confusion; my apologies first.
I see. Is it possible you are getting a boost from np.rint(x+2), np.rint(y+2) in decode_np because you are subtracting an extra 1 in encode?
It's not very likely: when I tried encoding without the -1 but keeping the +2, the result was almost the same. But if the +2 is removed, the accuracy decreases.
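A small sketch may help show why dropping the -1 while keeping the +2 barely matters: the +2 mainly recenters the prediction inside the stride-4 cell, so the net shift changes by only one pixel. The function body below is illustrative, not the repo's exact decode_np:

```python
import numpy as np

def decode_np(heatmap, stride=4):
    # Hypothetical decode sketch: take the heatmap argmax, map it back
    # to image space, and add 2 to recenter within the stride-4 cell.
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(np.rint(x * stride + 2)), float(np.rint(y * stride + 2))

# A toy 64x64 heatmap whose peak sits at cell (6, 10), roughly where
# image point (41, 25) lands after the (coord - 1) / 4 encoding.
hm = np.zeros((64, 64))
hm[6, 10] = 1.0
print(decode_np(hm))  # (42.0, 26.0): '-1' at encode then '+2' here nets +1
```

With the -1 removed at encode time, the same keypoint usually lands in the same heatmap cell, so the decoded coordinate moves by at most a pixel.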
When I run bash stage2/autorun.sh with Python3, I'm getting ...
Maybe change trainval.py to an absolute path. But you will find other errors, since I hard-coded the paths for my environment. Sorry for the inconvenience; I'll try to improve this later.
In your stage2/trainval.py, you are using CascadePyramidNetV8. I assume your README.md (i.e. use CascadePyramidNet and CascadePyramidNetV9) is more up-to-date, right?
Again, sorry for the confusion. In fact, different versions mean different designs. After experiments, I found that CascadePyramidNet and CascadePyramidNetV9 gave me the best results. For now, you can ignore all the other versions.
For your compute_l1_weighted_loss in stage2/viserrloss ...
I tested both L1 and L2. L1 was slightly better than L2 and I am not very sure about the reason.
I used amplitude / 10 to distinguish the positive part from the negative part. For example, on a heatmap only a small region is highlighted (the positive part) and everything else is nearly zero (the negative part). If the prediction is all zeros, the overall loss may still look acceptable. So I want to give the positive part a higher effective weight while giving the negative part a smaller one. That's why there's 0.5*loss(pos) + 0.5*loss(neg).
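The idea can be sketched as follows; this is an illustration of the weighting scheme described above, not the repo's exact compute_l1_weighted_loss (the function name and shapes are assumptions):

```python
import numpy as np

def l1_weighted_loss(pred, target, amplitude=1.0):
    # Split pixels into a positive part (target above amplitude / 10)
    # and a negative part, average the L1 error over each separately,
    # then combine as 0.5 * loss(pos) + 0.5 * loss(neg). Averaging the
    # tiny positive region on its own gives it extra effective weight.
    pos = target > amplitude / 10.0
    neg = ~pos
    err = np.abs(pred - target)
    loss_pos = err[pos].mean() if pos.any() else 0.0
    loss_neg = err[neg].mean() if neg.any() else 0.0
    return 0.5 * loss_pos + 0.5 * loss_neg

# An all-zero prediction: the plain mean over all 64 pixels would be
# only 4/64 = 0.0625, but the weighted loss penalizes it much harder.
target = np.zeros((8, 8)); target[3:5, 3:5] = 1.0
pred = np.zeros((8, 8))
print(l1_weighted_loss(pred, target))  # 0.5
```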
For the OHKM, I think I used the same method as the paper: choosing the half of the keypoints that cause the higher losses. Correct me if I am wrong.
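A minimal sketch of that hard-keypoint selection, assuming a per-keypoint loss vector and a keep ratio of one half (names and shapes are illustrative, not the repo's code):

```python
import numpy as np

def ohkm_loss(per_keypoint_loss, topk_ratio=0.5):
    # Online hard keypoint mining sketch: keep only the hardest
    # fraction of keypoints (largest losses) and average over them,
    # so easy keypoints stop dominating the gradient.
    k = max(1, int(len(per_keypoint_loss) * topk_ratio))
    hardest = np.sort(per_keypoint_loss)[-k:]  # the k largest losses
    return hardest.mean()

losses = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7])
print(ohkm_loss(losses))  # averages the hardest three: 0.9, 0.8, 0.7
```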
No need to apologize! I've actually found your code quite readable. I've written down a lot of extra documentation while learning your code. I'll share it with you once I'm done.
Regarding OHKM, I've only briefly glanced at the paper; let me take a deeper look at it.
Off topic: where do Tianchi users usually go to discuss? The official "技術圈" (tech circle) forum is basically deserted (it was rare to see a share like yours there!).
I don't know either, but it's been good discussing with you. I'm looking forward to your notes as well. For now, I'm going to close this issue, but it can be reopened at any time.
Hi Shiyu, great repo! I have two questions for you:
1. In config.py, what's your rationale behind using 16 for self.hm_sigma = self.img_max_size / self.hm_stride / 16.? How did you pick that number?
2. In stage2/keypoint_encoder.py, why do you need to reduce the input image size when calculating the Gaussian keypoint? In addition, why do you have to subtract 1.0 in kpts[:,:2] = (kpts[:,:2] - 1.0) / stride?

Thanks!
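For concreteness, the sigma formula in question is just a chain of divisions; with hypothetical config values (img_max_size = 512 and hm_stride = 4 are placeholders, not necessarily the repo's actual settings) it yields a sigma of a few heatmap pixels:

```python
# Hypothetical values: a 512-pixel max image size and a 4x heatmap stride.
img_max_size, hm_stride = 512, 4
hm_sigma = img_max_size / hm_stride / 16.
print(hm_sigma)  # 8.0 heatmap pixels
```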