daodaofr / AlignPS

Code for CVPR 2021 paper: Anchor-Free Person Search
Apache License 2.0

scale alignment #30

Closed FeboReigns closed 2 years ago

FeboReigns commented 2 years ago

1. Is my understanding correct?

Suppose the same person appears as a large crop in one image and a small crop in another; the two crops are then assigned to different levels of the feature pyramid. Feature maps at different levels are certainly not identical, so the same person at different scales gets different embeddings from different levels, and may well be judged as a different person. If the query is a small image matched by a high-level feature while the gallery image is large and uses a low-level feature, the two embeddings come from feature maps of different scales, so their dot product may behave abnormally and cause a mismatch. That is why you use only a single level of the feature pyramid.
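To make the mismatch concrete, here is a minimal sketch of an FCOS-style scale-to-level assignment rule (the regression ranges below are illustrative assumptions, not the exact thresholds used in AlignPS), showing how the same person cropped at two scales is routed to different FPN levels:

```python
# Illustrative FCOS-style regression ranges: each FPN level is
# responsible for objects within a range of sizes (in pixels).
RANGES = {
    "P3": (0, 64),
    "P4": (64, 128),
    "P5": (128, 256),
    "P6": (256, 512),
    "P7": (512, float("inf")),
}

def assign_level(box_size: float) -> str:
    """Return the FPN level responsible for a box of the given max side."""
    for level, (lo, hi) in RANGES.items():
        if lo <= box_size < hi:
            return level
    return "P7"

# The same person detected at two image resolutions:
print(assign_level(90))   # larger gallery crop -> "P4"
print(assign_level(40))   # smaller query crop  -> "P3"
```

Because the two crops are embedded by different levels, their re-id features come from different semantic depths, which is exactly the scale misalignment AlignPS sidesteps by keeping only one level.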

2. Is there a good way to solve the mismatch caused by the two features coming from different feature maps? Then the other FPN levels would not have to be removed, and performance could be improved further.

I look forward to your further research results.

daodaofr commented 2 years ago

Yes, your understanding is right. On CUHK-SYSU and PRW, pedestrian detection is a relatively simple task, so the scale misalignment issue mainly impacts the re-id task. On a more challenging benchmark, e.g., MovieNet-CS (https://github.com/ZhengPeng7/GLCNet), I think it will impact both the detection and re-id tasks. I think designs like PANet and BiFPN, which contain both a top-down path and a bottom-up path for feature fusion, could partially address this issue.
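As a rough illustration of the bottom-up part of that idea, here is a minimal PANet-style sketch (channel count, strided-conv downsampling, and fusion by simple addition are all assumptions; BiFPN additionally uses learned fusion weights):

```python
import torch
import torch.nn as nn

class BottomUpAugmentation(nn.Module):
    """PANet-style bottom-up path: after the usual top-down FPN pass,
    propagate high-resolution detail back up so every level mixes
    low- and high-level information. A sketch, not the paper's code."""

    def __init__(self, channels: int = 256, num_levels: int = 3):
        super().__init__()
        # One strided conv per upward step (P3->P4, P4->P5, ...).
        self.down_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(num_levels - 1)
        )

    def forward(self, feats):
        # feats: [P3, P4, P5], highest resolution first.
        outs = [feats[0]]                      # N3 = P3
        for conv, skip in zip(self.down_convs, feats[1:]):
            outs.append(conv(outs[-1]) + skip)  # N4 = down(N3) + P4, etc.
        return outs

# Example: three FPN levels at strides 8/16/32 for a 256x256 input.
p3 = torch.randn(1, 256, 32, 32)
p4 = torch.randn(1, 256, 16, 16)
p5 = torch.randn(1, 256, 8, 8)
n3, n4, n5 = BottomUpAugmentation()([p3, p4, p5])
print(n5.shape)  # torch.Size([1, 256, 8, 8])
```

With such a path, even features at P4 or P5 carry some of the fine detail from P3, which is one way the cross-level embedding gap could be narrowed.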

FeboReigns commented 2 years ago

Thank you for your guidance and I have benefited a lot.

CUHK-SYSU results:

| Levels     | Recall | AP   | mAP  | top-1 |
|------------|--------|------|------|-------|
| P3         | 90.3   | 81.2 | 93.1 | 93.4  |
| P4         | 87.5   | 78.7 | 92.7 | 93.1  |
| P5         | 79.0   | 71.7 | 89.3 | 89.5  |
| P3, P4     | 90.4   | 80.5 | 91.1 | 91.6  |
| P3, P4, P5 | 90.9   | 80.4 | 90.0 | 90.5  |

I would like to confirm the experimental setup behind these numbers. When P3 and P4 are used, I understand that both P3 and P4 carry positive samples during training while the other levels are background. At test time, however, does the FPN output both P3 and P4, or only P3? It should be the former, but I am asking to confirm.

I also ran an experiment where the FPN outputs only P3 during training as well, and performance dropped by 2% on the PRW dataset. I think the likely reason is, as GLCNet suggests, that background knowledge helps foreground learning. Thank you for providing the baseline; my master's thesis aims to improve AlignPS.