Hi @kuanzi, to test how good the embedding is, we want exactly ONE embedding vector to represent each person: the one closest to the ground-truth box center. If we used build_target_thres, embedding vectors at multiple locations would be assigned to a single gt box. That would produce many redundant embedding vectors, which is unnecessary and makes the retrieval test less challenging.
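To make the difference concrete, here is a minimal plain-Python sketch of the two assignment strategies. The helper names `assign_max` and `assign_thres`, the (x1, y1, x2, y2) box format and the 0.5 threshold are illustrative assumptions. This is not the repository's actual `build_target_max`/`build_target_thres` code, which works on grid cells and anchor tensors and selects locations by box center.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_max(anchors: List[Box], gts: List[Box]) -> Dict[int, int]:
    """Hypothetical 'max' strategy: each gt claims exactly one anchor,
    the one with the highest IoU. Returns {anchor_index: gt_index}.
    If two gts share the same best anchor, the later gt overwrites the earlier."""
    assignment: Dict[int, int] = {}
    if not anchors:
        return assignment
    for g, gt in enumerate(gts):
        best = max(range(len(anchors)), key=lambda a: iou(anchors[a], gt))
        assignment[best] = g
    return assignment


def assign_thres(anchors: List[Box], gts: List[Box], thresh: float = 0.5) -> Dict[int, int]:
    """Hypothetical 'thres' strategy: each anchor picks the gt it overlaps most
    and is kept only if that IoU exceeds thresh, so several anchors may share a gt."""
    assignment: Dict[int, int] = {}
    if not gts:
        return assignment
    for a, anchor in enumerate(anchors):
        ious = [iou(anchor, gt) for gt in gts]
        g = max(range(len(gts)), key=lambda i: ious[i])
        if ious[g] > thresh:
            assignment[a] = g
    return assignment


# One person, three nearby anchors that overlap it well, one far-away anchor.
anchors = [(0, 0, 10, 20), (1, 0, 11, 20), (2, 0, 12, 20), (30, 0, 40, 20)]
gts = [(1, 0, 11, 20)]
print(assign_max(anchors, gts))    # {1: 0}
print(assign_thres(anchors, gts))  # {0: 0, 1: 0, 2: 0}
```

With the max strategy the person is represented by a single vector; with the threshold strategy three anchors, and hence three embedding vectors, map to the same gt, which is the redundancy that would make the retrieval test easier.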
Hi @Zhongdao, thanks for the explanation; it helped me understand why this method is used. But I have further questions. During training, we would expect each gt to be responsible for one anchor, and that anchor to be trained to move close to the gt (that is, something like build_target_max should be used). If build_target_thres is used during training (each anchor looks for one gt), it can lead to two possible situations:
So I wonder whether build_target_max should also be used during training?
Regarding the first issue: since we use build_target_thres during training, multiple anchors may be assigned to the same gt, so it rarely happens that a gt has no corresponding anchor. The actual problem is that some assignments may be ambiguous: an anchor can have equal overlaps with two gts (IoU > 0.5) but be assigned to only one of them. This issue has not been solved yet.

Regarding the second question, we find that assigning multiple anchors to one gt greatly improves the recall of the detection branch. This effect may not be obvious in generic object detection, but we find it very important in the pedestrian scenario. This is the main reason we use build_target_thres in the detection branch.

As for the embedding branch, we find that build_target_thres and build_target_max lead to similar performance. Our guess is that build_target_thres provides more training vectors for the embedding branch, and the gain from these extra vectors offsets the negative effect of inaccurate anchor assignments.
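The ambiguity described above can be reproduced with a small plain-Python sketch. Assumptions: boxes are (x1, y1, x2, y2), the IoU threshold is 0.5, and ties are broken by taking the lower gt index (Python's built-in `max` returns the first maximal item). The helper names `assign_thres` and `ambiguous_anchors` are hypothetical; this is not the repository's code.

```python
from collections import Counter
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_thres(anchors: List[Box], gts: List[Box], thresh: float = 0.5) -> Dict[int, int]:
    """Each anchor takes the gt with the highest IoU (first one on ties),
    kept only if that IoU exceeds thresh. Returns {anchor_index: gt_index}."""
    assignment: Dict[int, int] = {}
    if not gts:
        return assignment
    for a, anchor in enumerate(anchors):
        ious = [iou(anchor, gt) for gt in gts]
        g = max(range(len(gts)), key=lambda i: ious[i])
        if ious[g] > thresh:
            assignment[a] = g
    return assignment


def ambiguous_anchors(anchors: List[Box], gts: List[Box],
                      thresh: float = 0.5, tol: float = 1e-6) -> List[int]:
    """Anchors whose two best IoUs both exceed thresh and are (nearly) equal,
    i.e. anchors whose gt assignment is decided only by the tie-break."""
    result = []
    for a, anchor in enumerate(anchors):
        ious = sorted((iou(anchor, gt) for gt in gts), reverse=True)
        if len(ious) >= 2 and ious[1] > thresh and ious[0] - ious[1] < tol:
            result.append(a)
    return result


gts = [(0, 0, 10, 20), (2, 0, 12, 20)]  # two heavily overlapping pedestrians
anchors = [(1, 0, 11, 20), (0, 0, 10, 20), (3, 0, 13, 20)]
assignment = assign_thres(anchors, gts)
print(assignment)                     # {0: 0, 1: 0, 2: 1}
print(ambiguous_anchors(anchors, gts))  # [0]
print(Counter(assignment.values()))   # Counter({0: 2, 1: 1})
```

Anchor 0 overlaps both people with the same IoU (about 0.82) yet is labeled as gt 0 purely by index order, which is the unsolved ambiguity. The counter also shows gt 0 receiving two positive anchors, the "more training vectors" effect mentioned for the embedding branch.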
These are very good questions. Thank you for pointing them out and giving me a chance to explain!
Thanks very much for your patience. This problem had troubled me for a long time, and your explanation finally made it clear. Thank you again!
Hello, thanks for your work and for open-sourcing it. I have some questions:
Why do you use different functions to build the targets? You use build_target_thres for regular training but build_target_max for the embedding test. I went through the code and found that:
I then checked other YOLOv3 implementations and found that they keep only the max-IoU anchor for every gt in both training and inference. Could you please explain why you coded it this way?