YuHengsss / YOLOV

This repo is a PyTorch implementation of the YOLOV series.
Apache License 2.0

The problem of using yolov's algorithm ideas in yolov5/v7 #93

Open CangHaiQingYue opened 2 months ago

CangHaiQingYue commented 2 months ago

Hello, thanks for your great work! YOLOX is an anchor-free algorithm, and I would like to apply YOLOX's ideas in an anchor-based algorithm. I have a question:

Currently, `self.n_anchors = 1`. For an input of size 1x3x640x640, the shape of `features_cls` is 1x8400x192, and the values in `pred_idx` lie in [0, 8399]. This is fine.

However, when `self.n_anchors = 3`, the shape of `features_cls` is still 1x8400x192, but `pred_idx` can take values in [0, 8400 * 3 - 1], so an error is raised in `self.find_feature_store`.

So I would like to ask how to resolve this conflict; simply repeating `features_cls` three times seems impractical. https://github.com/YuHengsss/YOLOV/blob/2ea4eb90a44cb3db791c1ac5aac38685ebbc297c/yolox/models/yolovp_msa.py#L290-L311

YuHengsss commented 2 months ago

Thanks for your attention to our work!

To use YOLOV in a detector with multiple anchors, you need to rewrite the feature selection function (https://github.com/YuHengsss/YOLOV/blob/2ea4eb90a44cb3db791c1ac5aac38685ebbc297c/yolox/models/yolovp_msa.py#L307). Concretely: find the foreground proposals and their corresponding features. Note that multiple foreground proposals may map to the same feature point; in that case, one feature is repeated multiple times. Given this concern, we chose an anchor-free detector for our experiments. However, our strategy should also work with anchor-based detectors. Any attempt is appreciated, and we look forward to hearing about your success!
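A minimal sketch of the kind of gather this implies (the function name and array shapes here are illustrative, not the repo's actual API): with index-based selection, a feature point hit by several foreground proposals is simply gathered once per proposal, so the repetition falls out naturally.

```python
import numpy as np

def gather_proposal_features(features_cls, fg_point_idx):
    """Gather per-point features for foreground proposals.

    features_cls : (num_points, C) per-location features from one frame.
    fg_point_idx : (num_fg,) feature-point index of each foreground
                   proposal; indices may repeat when several proposals
                   (e.g. several anchors) share one location.
    """
    # Fancy indexing duplicates a row for every proposal that maps
    # to the same feature point.
    return features_cls[fg_point_idx]

# Toy example: 4 feature points with 2-dim features, 3 foreground
# proposals, two of which hit point 1.
feats = np.arange(8, dtype=float).reshape(4, 2)
idx = np.array([1, 1, 3])
out = gather_proposal_features(feats, idx)
print(out.shape)  # (3, 2); the row for point 1 appears twice
```

The same pattern works with `torch` tensors via `features_cls[fg_point_idx]` or `torch.index_select`.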

CangHaiQingYue commented 2 months ago

Hi @YuHengsss, thanks for your answer. I solved this by mapping [0, 8400 * 3 - 1] to [0, 8400 - 1], since 3 is the number of anchors, which means each point is repeated three times.
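The exact index mapping depends on how the head orders anchors when it flattens predictions; the two common layouts give different formulas. A small sketch with toy numbers (the layout assumption must be checked against your own head):

```python
import numpy as np

def pred_idx_to_point_idx(pred_idx, num_points, n_anchors, anchor_major=True):
    """Map a flattened prediction index back to its feature-point index.

    Two common flattening layouts (an assumption, verify against your head):
      - anchor-major:   [a0@p0..a0@pN, a1@p0..a1@pN, ...] -> idx % num_points
      - location-major: [a0@p0, a1@p0, a2@p0, a0@p1, ...] -> idx // n_anchors
    """
    pred_idx = np.asarray(pred_idx)
    if anchor_major:
        return pred_idx % num_points
    return pred_idx // n_anchors

# Toy check with 5 feature points and 3 anchors.
am = pred_idx_to_point_idx([0, 5, 12], num_points=5, n_anchors=3)
lm = pred_idx_to_point_idx([0, 5, 12], num_points=5, n_anchors=3,
                           anchor_major=False)
print(am)  # [0 0 2]
print(lm)  # [0 1 4]
```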

However, I encountered another problem, concerning ref_loss. I didn't see this part in the paper. Could you explain it in detail?

YuHengsss commented 2 months ago

This is a classification refinement loss for video object detection; you can also find an IoU-score refinement loss if you use YOLOV++. They optimize the classification and confidence scores of objects after the temporal refinement block. The assignment strategy used in YOLOV for the classification part is here: https://github.com/YuHengsss/YOLOV/blob/2ea4eb90a44cb3db791c1ac5aac38685ebbc297c/yolox/models/yolovp_msa.py#L590C21-L605C1
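As a rough illustration of what a post-refinement classification loss can look like (a generic sketch, not the repo's exact ref_loss): binary cross-entropy between the refined class scores and the targets assigned to the selected foreground proposals.

```python
import numpy as np

def refinement_cls_loss(refined_logits, assigned_targets):
    """Generic sketch of a classification refinement loss.

    refined_logits   : (num_fg, num_classes) class logits after the
                       temporal refinement block, for selected foreground
                       proposals.
    assigned_targets : (num_fg, num_classes) one-hot or soft targets from
                       the label assignment (in YOLOX-style heads these
                       are often IoU-weighted one-hot vectors).
    Returns the mean binary cross-entropy over foreground proposals.
    """
    p = 1.0 / (1.0 + np.exp(-refined_logits))  # sigmoid
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    bce = -(assigned_targets * np.log(p) +
            (1.0 - assigned_targets) * np.log(1.0 - p))
    return bce.mean()

# Confident predictions that match their targets give a small loss.
logits = np.array([[4.0, -4.0], [-4.0, 4.0]])
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = refinement_cls_loss(logits, targets)
print(loss)
```

In practice this would be computed on tensors with `torch.nn.BCEWithLogitsLoss` for numerical stability; the sketch above only shows the shape of the computation.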

YOLOV++ updates the label assignment strategy for better performance. It's a bit more complex; you can find it in https://github.com/YuHengsss/YOLOV/blob/2ea4eb90a44cb3db791c1ac5aac38685ebbc297c/yolox/models/v_plus_head.py#L1052 and https://github.com/YuHengsss/YOLOV/blob/2ea4eb90a44cb3db791c1ac5aac38685ebbc297c/yolox/models/v_plus_head.py#L447