YuHengsss / YOLOV

This repo is a PyTorch implementation of the YOLOV series.
Apache License 2.0

How to obtain base detector weights pretrained on ILSVRC2015 and ILSVRC #103

Open yuyangyangji opened 3 weeks ago

yuyangyangji commented 3 weeks ago

Thanks for the great work on YOLOV and YOLOV++. In this repo, you mention how to fine-tune the model from a pre-trained YOLOX model. However, I find that in the paper the base detector is trained on the ILSVRC2015 and ILSVRC datasets, and I wonder whether this repo provides the code for obtaining those pre-trained weights. Thanks, looking forward to your answer!

YuHengsss commented 3 weeks ago

Thanks for your interest in our work. We do provide code to train the base detector. Taking ImageNet VID as an example, the experiment file for training a base detector (e.g., YOLOX-S) can be found here: https://github.com/YuHengsss/YOLOV/blob/0a510c3177bdf2b77cb502df589307065e90425d/exps/yolov/yoloxs_vid.py#L6

You can use tools/train.py with that experiment file to train the base detector; the usage is the same as in YOLOX.
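For reference, a typical launch following YOLOX conventions might look like the sketch below; the checkpoint path, GPU count, and batch size are illustrative, and the exact flags may differ across versions:

```bash
# Fine-tune the base detector from a COCO-pretrained YOLOX checkpoint.
# -f: experiment file, -d: number of GPUs, -b: total batch size,
# -c: initial weights (path is illustrative).
python tools/train.py -f exps/yolov/yoloxs_vid.py -d 8 -b 64 --fp16 \
    -c /path/to/yolox_s.pth
```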

yuyangyangji commented 3 weeks ago

Thanks! For the training procedure: the first stage starts from COCO-pretrained weights, freezes the backbone, and fine-tunes only the linear projection layers in the YOLOX prediction head on the sampled ILSVRC2015 and ILSVRC data. The second stage uses the full ILSVRC2015, freezes the backbone, and fine-tunes the prediction head, the newly added video object classification branch, and the FAM. Does this description deviate from the original paper? In particular, I wonder whether the first-stage training needs to train the FAM module and the newly added classification branch. Looking forward to your answer, thanks!

YuHengsss commented 3 weeks ago

Hello, you need to fine-tune all the COCO-pretrained weights in the first stage, NOT only the linear projection head. Your description of the second stage is correct.
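For concreteness, here is a minimal PyTorch sketch of the second-stage freezing scheme described above; the attribute names (`backbone`, `head`, `fam`, `video_cls_branch`) are placeholders, not the repo's actual module names:

```python
import torch

def freeze_for_stage2(model):
    """Freeze the backbone; leave the prediction head, the newly added
    video object classification branch, and the FAM trainable.
    Module names here are illustrative, not the repo's actual attributes."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    trainable = []
    for module in (model.head, model.fam, model.video_cls_branch):
        for p in module.parameters():
            p.requires_grad = True
            trainable.append(p)
    return trainable

# Pass only the trainable parameters to the optimizer, e.g.:
# optimizer = torch.optim.SGD(freeze_for_stage2(model), lr=1e-3, momentum=0.9)
```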

yuyangyangji commented 1 week ago

Hello, I have now successfully trained YOLOV++ and I have some questions about the feature selection module. In the paper you mention that a threshold is used to pick which proposals are passed to the FAM, and that the number is always under 100 per frame. However, when training the second stage mentioned above (the V++ decoupledreg_2x base version) with the repo's default settings, I find that the number of chosen proposals is very high: more than 70% of the proposals are selected. Is this within expectation? And what if I directly use the proposals selected by SimOTA per frame? Looking forward to your answer, thanks! My understanding of the selection step is sketched below.
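A rough sketch of the thresholded selection as I understand it; the threshold value, the cap of 100, and the tensor shapes are assumptions, not the repo's exact implementation:

```python
import torch

def select_proposals(scores: torch.Tensor,
                     conf_thre: float = 0.001,
                     max_per_frame: int = 100) -> torch.Tensor:
    """Keep proposals whose confidence exceeds conf_thre, capped at
    max_per_frame per frame. scores: (N,) confidences for one frame."""
    idx = torch.nonzero(scores > conf_thre).squeeze(1)
    if idx.numel() > max_per_frame:
        # When too many proposals pass the threshold, keep the top-scoring ones.
        top = scores[idx].topk(max_per_frame).indices
        idx = idx[top]
    return idx
```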

YuHengsss commented 1 week ago

This phenomenon is intriguing. The number of candidates selected by the feature selection module depends on both the quality of the base detector and the characteristics of the images. Could you provide more details about the dataset you are using? With such a large number of candidates, the GPU memory cost will be extremely high, and you should expect to hit an OOM error. Additionally, the average proposal number reported in Table 2 of our paper represents an average value, not the minimum.