YuHengsss / YOLOV

This repo is an implementation of PyTorch version YOLOV Series
Apache License 2.0

Questions about reproducing results on VID #79

Open DongdongY1 opened 2 months ago

DongdongY1 commented 2 months ago

Hello, I'm trying to train YOLOV to reproduce the results on VID. I think your workflow is:

  1. Train the COCO-pretrained YOLOX on DET
  2. Connect YOLOX with the FAM and train on VID

However, if the provided YOLOX weights are used as the baseline, I suppose they have already been trained on DET&VID. Then, if I download the YOLOX weights and run the command

python tools/vid_train.py -f exps/yolov/yolov_s.py -c weights/yoloxs_vid.pth --fp16

Does that mean the model would be trained on DET&VID twice? Also, running this line results in a 7-epoch training run, which seems to be the DET training phase. Will the VID phase start automatically after it?
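
As a side note, one way to check what a launched run is configured to do is to read the schedule out of the experiment file. A minimal sketch, assuming YOLOV keeps YOLOX's `yolox.exp.get_exp` helper and the standard `max_epoch` attribute (both are assumptions about this fork, and the printed values are only examples):

```python
# Hypothetical inspection of the experiment schedule; assumes YOLOV still exposes
# YOLOX's get_exp helper and a YOLOX-style Exp object with a max_epoch attribute.
from yolox.exp import get_exp

exp = get_exp("exps/yolov/yolov_s.py", None)   # same exp file as in the command above
print("experiment name:", exp.exp_name)
print("max_epoch:", exp.max_epoch)             # would show whether 7 epochs is all this exp runs
```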

DongdongY1 commented 2 months ago

I also noticed that there is a time-embedding generation function that is not used. In the paper you mentioned, "The positional information is not embedded, because the locations in a long temporal range would not be helpful as claimed in (Chen et al. 2020)." I'm wondering if you've done experiments to verify this, as I am trying to add a time embedding to further improve the model.
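
For concreteness, the kind of time embedding I have in mind is a standard Transformer-style sinusoidal embedding added to the per-frame proposal features before aggregation. A rough sketch (the shapes and the insertion point are my own assumptions; this is not the unused function in the repo):

```python
import torch

def sinusoidal_time_embedding(num_frames: int, dim: int) -> torch.Tensor:
    # Standard Transformer-style sinusoidal embedding: one dim-dimensional vector per frame.
    position = torch.arange(num_frames, dtype=torch.float32).unsqueeze(1)        # (T, 1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float32)
        * (-torch.log(torch.tensor(10000.0)) / dim)
    )                                                                            # (dim/2,)
    emb = torch.zeros(num_frames, dim)
    emb[:, 0::2] = torch.sin(position * div_term)
    emb[:, 1::2] = torch.cos(position * div_term)
    return emb                                                                   # (T, dim)

# Hypothetical usage: add the embedding to per-frame proposal features.
T, N, C = 16, 30, 256                        # frames, proposals per frame, channels (made up)
features = torch.randn(T, N, C)
features = features + sinusoidal_time_embedding(T, C).unsqueeze(1)  # broadcast over proposals
```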

DongdongY1 commented 2 months ago

Update: I finished training using the command above and got 76.27% mAP, which I think is roughly the same as in the table.

YuHengsss commented 2 months ago

> Update: I finished training using the command above and got 76.27% mAP, which I think is roughly the same as in the table.

Try setting the reference frame number to 32; you may get a higher AP50.
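
To illustrate where that number enters, here is a generic sketch of attention-style aggregation of key-frame proposals over reference-frame proposals; a larger reference set simply gives the attention more temporal support. This is not the repo's FAM code, and the shapes, similarity measure, and temperature are illustrative only:

```python
import torch
import torch.nn.functional as F

def aggregate_with_references(key_feats, ref_feats, temperature=0.07):
    # key_feats: (N, C) proposal features from the key frame.
    # ref_feats: (num_ref * N, C) proposal features pooled from the reference frames.
    q = F.normalize(key_feats, dim=-1)
    k = F.normalize(ref_feats, dim=-1)
    attn = torch.softmax(q @ k.t() / temperature, dim=-1)   # (N, num_ref * N) affinity
    return key_feats + attn @ ref_feats                     # residual aggregation

num_ref, N, C = 32, 30, 256          # 32 reference frames as suggested; N and C are made up
key = torch.randn(N, C)
refs = torch.randn(num_ref * N, C)
out = aggregate_with_references(key, refs)                  # (N, C)
```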

DongdongY1 commented 2 months ago

> > Update: I finished training using the command above and got 76.27% mAP, which I think is roughly the same as in the table.
>
> Try setting the reference frame number to 32; you may get a higher AP50.

Thanks, but I still have the confusions described above. Could you give some explanation?