YuHengsss / YOLOV

This repo is a PyTorch implementation of the YOLOV series.

Why do you use val data during training? #5

Closed muzishen closed 1 year ago

muzishen commented 1 year ago

https://github.com/YuHengsss/YOLOV/blob/eb1d600d0daa3f9c433ab35ffbfa0e2be1961ef8/tools/vid_train.py#L116

YuHengsss commented 1 year ago

https://github.com/YuHengsss/YOLOV/blob/eb1d600d0daa3f9c433ab35ffbfa0e2be1961ef8/tools/vid_train.py#L116

As you can see in vid_trainer.py, this sequence is used for validation.

muzishen commented 1 year ago

So where is the training data?

YuHengsss commented 1 year ago

So where is the training data?

The training data loader is defined in the exp files (e.g. yolov_s.py), the same as in YOLOX.

muzishen commented 1 year ago

For training YOLOX, I configured the file exps/yolov/yoloxs_vid.py and ran tools/train.py to train YOLOX. But for training YOLOV, in exps/yolov/yolov_s.py I cannot find the label for the training data.

https://github.com/YuHengsss/YOLOV/blob/eb1d600d0daa3f9c433ab35ffbfa0e2be1961ef8/exps/yolov/yolov_s.py#L20

Lines 20 and 21 are both validation labels.

Can you tell me exactly which line it is?

YuHengsss commented 1 year ago

For training YOLOX, I configured the file exps/yolov/yoloxs_vid.py and ran tools/train.py to train YOLOX. But for training YOLOV, in exps/yolov/yolov_s.py I cannot find the label for the training data.

https://github.com/YuHengsss/YOLOV/blob/eb1d600d0daa3f9c433ab35ffbfa0e2be1961ef8/exps/yolov/yolov_s.py#L20

Lines 20 and 21 are both validation labels. Can you tell me exactly which line it is?

Line 119, get_data_loader: we change the data loader there, so the COCO annotation is not used for training the video object detector. Sorry for the misunderstanding.
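For illustration, a minimal sketch of what this override amounts to, assuming the VIDDataset class and the val/file_path arguments discussed later in this thread; it is not the repo's exact code.

```python
# Minimal sketch, not the repo's exact code: the exp file overrides
# get_data_loader so training reads the VID training sequences, and the
# COCO-style annotation configured in the exp is never touched here.
from torch.utils.data import DataLoader
from yolox.data.datasets.vid import VIDDataset  # class discussed later in this thread

def get_data_loader(exp, batch_size):
    # Constructor arguments are assumptions based on this thread.
    train_dataset = VIDDataset(
        file_path="path/to/train_seq.npy",  # hypothetical training-sequence list
        img_size=exp.input_size,
        val=False,                          # val=False -> 'res' holds training sequences
    )
    return DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
```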

YuHengsss commented 1 year ago

If you find any problems, please feel free to open an issue.

muzishen commented 1 year ago

If you find any problems, please feel free to open an issue.

Thank you very much.

muzishen commented 1 year ago

https://github.com/YuHengsss/YOLOV/blob/16302d845262ab1914eba4eca8325a4aeee9e88e/yolox/data/datasets/vid.py#L119

I'm sorry to bother you again. I think the list 'res' holds all the images, including training and testing, right? Why is [:1000] at L119 used for validation and [:15000] at L122 used for training, so that the validation data ends up inside the training data?

YuHengsss commented 1 year ago

https://github.com/YuHengsss/YOLOV/blob/16302d845262ab1914eba4eca8325a4aeee9e88e/yolox/data/datasets/vid.py#L119

I'm sorry to bother you again. I think the list 'res' holds all the images, including training and testing, right? Why is [:1000] at L119 used for validation and [:15000] at L122 used for training, so that the validation data ends up inside the training data?

Actually, there are two VIDDataset objects: one for validation, defined in vid_train.py, and another defined in the exp files. They are distinguished by the val and file_path parameters, so the contents of 'res' differ. When val=True, res holds the validation sequences; otherwise it holds the training sequences.
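Expressed as code, a rough sketch of the two objects side by side (argument names follow this thread; the actual signature in vid.py may differ):

```python
# Rough sketch of the two VIDDataset instances described above; the argument
# names are assumptions based on this thread, not the repo's exact signature.
from yolox.data.datasets.vid import VIDDataset

# Defined in the exp file (e.g. exps/yolov/yolov_s.py): training sequences.
train_dataset = VIDDataset(
    file_path="path/to/train_seq.npy",  # hypothetical training-sequence file
    val=False,                          # 'res' -> training sequences
)

# Defined in tools/vid_train.py: validation sequences.
val_dataset = VIDDataset(
    file_path="path/to/val_seq.npy",    # hypothetical validation-sequence file
    val=True,                           # 'res' -> validation sequences
)
```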

muzishen commented 1 year ago

https://github.com/YuHengsss/YOLOV/blob/16302d845262ab1914eba4eca8325a4aeee9e88e/yolox/data/datasets/vid.py#L122

Thanks. Why take only the first 1000 images for validation and only the first 15,000 images for training?

YuHengsss commented 1 year ago

https://github.com/YuHengsss/YOLOV/blob/16302d845262ab1914eba4eca8325a4aeee9e88e/yolox/data/datasets/vid.py#L122

Thanks. Why take only the first 1000 images for validation and only the first 15,000 images for training?

With batch size = 16, there will be 16×1000 = 16,000 images for validation and 16×15,000 for training. By passing tnum=-1, almost all images will be used for validation, but it is time-consuming (about 176k images; 30-40 minutes for the xlarge model and 20 GB+ of memory), so we use part of them for rough testing. For the final evaluation, we use all of the validation images and convert them to IMDB format following FGFA; see tools/val_to_imdb.py.
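A small sketch of the subsetting behaviour described here (illustrative only; the real code around L119-L122 of yolox/data/datasets/vid.py may use different names):

```python
# Illustrative sketch of the slicing discussed above. Each entry of 'res'
# is a chunk of batch-size frames (16 in this discussion).
def select_chunks(res, val, tnum=1000):
    if val:
        if tnum == -1:
            return res        # evaluate on (almost) all validation images: exact but slow
        return res[:tnum]     # default: 1000 chunks x 16 frames = 16,000 images
    return res[:15000]        # training subset: 15,000 chunks x 16 frames each
```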

muzishen commented 1 year ago

The paper reports that 'we adopt the global sampling strategy with f_g = 31 by default for the rest experiments', but the code sets f_g = 16 by default. Why?

https://github.com/YuHengsss/YOLOV/blob/16302d845262ab1914eba4eca8325a4aeee9e88e/tools/vid_train.py#L94

YuHengsss commented 1 year ago

The paper reports that 'we adopt the global sampling strategy with f_g = 31 by default for the rest experiments', but the code sets f_g = 16 by default. Why?

https://github.com/YuHengsss/YOLOV/blob/16302d845262ab1914eba4eca8325a4aeee9e88e/tools/vid_train.py#L94

As described in the implementation details of our paper, "For training the feature aggregation module, the number of frames f is set to 16." For testing, we adopt 31 frames after ablation.
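In other words (a paraphrase of the answer, not the repo's actual configuration; the names below are made up for illustration):

```python
# Paraphrase of the answer above; names are illustrative, not the repo's flags.
TRAIN_FRAMES = 16  # f: frames per clip when training the feature aggregation module
TEST_FRAMES = 31   # f_g: globally sampled frames at test time, chosen by ablation

def frames_per_clip(training: bool) -> int:
    return TRAIN_FRAMES if training else TEST_FRAMES
```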