YuHengsss / YOLOV

This repo is a PyTorch implementation of the YOLOV series
Apache License 2.0

why use only part of the data? #42

Closed: xiaowk5516 closed this issue 1 year ago

xiaowk5516 commented 1 year ago

Hello, thx for your great work. I noticed that you use only part of the data in the dataloader. [image]

Does this mean "use 10% of VID"? If so, short videos may not appear in the dataset at all.

YuHengsss commented 1 year ago

Hello, "10% of VID datasets" is the setting used to train the base detector. The "15000" is the number of sequences used to train the video object detector, which is about 1/4 of the whole dataset.

YuHengsss commented 1 year ago

Using all the data may boost the accuracy a little, but training on the whole VID dataset costs so much time.
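
For readers following along, the two settings described above (a roughly 10% frame subset for the base detector, and a capped number of sequences for the video detector) could be sketched as below. This is only an illustration under stated assumptions, not the repository's actual dataloader: the function names `subsample_frames` and `sample_sequences`, the stride of 10, and the toy data are all hypothetical.

```python
import random


def subsample_frames(frames, keep_every=10):
    """Keep every `keep_every`-th frame of one video (about 10% for keep_every=10).

    Hypothetical helper: striding per video means even a very short video
    still contributes its first frame.
    """
    return frames[::keep_every]


def sample_sequences(sequences, max_sequences, seed=0):
    """Randomly keep at most `max_sequences` training sequences (hypothetical helper)."""
    if len(sequences) <= max_sequences:
        return list(sequences)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(sequences, max_sequences)


# Toy data standing in for one video of 95 frames.
video = [f"frame_{i:04d}.JPEG" for i in range(95)]
base_frames = subsample_frames(video)
print(len(base_frames))  # 10 frames: indices 0, 10, ..., 90

# Toy data standing in for 40 sequences of 4 frames each; keep about 1/4 of them.
all_seqs = [[f"v{v}_f{i}" for i in range(4)] for v in range(40)]
video_seqs = sample_sequences(all_seqs, max_sequences=10)
print(len(video_seqs))  # 10 sequences
```

In this sketch, striding within each video keeps at least one frame from every video, whereas drawing a random 10% over all frames pooled together could drop a short video entirely. Which of the two the repository does is not stated in this thread.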

xiaowk5516 commented 1 year ago

Thx for the reply.

Do you train the base detector using 10% of both the VID and DET datasets, or only 10% of the DET dataset? Also, "tnum" means using only part of the val dataset to evaluate the model's performance. Is that right?

xiaowk5516 commented 1 year ago

Got it. I tried training YOLOv6 on all the data: 4 hours per epoch. Hahaha.

YuHengsss commented 1 year ago

Yes. 10% of the VID data, since the video frames are largely redundant.

xiaowk5516 commented 1 year ago

Thanks a lot.