WongKinYiu / yolor

Implementation of the paper "You Only Learn One Representation: Unified Network for Multiple Tasks" (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

Small object detection (DriveU Traffic Light Dataset) performance worse than YOLOv4-CSP #77

Closed: yusiyoh closed this issue 2 years ago

yusiyoh commented 2 years ago

Hi, thank you for your work and code. I am training models to detect traffic lights on the DriveU Traffic Light Dataset (images are 2048x1024), in which the average bounding-box width is 11 pixels, meaning the objects are generally very small. I trained YOLOv4-CSP for 50 epochs with a 640x640 input size, using pretrained weights from your repo. Then I trained YOLOR-D6 with a 1280 input size and the --rect option, since the images are 2048x1024. I put the results below. How can I improve the results for YOLOR-D6? It is a much more complex and capable model than YOLOv4-CSP, but it is performing worse in this case. results_all_obj.txt results_yolor.txt

WongKinYiu commented 2 years ago

From the results we can see that neither of the two models has converged yet. By the way, if you use --rect for training, I suggest you add a random shuffle flag in the data loader.

yusiyoh commented 2 years ago

How can we tell that they have not yet converged? I will take your suggestion into account. Is it reasonable to use --rect for 2048x1024 images, or is it better not to use it even though the images are not square?

WongKinYiu commented 2 years ago

You can see in the txt files that the AP is still increasing. If a model has converged, the AP will show almost no change over several epochs. If it is over-fitting, the AP will start to decrease. These two criteria are often used to decide when to stop training.
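The two stopping criteria above (AP plateau for convergence, AP decline for over-fitting) can be sketched as a simple check over the per-epoch AP values. This is not code from the yolor repo; the function name and thresholds (`patience`, `min_delta`) are illustrative assumptions.

```python
def should_stop(ap_history, patience=5, min_delta=1e-3):
    """Decide whether to stop training based on per-epoch AP values.

    Stops when the best AP of the last `patience` epochs has not improved
    on the best AP of the earlier epochs by at least `min_delta`. This
    covers both criteria from the comment above: a flat AP curve
    (converged) and a declining one (over-fitting).
    """
    if len(ap_history) <= patience:
        return False  # too few epochs to judge
    earlier_best = max(ap_history[:-patience])
    recent_best = max(ap_history[-patience:])
    return recent_best - earlier_best < min_delta
```

With an AP curve that is still climbing, this returns False; once the curve flattens or drops for `patience` epochs, it returns True.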

If you want to use --rect, you could add shuffle=True in the dataloader.
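For context on why this flag matters: in YOLOv5-style dataloaders (which yolor inherits from), --rect sorts images by aspect ratio so each batch can share one rectangular shape, and shuffling is typically disabled to keep that grouping. One way to reconcile the two, sketched below as an assumption rather than the repo's actual implementation, is to shuffle the order of the batches while leaving each batch's aspect-ratio grouping intact.

```python
import random

def rect_batches(indices_sorted_by_ar, batch_size, seed=0):
    """Group aspect-ratio-sorted image indices into fixed batches,
    then shuffle the batch order (not the batch contents).

    This preserves the per-batch rectangular shapes that --rect relies
    on, while still presenting batches in a random order each epoch.
    `indices_sorted_by_ar` is assumed to already be sorted by aspect
    ratio, as the --rect code path does.
    """
    batches = [indices_sorted_by_ar[i:i + batch_size]
               for i in range(0, len(indices_sorted_by_ar), batch_size)]
    rng = random.Random(seed)
    rng.shuffle(batches)  # randomize batch order only
    return batches
```

A custom batch sampler built this way can then be passed to the PyTorch DataLoader in place of shuffle=True, which would otherwise break the aspect-ratio grouping.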