Smaller model with larger image network, or larger model with smaller image network?

AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)

I am training a model for deployment on an edge device. The model needs to detect 6 classes (person, bicycle, e-bike with pedal, e-bike without pedal, scooter, wheelchair) in a variety of environments. I initially went with Scaled-YOLOv4, but due to the restrictions of the edge device I had to restrict the network size. Since the required model complexity scales with the number of different objects, and I only have 6 classes, would it be better to train a smaller model (e.g. yolov4) at a higher image resolution? Is there a general guideline for deciding this tradeoff?

Also, what exactly does random=1 do? What does it mean to randomly resize? Is it the network that is being resized? How would that work during inference (I thought the size had to be constant)?

Lastly, the repo also gives this recommendation for improving a network: to make the detected bounding boxes more accurate, you can add 3 parameters ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou to each [yolo] layer and train; it will increase mAP@0.9, but decrease mAP@0.5. What exactly does "accuracy" mean here?

You should use yolov4-tiny.cfg or yolov4-tiny-3l.cfg as your base and, for best results, train at the same network size you will use for inference. See the Scaled-YOLOv4 paper.

"random=1" turns on image size randomisation during training: the size is rescaled by a random factor in the range from 1/1.4 to 1.4. It is a training-only parameter, so the size at inference stays constant.
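To get a feel for what that range means in pixels, here is a minimal Python sketch. It is not Darknet's actual code: the function names, the uniform sampling, and the snapping to multiples of 32 are all assumptions made for illustration. Only the 1.4 coefficient comes from the explanation above.

```python
import random


def size_bounds(base_size, coef=1.4, step=32):
    """Smallest and largest size reachable from base_size.

    Assumption: sizes are snapped to the nearest multiple of `step`.
    """
    low = max(step, int(round(base_size / coef / step)) * step)
    high = int(round(base_size * coef / step)) * step
    return low, high


def sample_training_size(base_size, coef=1.4, step=32, rng=random):
    """Draw one randomised size around base_size.

    Assumption: the scale factor is drawn uniformly from [1/coef, coef];
    the real implementation may sample it differently.
    """
    factor = rng.uniform(1.0 / coef, coef)
    # Snap to a multiple of `step`, never going below one step.
    return max(step, int(round(base_size * factor / step)) * step)


if __name__ == "__main__":
    print(size_bounds(416))
    print([sample_training_size(416) for _ in range(5)])
```

Under these assumptions, a cfg with a 416 input would train at sizes between 288 and 576, while inference still runs at the single size set in the cfg.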

"Accuracy" here refers to how strictly a predicted box must match the actual object to count as correct. The "@0.XX" part is the threshold, measured in Intersection over Union (IoU): at "mAP@0.50" a prediction is considered accurate if the intersection between the actual object's box and the predicted bounding box has an area larger than 50% of the area of the union of the two.
At "mAP@0.90" it is only considered accurate if that ratio is larger than 90%, which is much harder to achieve.
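The IoU check described above can be sketched in a few lines of Python. The `(x_min, y_min, x_max, y_max)` box format and the helper names `iou` and `is_match` are assumptions for illustration. A full mAP computation also matches classes and ranks predictions by confidence, which this sketch leaves out.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold):
    """True if the prediction overlaps the ground truth above the threshold."""
    return iou(pred, truth) > threshold


if __name__ == "__main__":
    truth = (0, 0, 100, 100)
    pred = (10, 10, 110, 110)  # same size, shifted by 10 px in x and y
    print(round(iou(pred, truth), 3))   # 8100 / 11900, about 0.681
    print(is_match(pred, truth, 0.50))  # counted as correct at mAP@0.50
    print(is_match(pred, truth, 0.90))  # rejected at mAP@0.90
```

A box that is off by only 10% of the object's width in each direction already drops to an IoU of about 0.68, which is why mAP@0.9 rewards tighter box regression than mAP@0.5 does.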

Smaller model with larger image network, or larger model with smaller image network? #8283