Recommendation for training on an imbalanced dataset?

Megvii-BaseDetection / YOLOX

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/

Apache License 2.0


Recommendation for training on an imbalanced dataset? #1003

Status: Open. loucif01 opened this issue 2 years ago

loucif01 commented 2 years ago

Hi, I'd like to know whether there are any settings we should change before training on an imbalanced dataset.

Thank you.

FateScript commented 2 years ago

You could try writing your own sampler, using a reweighted loss, or something similar.
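As a rough illustration of both suggestions, here is a framework-free sketch. The sampler uses repeat factor sampling (the technique introduced with the LVIS dataset): an image is repeated more often per epoch if it contains a class that appears in few images. The loss side computes inverse-frequency class weights and applies them to a per-class binary cross-entropy. All function names and the `threshold` value are hypothetical, and wiring this into YOLOX's dataset, data loader, or classification loss is not shown; treat it as a sketch under those assumptions, not YOLOX's API.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

# Each image is described by the list of class ids of its boxes (hypothetical format).
ImageLabels = Sequence[Sequence[int]]


def class_image_frequency(image_labels: ImageLabels) -> Dict[int, float]:
    """Fraction of images that contain each class at least once."""
    counts: Counter = Counter()
    for labels in image_labels:
        counts.update(set(labels))
    n = len(image_labels)
    return {c: k / n for c, k in counts.items()}


def repeat_factors(image_labels: ImageLabels, threshold: float = 0.1) -> List[float]:
    """Repeat factor per image: max over its classes of max(1, sqrt(threshold / f_c))."""
    freq = class_image_frequency(image_labels)
    class_r = {c: max(1.0, math.sqrt(threshold / f)) for c, f in freq.items()}
    # Images without any boxes keep a neutral factor of 1.0.
    return [max((class_r[c] for c in set(labels)), default=1.0) for labels in image_labels]


def epoch_indices(factors: Sequence[float], rng: random.Random) -> List[int]:
    """Stochastic rounding: image i appears floor(r) times, plus once more with prob frac(r)."""
    indices: List[int] = []
    for i, r in enumerate(factors):
        whole = int(math.floor(r))
        count = whole + (1 if rng.random() < r - whole else 0)
        indices.extend([i] * count)
    rng.shuffle(indices)
    return indices


def inverse_frequency_weights(image_labels: ImageLabels, num_classes: int) -> List[float]:
    """Class weight w_c = N / (K * n_c) from box counts; classes with no boxes get 1.0."""
    counts = Counter(c for labels in image_labels for c in labels)
    total = sum(counts.values())
    return [total / (num_classes * counts[c]) if counts[c] else 1.0 for c in range(num_classes)]


def weighted_bce(probs: Sequence[float], targets: Sequence[float],
                 weights: Sequence[float], eps: float = 1e-7) -> float:
    """Mean per-class binary cross-entropy, each class term scaled by its weight."""
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    labels = [[0]] * 9 + [[1]]  # class 1 appears in only one of ten images
    factors = repeat_factors(labels, threshold=0.4)
    print(factors)
    print(len(epoch_indices(factors, random.Random(0))))
    print(inverse_frequency_weights(labels, num_classes=2))
```

With `threshold=0.4`, the single image containing the rare class gets a repeat factor of 2.0, so it is seen twice per epoch while the others are seen once. In practice you would usually pick only one of the two levers (sampling or loss weights) at first, since combining them can over-correct toward the rare classes.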

Dolpheyn commented 2 years ago

I personally used dataset-level techniques (the "bag of freebies" augmentations from the YOLOv4 paper) to counter the imbalance in example counts between labels. You will have to train longer, because the dataset grows after augmentation, but performance improved across labels, which in turn improved overall performance.
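One way to read the approach above is as offline oversampling through augmentation: images containing rare classes get extra augmented copies until each class's box count is roughly level with the most common class. The sketch below assumes a hypothetical annotation format of `(x1, y1, x2, y2, class_id)` tuples per image, uses a simple "ratio to the largest class" heuristic to plan copies, and shows a horizontal flip as one example of a geometric augmentation applied to the boxes (pixel handling is omitted). It is not the commenter's exact pipeline, which the thread does not describe.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2, class_id) in pixel coordinates.
Box = Tuple[float, float, float, float, int]


def hflip_boxes(boxes: Sequence[Box], image_width: float) -> List[Box]:
    """Mirror boxes horizontally: new x1 = W - old x2, new x2 = W - old x1."""
    return [(image_width - x2, y1, image_width - x1, y2, c) for x1, y1, x2, y2, c in boxes]


def extra_copies_per_image(dataset: Sequence[Sequence[Box]]) -> List[int]:
    """Heuristic plan: extra augmented copies = ceil(largest_count / rarest_class_count) - 1.

    Copies also duplicate any common-class boxes in the same image, so the
    result only approximately balances the classes.
    """
    counts = Counter(box[4] for boxes in dataset for box in boxes)
    if not counts:
        return [0] * len(dataset)
    largest = max(counts.values())
    plan: List[int] = []
    for boxes in dataset:
        classes = {box[4] for box in boxes}
        if not classes:
            plan.append(0)  # background-only images are not oversampled
            continue
        ratio = max(largest / counts[c] for c in classes)
        plan.append(math.ceil(ratio) - 1)
    return plan


if __name__ == "__main__":
    data = [[(0, 0, 10, 10, 0)]] * 4 + [[(10, 5, 30, 25, 1)]]
    plan = extra_copies_per_image(data)
    print(plan)
    print("dataset size after augmentation:", len(data) + sum(plan))
    print(hflip_boxes(data[4], image_width=100))
```

The growth in dataset size (here from 5 to 8 images) is why training takes longer: each epoch now contains the extra augmented copies.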