david8862 / keras-YOLOv3-model-set

end-to-end YOLOv4/v3/v2 object detection pipeline, implemented on tf.keras with different technologies
MIT License
638 stars, 220 forks

How to train yolo3-tiny-darknet with data imbalance? #15

Open govindamagrawal opened 4 years ago

govindamagrawal commented 4 years ago

Hi David, I am trying to do face detection and also detect other objects, mainly the person class and possibly a few others such as cats and dogs. To do that, I am combining a face detection dataset (the WIDER FACE dataset) with the VOC dataset. The problem is that the face detection dataset has 32,203 images with 393,703 labelled faces, which is far more than the number of person labels in either VOC or COCO. When I trained the model, it detected only faces and none of the other classes. I want to train on as much data as possible, but oversampling the other classes by a large factor significantly increases the training time. Can you suggest ways to train effectively on this dataset, such as tuning some parameters of the loss function, switching to a different loss function, or some other method? Thanks in advance.
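Before picking any rebalancing scheme, it can help to measure how lopsided the combined dataset actually is per class. The sketch below assumes a plain-text annotation file with one image per line in the form `path x_min,y_min,x_max,y_max,class_id ...` (space-separated boxes, class id last); that layout is an assumption here, so check it against your own converted WIDER FACE and VOC annotation files. `count_boxes_per_class` is a hypothetical helper, not part of the repository.

```python
from collections import Counter


def count_boxes_per_class(annotation_lines):
    """Count ground-truth boxes per class id.

    Assumed line format (verify against your own annotation files):
        path/to/image.jpg x_min,y_min,x_max,y_max,class_id ...
    Image paths containing spaces are not handled by this sketch.
    """
    counts = Counter()
    for line in annotation_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        for box in parts[1:]:  # parts[0] is the image path
            class_id = int(box.split(",")[-1])  # class id is the last field
            counts[class_id] += 1
    return counts


# Toy input with two images (made-up boxes, not real dataset entries).
sample = [
    "images/a.jpg 10,10,50,50,0 60,60,90,90,0",
    "images/b.jpg 5,5,40,80,1",
]
print(count_boxes_per_class(sample))
```

With a real file you would pass the lines of the annotation file instead of `sample`; the resulting counts can then feed whatever class weighting you choose.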

david8862 commented 4 years ago

Hi @govindamagrawal, maybe you can try using the CategoricalCrossentropy loss from tf.keras as the YOLO classification loss. It supports a "sample_weight" argument, which lets you weight the loss per sample and so balance the classes. You can then adjust the class weights to deal with the data imbalance.
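A minimal sketch of that idea, assuming the class labels and predictions for a batch of boxes arrive as one-hot `y_true` and softmax `y_pred` arrays of shape (num_boxes, num_classes). The helper names `inverse_frequency_weights` and `weighted_class_loss` are hypothetical, inverse-frequency weighting is only one possible heuristic for choosing the class weights, and wiring this into the repository's own YOLO loss would need adapting to the tensor shapes used there.

```python
import numpy as np
import tensorflow as tf


def inverse_frequency_weights(class_counts):
    """Per-class weights proportional to 1 / count, normalized to mean 1.

    Inverse frequency is an assumed heuristic; any other scheme works too.
    """
    counts = np.asarray(class_counts, dtype=np.float32)
    weights = 1.0 / np.maximum(counts, 1.0)  # avoid division by zero
    weights = weights * (len(counts) / weights.sum())
    return weights.astype(np.float32)


def weighted_class_loss(y_true, y_pred, class_weights):
    """Class-weighted categorical cross-entropy, one value per box.

    y_true: one-hot labels, shape (num_boxes, num_classes)
    y_pred: predicted class probabilities, same shape
    class_weights: one weight per class, shape (num_classes,)
    """
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    class_weights = tf.cast(class_weights, tf.float32)
    # "none" keeps a per-box loss so the weighting stays visible.
    cce = tf.keras.losses.CategoricalCrossentropy(reduction="none")
    # Each box is weighted by the weight of its ground-truth class.
    sample_weight = tf.reduce_sum(y_true * class_weights, axis=-1)
    return cce(y_true, y_pred, sample_weight=sample_weight)


# Toy example (illustrative counts only): class 0 is 10x more frequent.
weights = inverse_frequency_weights([100, 10])
y_true = np.array([[1, 0], [0, 1]], dtype=np.float32)
y_pred = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=np.float32)
per_box_loss = weighted_class_loss(y_true, y_pred, weights)
```

Both boxes are predicted equally badly here, but the box from the rarer class contributes a larger loss, which is the rebalancing effect the weights are meant to produce. Normalizing the weights to mean 1 keeps the overall loss scale roughly comparable to the unweighted case.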

govindamagrawal commented 4 years ago

Thanks for the reply. I will try it out.