WongKinYiu / yolov9

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
GNU General Public License v3.0
9.01k stars 1.43k forks

poor box width regression on text detection #518

Open travisCxy opened 4 months ago

travisCxy commented 4 months ago

Hello, thank you for your code. I am training a YOLOv9 model for document image layout detection. I get a good mAP on my validation set, but text detection sometimes has poor box width regression. Can you help me?

(Attached image: 0cdd69db56714fbc89b8845eb3f6e11f_sm_yolov9)

ankandrew commented 4 months ago

Some questions:

  1. Did you try an input resolution other than 640, e.g. a lower one such as 416?
  2. How large is your training set (number of samples)?
  3. Which model are you using? Is it pre-trained on COCO (the weights provided by the repo)?

Also, double-check that mixup augmentation is not ruining your training: verify that the augmented samples look the way you expect. Below is a script I use to visualize the augmentation:

https://github.com/ankandrew/yolov9/blob/8fecc650bebf7348a6372f43b668b344de070129/visualize_augmentation.py
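To see why mixup can hurt document data, here is a minimal, dependency-free sketch of YOLOv5-style mixup, which this repo's augmentation is assumed to follow: two images are alpha-blended with a ratio drawn from Beta(32, 32) and both label sets are kept. The `mixup` function, the flat-list image format and the box values are illustrative assumptions, not the repo's code.

```python
import random

def mixup(img1, labels1, img2, labels2, alpha=32.0, rng=random):
    """Illustrative mixup: blend two same-size images and keep both label sets.

    Images are flat lists of 0-255 ints so the sketch needs no dependencies;
    labels are lists of (cls, x1, y1, x2, y2) tuples.
    """
    if len(img1) != len(img2):
        raise ValueError("mixup expects images of the same size")
    # Beta(alpha, alpha) with a large alpha keeps r close to 0.5,
    # so both pages stay clearly visible in the blended image.
    r = rng.betavariate(alpha, alpha)
    blended = [int(a * r + b * (1 - r)) for a, b in zip(img1, img2)]
    # Labels from both images are kept, so text boxes from two
    # different documents end up overlapping in one training sample.
    return blended, labels1 + labels2

# Usage: a white "page" mixed with a black one gives a grayish image
# that still carries the boxes of both pages.
white, black = [255] * 8, [0] * 8
img, labels = mixup(white, [(0, 10, 10, 200, 30)], black, [(0, 15, 12, 180, 32)],
                    rng=random.Random(0))
print(len(labels))  # 2
```

For dense text, two half-transparent pages with overlapping line boxes are far from anything seen at inference time, which is one plausible reason mixup can degrade box edges.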

travisCxy commented 4 months ago

@ankandrew hello

  1. I am using a larger input size of 1024 for training, because the original document images are all high resolution.
  2. I have 44,000 training images; I think that is enough to train the model.
  3. I am using yolov9-e and load the COCO pretrained weights.

I checked my augmentation and you are right: I hadn't disabled the mixup augmentation. After checking the augmentation with your script, I disabled mosaic and copy_paste, and I will train once more with the current settings. By the way, I was reading the loss computation code. The bbox loss mainly focuses on IoU, and I doubt that the IoU loss is helpful for accurate bbox regression. So I changed the loss to an L1 loss, but I got a worse result. Do you have any idea why?
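
The doubt about the IoU loss can be made concrete with a quick calculation in plain Python (the box values are hypothetical). Because IoU is relative to box size, on a long, thin text line a 20 px width error that crops letters costs exactly as much as a 1 px height error, while an L1 loss on pixel coordinates penalizes them 20:1. Here "IoU loss" means 1 - IoU, the base term of CIoU.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def l1_xyxy(a, b):
    """Sum of absolute coordinate errors in pixels."""
    return sum(abs(p - q) for p, q in zip(a, b))

gt = (0, 0, 400, 20)            # a long, thin text line
short_width = (0, 0, 380, 20)   # right edge 20 px short: last letters cropped
short_height = (0, 0, 400, 19)  # bottom edge 1 px short: visually negligible

for name, pred in [("width -20px", short_width), ("height -1px", short_height)]:
    iou = iou_xyxy(gt, pred)
    print(f"{name}: IoU loss = {1 - iou:.3f}, L1 = {l1_xyxy(gt, pred)}")
# width -20px: IoU loss = 0.050, L1 = 20
# height -1px: IoU loss = 0.050, L1 = 1
```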

ankandrew commented 4 months ago

Hi @travisCxy! Sorry for the late response. I think your analysis in point (3) is accurate. It seems the existing MDPIoU loss could be used instead of the CIoU loss that is currently used. MDPIoU adds a penalty term based on the distances between the corresponding corners of the predicted and ground-truth boxes, which should make it more suitable for text detection, where corner alignment is critical to avoid cropping letters (as in your examples). Let me know if this helps on your dataset.

https://github.com/WongKinYiu/yolov9/blob/5b1ea9a8b3f0ffe4fe0e203ec6232d788bb3fcff/utils/metrics.py#L292-L296
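A minimal sketch of the corner-distance idea, written from the published MPDIoU formulation (referred to as MDPIoU in this thread) rather than copied from the repo: MPDIoU = IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2), where d1 and d2 are the distances between the top-left corners and between the bottom-right corners, and w, h are the input width and height. The function names, the `img_w`/`img_h` arguments and the box values are illustrative assumptions.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def mpdiou_xyxy(a, b, img_w, img_h):
    """MPDIoU: IoU minus the squared top-left and bottom-right corner
    distances, each normalized by img_w**2 + img_h**2."""
    d1 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2  # top-left corners
    d2 = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou_xyxy(a, b) - d1 / norm - d2 / norm

gt = (100, 100, 500, 120)    # hypothetical text line
pred = (100, 100, 480, 120)  # right edge 20 px short
for size in (1024, 64):
    loss = 1 - mpdiou_xyxy(gt, pred, size, size)
    print(f"normalizer {size}x{size}: MPDIoU loss = {loss:.4f}")
# normalizer 1024x1024: MPDIoU loss = 0.0502
# normalizer 64x64: MPDIoU loss = 0.0988
```

For the same 20 px miss, the corner term adds little when normalized by a 1024 px image but a lot at a 64-unit scale, so it is worth checking which width and height the loss actually receives (pixels or feature-map grid units) before comparing it with CIoU.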

You can use my branch to select the bounding box loss function, or cherry-pick my commit to easily test loss functions other than the default one: https://github.com/ankandrew/yolov9/commit/9527269002a87a448091ac13cc041972c2e40caa

eslamahmed235 commented 1 month ago

Hi @travisCxy, did close_mosaic solve this issue? And did you find the root cause of this problem?