WongKinYiu / YOLO

An MIT rewrite of YOLOv9
MIT License

An error occurred while attempting to resize images for training #32

Open qpal147147 opened 2 weeks ago

qpal147147 commented 2 weeks ago

Describe the bug

  1. I encountered an error when changing the training image size from 640 to 1280. This is the error message:

    Traceback (most recent call last):
      File "/yolov9/yolo/lazy.py", line 34, in main
        trainer.solve(dataloader)
      File "/yolov9/yolo/tools/solver.py", line 112, in solve
        epoch_loss = self.train_one_epoch(dataloader)
      File "/yolov9/yolo/tools/solver.py", line 79, in train_one_epoch
        loss, loss_each = self.train_one_batch(images, targets)
      File "/yolov9/yolo/tools/solver.py", line 66, in train_one_batch
        loss, loss_item = self.loss_fn(aux_predicts, main_predicts, targets)
      File "/yolov9/yolo/tools/loss_functions.py", line 124, in __call__
        aux_iou, aux_dfl, aux_cls = self.loss(aux_predicts, targets)
      File "/yolov9/yolo/tools/loss_functions.py", line 91, in __call__
        align_targets, valid_masks = self.matcher(targets, (predicts_cls, predicts_box))
      File "/yolov9/yolo/utils/bounding_box_utils.py", line 243, in __call__
        target_matrix = grid_mask * (iou_mat ** self.factor["iou"]) * (cls_mat ** self.factor["cls"])
    RuntimeError: The size of tensor a (33600) must match the size of tensor b (8400) at non-singleton dimension 2
  2. When I use wandb I get the following error:

    Traceback (most recent call last):
      File "/yolov9/yolo/lazy.py", line 34, in main
        trainer.solve(dataloader)
      File "/yolov9/yolo/tools/solver.py", line 115, in solve
        self.validator.solve(self.validation_dataloader)
      File "/yolov9/yolo/tools/solver.py", line 194, in solve
        self.progress.start_one_epoch(len(dataloader))
      File "/yolov9/yolo/utils/logging_utils.py", line 69, in start_one_epoch
        lr_values = [params["lr"] for params in optimizer.param_groups]
    AttributeError: 'NoneType' object has no attribute 'param_groups'

To Reproduce

My training command:

python lazy.py task=train task.data.batch_size=32 image_size=[1280,1280] device=cuda use_wandb=True

Expected behavior

The command should train at the new image size without errors. Separately, when I set class_num to 1, the pretrained weights fail to load.

https://github.com/WongKinYiu/YOLO/blob/010502a003461bfa657ce08e584680076f0fb837/yolo/utils/bounding_box_utils.py#L176 When the image size is 1280, grid_mask and iou_mat have size 33600 at dimension 2, while cls_mat stays fixed at 8400. Is there a reason cls_mat does not change with the image size?
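The two sizes in the error message are consistent with the number of grid cells summed over three detection scales. The sketch below assumes strides of 8, 16, and 32 (common for YOLO models, but not confirmed here from the repository code) and one prediction per grid cell; `num_anchors` is a hypothetical helper, not a function from this project.

```python
def num_anchors(image_size, strides=(8, 16, 32)):
    """Count grid cells across all detection scales.

    Assumes one prediction per cell and strides of 8/16/32,
    which is an assumption, not taken from the repository.
    """
    width, height = image_size
    return sum((width // s) * (height // s) for s in strides)


print(num_anchors((640, 640)))    # 80*80 + 40*40 + 20*20 = 8400
print(num_anchors((1280, 1280)))  # 160*160 + 80*80 + 40*40 = 33600
```

If this reading is right, 8400 corresponds to a 640x640 input and 33600 to a 1280x1280 input, which would suggest that one of the tensors is still being built for the default image size.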

https://github.com/WongKinYiu/YOLO/blob/010502a003461bfa657ce08e584680076f0fb837/yolo/utils/logging_utils.py#L69
https://github.com/WongKinYiu/YOLO/blob/010502a003461bfa657ce08e584680076f0fb837/yolo/tools/solver.py#L194

I changed the check to the following, and it worked:

if self.use_wandb and optimizer is not None:
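The same guard can be illustrated in isolation. This is a minimal sketch, not the project's code: `current_learning_rates` is a hypothetical helper, and `fake_optimizer` is a stand-in object that only mimics the `param_groups` attribute of a PyTorch optimizer so the example runs without PyTorch installed.

```python
from types import SimpleNamespace


def current_learning_rates(optimizer=None):
    """Return the learning rate of each parameter group.

    Returns an empty list when no optimizer is passed, which is the
    situation during validation in the traceback above.
    """
    if optimizer is None:
        return []
    return [group["lr"] for group in optimizer.param_groups]


# Stand-in with the same `param_groups` layout as a PyTorch optimizer (assumption for illustration).
fake_optimizer = SimpleNamespace(param_groups=[{"lr": 0.01}, {"lr": 0.001}])

print(current_learning_rates(fake_optimizer))  # [0.01, 0.001]
print(current_learning_rates(None))            # []
```

Without the `None` check, the second call would raise the same `AttributeError: 'NoneType' object has no attribute 'param_groups'`.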

System Info (please complete the following information):

Additional context

Thank you for your excellent work on the project. I apologize for any errors in my English.