An error occurred while attempting to resize images for training

Describe the bug

I encountered an error when changing the training image size from 640 to 1280. This is the error message:

Traceback (most recent call last):
  File "/yolov9/yolo/lazy.py", line 34, in main
    trainer.solve(dataloader)
  File "/yolov9/yolo/tools/solver.py", line 112, in solve
    epoch_loss = self.train_one_epoch(dataloader)
  File "/yolov9/yolo/tools/solver.py", line 79, in train_one_epoch
    loss, loss_each = self.train_one_batch(images, targets)
  File "/yolov9/yolo/tools/solver.py", line 66, in train_one_batch
    loss, loss_item = self.loss_fn(aux_predicts, main_predicts, targets)
  File "/yolov9/yolo/tools/loss_functions.py", line 124, in __call__
    aux_iou, aux_dfl, aux_cls = self.loss(aux_predicts, targets)
  File "/yolov9/yolo/tools/loss_functions.py", line 91, in __call__
    align_targets, valid_masks = self.matcher(targets, (predicts_cls, predicts_box))
  File "/yolov9/yolo/utils/bounding_box_utils.py", line 243, in __call__
    target_matrix = grid_mask * (iou_mat ** self.factor["iou"]) * (cls_mat ** self.factor["cls"])
RuntimeError: The size of tensor a (33600) must match the size of tensor b (8400) at non-singleton dimension 2

When I enable wandb, I get a second error:

Traceback (most recent call last):
  File "/yolov9/yolo/lazy.py", line 34, in main
    trainer.solve(dataloader)
  File "/yolov9/yolo/tools/solver.py", line 115, in solve
    self.validator.solve(self.validation_dataloader)
  File "/yolov9/yolo/tools/solver.py", line 194, in solve
    self.progress.start_one_epoch(len(dataloader))
  File "/yolov9/yolo/utils/logging_utils.py", line 69, in start_one_epoch
    lr_values = [params["lr"] for params in optimizer.param_groups]
AttributeError: 'NoneType' object has no attribute 'param_groups'

To Reproduce

My training command:

python lazy.py task=train task.data.batch_size=32 image_size=[1280,1280] device=cuda use_wandb=True

Expected behavior

Training should run at an image size of 1280 without errors. Separately, when I set class_num to 1, the pretrained weights fail to load.
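I have not traced the weight-loading failure. One common cause, and only an assumption here, is that the classification layers in the checkpoint were trained for a different number of classes, so their shapes no longer match the model. A minimal sketch of loading only the entries that still match; `filter_matching_weights` and the layer names are hypothetical and not part of this repository, and the 80-class checkpoint shape is an assumed example:

```python
from types import SimpleNamespace


def filter_matching_weights(checkpoint, model_state):
    """Keep only checkpoint entries whose name and shape match the model."""
    kept, skipped = {}, []
    for name, tensor in checkpoint.items():
        target = model_state.get(name)
        if target is not None and tuple(tensor.shape) == tuple(target.shape):
            kept[name] = tensor
        else:
            # Either the layer does not exist in the model or its shape differs.
            skipped.append(name)
    return kept, skipped


# Stand-ins that only carry a .shape attribute; real code would pass tensors.
# Layer names and shapes below are made up for illustration.
ckpt = {
    "backbone.conv.weight": SimpleNamespace(shape=(16, 3, 3, 3)),
    "head.cls.weight": SimpleNamespace(shape=(80, 64, 1, 1)),  # assumed 80 classes
}
model = {
    "backbone.conv.weight": SimpleNamespace(shape=(16, 3, 3, 3)),
    "head.cls.weight": SimpleNamespace(shape=(1, 64, 1, 1)),   # class_num=1
}

kept, skipped = filter_matching_weights(ckpt, model)
print(sorted(kept))  # ['backbone.conv.weight']
print(skipped)       # ['head.cls.weight']
```

With PyTorch, the same idea would take the loaded checkpoint dictionary and `model.state_dict()`, then call `model.load_state_dict(kept, strict=False)`. Note that `strict=False` alone tolerates missing keys but still raises on a shape mismatch, which is why the filtering step comes first.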

For the first error, the cause is as follows:

https://github.com/WongKinYiu/YOLO/blob/010502a003461bfa657ce08e584680076f0fb837/yolo/utils/bounding_box_utils.py#L176

When the image size is 1280, grid_mask and iou_mat have 33600 entries along dimension 2, while cls_mat stays fixed at 8400 along the same dimension. Is there a reason cls_mat is fixed at this size?
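For reference, both numbers in the error message are consistent with anchor grids built for two different image sizes. A minimal sketch, assuming the usual YOLO detection strides of 8, 16, and 32 (an assumption, not read from the repository config); `anchor_count` is a hypothetical helper:

```python
def anchor_count(image_size, strides=(8, 16, 32)):
    """Number of anchor points for a square image, one per grid cell per stride."""
    # Each detection level contributes an (image_size // stride) x (image_size // stride) grid.
    return sum((image_size // s) ** 2 for s in strides)


print(anchor_count(640))   # 8400
print(anchor_count(1280))  # 33600
```

If that assumption holds, one of the tensors is still derived from a 640 anchor grid while the others follow the 1280 input.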

The second error occurs because no optimizer is passed in during the validation phase, so optimizer is None when start_one_epoch reads optimizer.param_groups:

https://github.com/WongKinYiu/YOLO/blob/010502a003461bfa657ce08e584680076f0fb837/yolo/utils/logging_utils.py#L69
https://github.com/WongKinYiu/YOLO/blob/010502a003461bfa657ce08e584680076f0fb837/yolo/tools/solver.py#L194

I changed the condition to the following, and it worked:

if self.use_wandb and optimizer is not None:
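The effect of that guard can be sketched in isolation. This is a simplified stand-in, not the repository's actual `start_one_epoch`; `collect_lr_values` and `fake_optimizer` are hypothetical names used only for illustration:

```python
from types import SimpleNamespace


def collect_lr_values(optimizer, use_wandb=True):
    """Return the learning rates to log, or an empty list when there is nothing to log."""
    # During validation no optimizer is supplied, so skip the lookup
    # instead of dereferencing None.
    if use_wandb and optimizer is not None:
        return [params["lr"] for params in optimizer.param_groups]
    return []


# Stand-in for a torch optimizer: only the param_groups attribute is needed here.
fake_optimizer = SimpleNamespace(param_groups=[{"lr": 0.01}, {"lr": 0.001}])

print(collect_lr_values(fake_optimizer))  # [0.01, 0.001]
print(collect_lr_values(None))            # []
```

The training path still logs every parameter group's learning rate, while the validation path returns early instead of raising the AttributeError.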

System Info (please complete the following information):

OS: Ubuntu 22.04
Python Version: 3.9.19
PyTorch Version: 2.1.0
CUDA/cuDNN/MPS Version: CUDA 11.3
YOLO Model Version: YOLOv9-m

Additional context

Thank you for your excellent work on the project. I apologize for any errors in my English.

WongKinYiu / YOLO

An error occurred while attempting to resize images for training #32
