Chris-hughes10 / Yolov7-training

A clean, modular implementation of the Yolov7 model family, which uses the official pretrained weights, with utilities for training the model on custom (non-COCO) tasks.

Error during fine tuning #8

Closed varshanth closed 1 year ago

varshanth commented 1 year ago

Hi,

Thank you for providing your code for YOLOv7 training. It's very intuitive. While trying to fine-tune on my own dataset, I ran into the error below, which I believe may not be related to my changes. I set up the environment using your requirements file and I am running on a GCP machine with a Tesla T4. A snippet of the error is below. Please let me know if this is a known issue, or what a possible cause might be. Thanks in advance.

Transferred 555/566 items from https://github.com/Chris-hughes10/Yolov7-training/releases/download/0.1.0/yolov7_training_state_dict.pt

Starting training run

Starting epoch 1
  0%|                                                                                                   | 0/1013 [00:00<?, ?it/s]
torch.Size([9, 6]) 4.  <--- output of print(image_preds.shape, PredIdx.OBJ)
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:367: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
    main()
  File "/opt/conda/envs/yolov7/lib/python3.9/site-packages/func_to_script/core.py", line 108, in scripted_function
    return func(**args)
  File "/home/varshanth/yolov7/fine_tune.py", line 77, in main
    trainer.train(
  File "/opt/conda/envs/yolov7/lib/python3.9/site-packages/pytorch_accelerated/trainer.py", line 467, in train
    self._run_training()
  File "/opt/conda/envs/yolov7/lib/python3.9/site-packages/pytorch_accelerated/trainer.py", line 676, in _run_training
    self._run_train_epoch(self._train_dataloader)
  File "/opt/conda/envs/yolov7/lib/python3.9/site-packages/pytorch_accelerated/trainer.py", line 749, in _run_train_epoch
    self._perform_forward_and_backward_passes(batch)
  File "/opt/conda/envs/yolov7/lib/python3.9/site-packages/pytorch_accelerated/trainer.py", line 776, in _perform_forward_and_backward_passes
    batch_output = self.calculate_train_batch_loss(batch)
  File "/home/varshanth/yolov7/yolov7/trainer.py", line 107, in calculate_train_batch_loss
    loss, _ = self.loss_func(
  File "/home/varshanth/yolov7/yolov7/loss.py", line 204, in __call__
    box_loss, obj_loss, cls_loss = self._compute_losses(
  File "/home/varshanth/yolov7/yolov7/loss.py", line 237, in _compute_losses_for_train
    anchor_boxes_per_layer, targets_per_layer = self.simOTA_assignment(
  File "/home/varshanth/yolov7/yolov7/loss.py", line 619, in simOTA_assignment
    pred_objectness = image_preds[:, [PredIdx.OBJ]]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  0%|                                                                                                   | 0/1013 [00:03<?, ?it/s
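
Note that the stack trace above is only a rough pointer: because the assert fires inside an asynchronous CUDA kernel, the Python frame it is attributed to (the `image_preds[:, [PredIdx.OBJ]]` indexing in `simOTA_assignment`) is not necessarily the operation that actually went out of bounds. As the error message itself suggests, one way to make the failure surface at the right line is to force blocking kernel launches before any CUDA work runs; a minimal sketch (the variable can equally be exported in the shell before launching the script):

```python
import os

# Device-side asserts are reported asynchronously, so the Python traceback can
# point at an unrelated line. Making kernel launches synchronous surfaces the
# error at the call that actually triggered it. This must be set before the
# first CUDA operation (e.g. before the model or tensors touch the GPU).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```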
varshanth commented 1 year ago

Found the issue. It was my mistake: the GT classes were supposed to start from 0, but I started them from 1. Closing the issue.
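
For anyone who hits the same assert: an index-out-of-bounds failure in ScatterGatherKernel like this is consistent with ground-truth class IDs being used as indices during the loss computation, so labels that start at 1 (or otherwise exceed num_classes - 1) run past the end of a tensor. A minimal sketch of a check-and-remap step, assuming a YOLO-style label array whose first column holds the class ID (the column layout and helper name are illustrative assumptions, not this repo's API):

```python
import numpy as np


def check_and_remap_class_ids(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Ensure class IDs are zero-indexed before they reach the loss.

    `labels` is assumed to be an (N, 5) array of [class_id, x, y, w, h]
    rows (a common YOLO-style layout); only the class column is changed.
    """
    if len(labels) == 0:
        return labels

    labels = labels.copy()
    class_ids = labels[:, 0].astype(int)

    # If the annotations were written 1-indexed (1..num_classes), shift them
    # down so they span 0..num_classes - 1.
    if class_ids.min() == 1 and class_ids.max() == num_classes:
        class_ids = class_ids - 1
        labels[:, 0] = class_ids

    # Fail fast on the CPU rather than via a device-side assert in the loss.
    assert class_ids.min() >= 0 and class_ids.max() < num_classes, (
        f"class ids must lie in [0, {num_classes - 1}], "
        f"got [{class_ids.min()}, {class_ids.max()}]"
    )
    return labels
```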