RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

jayshent commented 3 months ago

Hi all, I tried to execute the train.py file but encountered the issue below.

I'd appreciate any input. Thanks in advance.

$ python3 train.py --batch-size 32 --cfg cfg/yolov3.cfg --data data/coco.data --weights ''
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
Namespace(adam=False, batch_size=32, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', device='', epochs=300, evolve=False, freeze_layers=False, img_size=[320, 640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA device0 _CudaDeviceProperties(name='NVIDIA A100 80GB PCIe MIG 3g.40gb', total_memory=40448MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
WARNING: smart bias initialization failure.
Model Summary: 222 layers, 6.19491e+07 parameters, 6.19491e+07 gradients
Optimizer groups: 75 .bias, 75 Conv2d.weight, 72 other
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/train2014.npy (117264 found, 0 missing, 0 empty, 4514 duplicat
Caching labels /home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/data/coco/labels/val2014.npy (4954 found, 0 missing, 0 empty, 197 duplicate, fo
Image sizes 320 - 640 train, 640 test
Using 8 dataloader workers
Starting training for 300 epochs...

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size

  0%| | 0/3665 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 435, in <module>
    train(hyp)  # train normally
  File "train.py", line 283, in train
    loss, loss_items = compute_loss(pred, targets, model)
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 356, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/ibm/gpfs/home/jayshen26/jay_workspace_2/PyTorch-Spiking-YOLOv3/utils/utils.py", line 441, in build_targets
    a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

jayshent commented 3 months ago

@cwq159 Could you give me some pointers on how to debug this? Thanks!

Shenyiyu1 commented 3 months ago

Did you manage to solve the problem?

jayshent commented 3 months ago

"Did you manage to solve the problem?"

Unfortunately not. Did you run into the same error?
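For anyone landing here with the same traceback: the message says the tensor being indexed lives on the CPU while the index (here the mask `j`) lives on another device, presumably the GPU. The failing line is `a, t = at[j], t.repeat(na, 1, 1)[j]`, so one likely cause is that `at` is created on the CPU while `j` is computed from CUDA tensors. I have not checked how `at` is actually built in `utils/utils.py`, so treat the following as a minimal sketch under that assumption. The helper `build_anchor_index`, the shapes, and the random stand-in data are all hypothetical and only illustrate keeping the indexed tensors and the mask on one device.

```python
import torch


def build_anchor_index(na: int, nt: int, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: anchor-index grid of shape (na, nt) created on `device`."""
    return torch.arange(na, device=device).view(na, 1).repeat(1, nt)


# Falls back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

na, nt = 3, 4                                   # made-up anchor and target counts
targets = torch.rand(nt, 6, device=device)      # stand-in for the real label tensor
at = build_anchor_index(na, nt, targets.device)
j = torch.rand(na, nt, device=device) > 0.5     # stand-in for the real boolean mask

# The indexed tensors and the mask now share one device, so indexing is allowed.
a = at[j]
t = targets.repeat(na, 1, 1)[j]
```

If `at` in the repository is indeed created without a `device` argument, passing `device=targets.device` to `torch.arange` (or calling `.to(targets.device)` on the result) would be the corresponding one-line change. Moving the mask to the CPU with `j.cpu()` should also silence the error, at the cost of a device transfer on every call.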

cwq159 / PyTorch-Spiking-YOLOv3

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #65