WongKinYiu / yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)

indices should be either on cpu or on the same device as the indexed tensor (cpu) #268

Closed. GabrielFerrante closed this issue 1 year ago.

GabrielFerrante commented 1 year ago

I'm getting a training error. Command used:

python3 train.py --batch-size 16 --img 416 416 --data Dataset.yaml --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --device 0 --name yoloR-BRA-Dataset --hyp hyp.scratch.1280.yaml --epochs 351

Output ERROR:

Using torch 1.13.0.dev20220622 CUDA:0 (NVIDIA GeForce RTX 3060, 12046MB)

Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='cfg/yolor_p6.cfg', data='BRA-Dataset.yaml', device='0', epochs=351, evolve=False, exist_ok=False, global_rank=-1, hyp='./data/hyp.scratch.1280.yaml', image_weights=False, img_size=[416, 416], local_rank=-1, log_imgs=16, multi_scale=False, name='yoloR-BRA-Dataset', noautoanchor=False, nosave=False, notest=False, project='runs/train', rect=False, resume=False, save_dir='runs/train/yoloR-BRA-Dataset', single_cls=False, sync_bn=False, total_batch_size=16, weights='yolor_p6.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/train", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}
Model Summary: 665 layers, 36860016 parameters, 36860016 gradients
Transferred 850/862 items from yolor_p6.pt
Optimizer groups: 145 .bias, 145 conv.weight, 149 other
WARNING: --img-size 416 must be multiple of max stride 64, updating to 448
WARNING: --img-size 416 must be multiple of max stride 64, updating to 448
Scanning labels images/train.cache3 (1474 found, 0 missing, 0 empty, 0 duplicate, for 1474 images): 1474it [00:00, 18114.65it/s]
Scanning labels images/val.cache3 (349 found, 0 missing, 0 empty, 0 duplicate, for 349 images): 349it [00:00, 15900.63it/s]
Image sizes 448 train, 448 test
Using 8 dataloader workers
Logging results to runs/train/yoloR-BRA-Dataset
Starting training for 351 epochs...

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

0%| | 0/93 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 537, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 288, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # loss scaled by batch_size
  File "/media/usp/DATA/GabrielSFerrante/PROJETO/DetectAnimalsInRoads/YoloR/yolor/utils/loss.py", line 66, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/media/usp/DATA/GabrielSFerrante/PROJETO/DetectAnimalsInRoads/YoloR/yolor/utils/loss.py", line 148, in build_targets
    a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
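The error comes from the last frame: at[j] indexes a tensor that lives on the CPU with a boolean mask j that was built from CUDA tensors, and recent PyTorch releases refuse that combination instead of silently copying across devices. Below is a minimal sketch of the mismatch and one possible workaround; the shapes and variable names only mirror the traceback and are not a patch to the repo's utils/loss.py:

```python
import torch

# Requires a CUDA device to reproduce the error.
device = torch.device("cuda")

na, nt = 3, 5                                    # anchor / target counts (made-up sizes)
at = torch.arange(na).view(na, 1).repeat(1, nt)  # created on the CPU by default
j = torch.rand(na, nt, device=device) > 0.5      # boolean mask computed from GPU tensors

# at[j] now raises:
# RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

# One workaround is to build the index tensor on the same device as the mask:
at = torch.arange(na, device=j.device).view(na, 1).repeat(1, nt)
a = at[j]                                        # both tensors on the GPU, indexing succeeds
```

The alternative, as in the comment below, is to pin the environment to an older PyTorch build that still tolerated the CPU/GPU mix when indexing.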

GabrielFerrante commented 1 year ago

Solution: downgrading to torch 1.7.1+cu110 (CUDA 11.4).
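For reference, the downgrade would look something like `pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html`; exact wheel tags depend on your OS and Python version, so treat this as a sketch rather than a verified command.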

amishra791 commented 1 year ago

> Solution: downgrading to torch 1.7.1+cu110 (CUDA 11.4).

Hi, what exactly was your solution to this issue?