WongKinYiu / yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0
1.98k stars 524 forks

RuntimeError: result type Float can't be cast to the desired output type long int #270

Open reich208github opened 1 year ago

reich208github commented 1 year ago

Hi, my friend, how are you today? :)

I am trying to train on the COCO dataset with this command:

python train.py --batch-size 1 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0 --name yolor_p6_custom --hyp data/hyp.scratch.1280.yaml --epochs 300

But I get this runtime error:

Traceback (most recent call last):
  File "/home/cxkj/yolor-main/train.py", line 537, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "/home/cxkj/yolor-main/train.py", line 288, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # loss scaled by batch_size
  File "/home/cxkj/yolor-main/utils/loss.py", line 66, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/home/cxkj/yolor-main/utils/loss.py", line 167, in build_targets
    indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
RuntimeError: result type Float can't be cast to the desired output type long int

Here is the complete console output:

(yolor-gpu) root@cxkj-System-Product-Name:/home/cxkj/yolor-main# python train.py --batch-size 1 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0 --name yolor_p6_custom --hyp data/hyp.scratch.1280.yaml --epochs 300
Using torch 1.12.0 CUDA:0 (NVIDIA GeForce RTX 3090, 24265MB)

Namespace(weights='', cfg='cfg/yolor_p6.cfg', data='data/coco.yaml', hyp='data/hyp.scratch.1280.yaml', epochs=300, batch_size=1, img_size=[1280, 1280], rect=False, resume=False, nosave=False, notest=False, noautoanchor=False, evolve=False, bucket='', cache_images=False, image_weights=False, device='0', multi_scale=False, single_cls=False, adam=False, sync_bn=False, local_rank=-1, log_imgs=16, workers=8, project='runs/train', name='yolor_p6_custom', exist_ok=False, total_batch_size=1, world_size=1, global_rank=-1, save_dir='runs/train/yolor_p6_custom')
Start Tensorboard with "tensorboard --logdir runs/train", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}
Model Summary: 665 layers, 37265016 parameters, 37265016 gradients
Optimizer groups: 145 .bias, 145 conv.weight, 149 other
Scanning labels /home/cxkj/yolor-main/coco/labels/train2017.cache3 (117266 found, 0 missing, 1021 empty, 0 duplicate, for 118287 images): 118287it [00:05, 19842.91it/s]
Scanning labels /home/cxkj/yolor-main/coco/labels/val2017.cache3 (4952 found, 0 missing, 48 empty, 0 duplicate, for 5000 images): 5000it [00:00, 19607.06it/s]
Image sizes 1280 train, 1280 test
Using 0 dataloader workers
Logging results to runs/train/yolor_p6_custom
Starting training for 300 epochs...

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

0%| | 0/118287 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/home/cxkj/yolor-main/train.py", line 537, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "/home/cxkj/yolor-main/train.py", line 288, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # loss scaled by batch_size
  File "/home/cxkj/yolor-main/utils/loss.py", line 66, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/home/cxkj/yolor-main/utils/loss.py", line 167, in build_targets
    indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
RuntimeError: result type Float can't be cast to the desired output type long int

Could you help me with this problem? Thanks a lot, and have a good day!

ItzDerock commented 1 year ago

I ran into this issue too.

It looks like a compatibility issue with torch 1.12 (see this yolov5 issue).

If you are able to, downgrading to torch==1.11 or, in some cases, torch==1.7.0 should fix this issue.

I wasn't able to downgrade since the older torch version didn't support my graphics card. If this is the case for you, take a look at my fork which resolves this issue.
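
For anyone who cannot downgrade: the traceback points at an in-place `clamp_` on the integer index tensors `gj` and `gi`, with an upper bound taken from `gain`, which the error message says is a Float. Below is a minimal sketch of one way to avoid that, assuming `gain` is a float tensor and the indices are int64. The helper `clamp_grid_indices` and the sample values are made up for illustration, and this is not necessarily the change made in the fork mentioned above.

```python
import torch


def clamp_grid_indices(idx: torch.Tensor, upper: torch.Tensor) -> torch.Tensor:
    """Clamp integer grid indices to [0, upper - 1] in place.

    `upper` is a 0-dim float tensor such as gain[3]. int() turns it into a
    plain Python integer, so the bound no longer promotes the result to Float.
    (Hypothetical helper, not part of the yolor code base.)
    """
    return idx.clamp_(0, int(upper) - 1)


# Stand-ins for the tensors named in the traceback (values are invented).
gain = torch.tensor([1.0, 1.0, 160.0, 160.0, 160.0, 160.0, 1.0])  # float32
gj = torch.tensor([-3, 42, 500])  # int64 grid indices

# The pattern from utils/loss.py line 167: float bound on an int64 tensor.
try:
    gj.clone().clamp_(0, gain[3] - 1)
    print("float bound accepted by this torch version")
except RuntimeError as err:
    print("float bound rejected:", err)

# The same clamp with an integer bound keeps the int64 dtype.
clamped = clamp_grid_indices(gj.clone(), gain[3])
print(clamped.tolist(), clamped.dtype)
```

In `build_targets` the equivalent change would be to cast `gain[3]` and `gain[2]` to integers before subtracting 1 in the `indices.append(...)` line. Note that `int()` on a CUDA tensor copies the value to the host, which should be negligible here but is worth knowing.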

S-Gaurisankar commented 6 months ago

This might be a bit too late, but if you're still working on this, refer to [https://github.com/S-Gaurisankar/yolor_fixed/tree/paper]. I've fixed the runtime issues there.