WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

[BUG] [CPU training] RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. #359

Closed. Roland-Pfeiffer closed this issue 1 year ago

Roland-Pfeiffer commented 1 year ago

Hi, when I run train.py on a custom dataset, training on the CPU fails even though I pass --device "cpu".

I am running:

DS_ROOT="/path/to/datasets/ds_root/"
YOLOV7="/path/to/yolov7/"

cd "${YOLOV7}"
#wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"

DS_YAML="${DS_ROOT}dataset.yaml"
YOLO_TRAIN="${YOLOV7}train.py"

python3 "${YOLO_TRAIN}" --data "${DS_YAML}" --weights "yolov7.pt" --device "cpu"

But I receive the following error:

findux@findux-laptop:/media/findux/DATA/Code/Malta_II/bioblu/bioblu/yolo/yolo7$ /bin/bash /media/findux/DATA/Code/Malta_II/bioblu/bioblu/yolo/yolo7/training_local_test.sh
YOLOR 🚀 v0.1-70-g4c207e1 torch 1.11.0+cu102 CPU

Namespace(adam=False, artifact_alias='latest', batch_size=16, bbox_interval=-1, bucket='', cache_images=False, cfg='', data='/media/findux/DATA/Documents/Malta_II/datasets/dataset_05_mini_gnejna/dataset.yaml', device='cpu', entity=None, epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.p5.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs/train/exp14', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=16, upload_dataset=False, weights='yolov7.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.2, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.15, copy_paste=0.0, paste_in=0.15
Traceback (most recent call last):
  File "/media/findux/DATA/Documents/Malta_II/yolov7/train.py", line 609, in <module>
    train(hyp, opt, device, tb_writer)
  File "/media/findux/DATA/Documents/Malta_II/yolov7/train.py", line 71, in train
    run_id = torch.load(weights).get('wandb_id') if weights.endswith('.pt') and os.path.isfile(weights) else None
  File "/media/findux/DATA/Code/000_venvs/yolov7/lib/python3.8/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/media/findux/DATA/Code/000_venvs/yolov7/lib/python3.8/site-packages/torch/serialization.py", line 1046, in _load
    result = unpickler.load()
  File "/media/findux/DATA/Code/000_venvs/yolov7/lib/python3.8/site-packages/torch/serialization.py", line 1016, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/media/findux/DATA/Code/000_venvs/yolov7/lib/python3.8/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/media/findux/DATA/Code/000_venvs/yolov7/lib/python3.8/site-packages/torch/serialization.py", line 176, in default_restore_location
    result = fn(storage, location)
  File "/media/findux/DATA/Code/000_venvs/yolov7/lib/python3.8/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/media/findux/DATA/Code/000_venvs/yolov7/lib/python3.8/site-packages/torch/serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Any help is greatly appreciated! Thank you!

mkhoshbin72 commented 1 year ago

I added PR #369 for this bug.
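
For anyone who hits this before a fix is merged, here is a minimal sketch of the workaround the error message itself suggests: pass map_location to torch.load so that storages saved on a CUDA device are remapped to the CPU. It is not necessarily what PR #369 changes. The helper name load_wandb_run_id and its device parameter are hypothetical. The sketch assumes the checkpoint is a dict, as the .get('wandb_id') call on line 71 of train.py implies, and the torch 1.11 shown in the log above.

```python
import os


def load_wandb_run_id(weights, device="cpu"):
    """Return the checkpoint's 'wandb_id', or None if `weights` is not an existing .pt file.

    Hypothetical helper mirroring the failing expression in train.py, with
    map_location added so CUDA-saved tensors are loaded onto `device`.
    """
    if not (weights.endswith(".pt") and os.path.isfile(weights)):
        return None

    # Imported here so the path check above does not require torch.
    import torch

    # map_location remaps every storage in the checkpoint to the target
    # device, which avoids the CUDA deserialization error on CPU-only machines.
    ckpt = torch.load(weights, map_location=torch.device(device))

    # Assumption: the checkpoint is a dict that may contain 'wandb_id'.
    return ckpt.get("wandb_id")
```

In train.py the same idea would amount to adding map_location to the torch.load(weights) call that the traceback points at. On a CPU-only machine the device would be "cpu".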