YOLOv4-CSP: Questions about Training

jackydinosaur commented 3 years ago

Hello Wong, first of all, thank you for your help. I get errors when running train.py. The following command works: 【python train.py --device 0 --batch-size 8 --data coco.yaml --cfg yolov4-csp.cfg --weights '' --name yolov4-csp】

After it finished the first epoch, I stopped it and tried to resume training with: 【python train.py --device 0 --batch-size 8 --data coco.yaml --cfg yolov4-csp.cfg --weights 'runs/train/yolov4-csp8/weights/last.pt' --name yolov4-csp --resume】 An error occurred.

However, the file 'yolov4-csp.cfg' does exist.

Another error occurred when I tried to use multiple GPUs for training: 【python -m torch.distributed.launch --nproc_per_node 4 train.py --device 0,1 --batch-size 16 --data coco.yaml --cfg ./cfg/yolov4-csp.cfg --weights '' --name yolov4-csp --sync-bn】

How might I solve it?

I am looking forward to your reply.

WongKinYiu commented 3 years ago

【python train.py --device 0 --batch-size 8 --data coco.yaml --cfg yolov4-csp.cfg --weights 'runs/train/yolov4-csp8/weights/last.pt' --name yolov4-csp --resume】

You do not need to use --resume. Also, from the error message, it seems your command used --cfg .cfg, so maybe just try again.

【python -m torch.distributed.launch --nproc_per_node 4 train.py --device 0,1 --batch-size 16 --data coco.yaml --cfg ./cfg/yolov4-csp.cfg --weights '' --name yolov4-csp --sync-bn】

--nproc_per_node should not exceed the number of GPUs you use (two here, since --device 0,1), so use: 【python -m torch.distributed.launch --nproc_per_node 2 train.py --device 0,1 --batch-size 16 --data coco.yaml --cfg ./cfg/yolov4-csp.cfg --weights '' --name yolov4-csp --sync-bn】
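
To keep the two values from drifting apart, the process count can be derived from the --device string. This is only a minimal sketch, assuming one process per listed GPU; `count_devices` and `launch_command` are hypothetical helper names and are not part of the ScaledYOLOv4 repository.

```python
import shlex


def count_devices(device_arg):
    """Count the GPU ids in a --device string such as '0,1'."""
    return len([d for d in device_arg.split(",") if d.strip()])


def launch_command(device_arg, train_args):
    """Build a torch.distributed.launch command line with one process per GPU.

    Hypothetical helper: it only assembles the string, it does not run it.
    """
    nproc = count_devices(device_arg)
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(nproc),
        "train.py", "--device", device_arg,
    ] + list(train_args)
    # shlex.quote keeps the empty --weights value visible as ''
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    print(launch_command("0,1", [
        "--batch-size", "16", "--data", "coco.yaml",
        "--cfg", "./cfg/yolov4-csp.cfg", "--weights", "",
        "--name", "yolov4-csp", "--sync-bn",
    ]))
```

With --device 0,1 the helper yields --nproc_per_node 2, which matches the corrected command above.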

jackydinosaur commented 3 years ago

Thank you for your answers.

jaideep11061982 commented 3 years ago

@WongKinYiu https://drive.google.com/file/d/1TdKvDQb2QpP4EhOIyks8kgT8dgI1iOWT/view

Which version of YOLOv4 are the weights uploaded here for: P5, P6, or P7?

WongKinYiu commented 3 years ago

yolov4-csp

jaideep11061982 commented 3 years ago

@WongKinYiu I get this error. I copied the latest version of the code and then copied the above weights to the path:

Traceback (most recent call last):
  File "detect.py", line 171, in <module>
    detect()
  File "detect.py", line 33, in detect
    model = attempt_load(weights, map_location=device)  # load FP32 model
  File "/kaggle/working/ScaledYOLOv4/models/experimental.py", line 137, in attempt_load
    model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
  File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.7/site-packages/torch/serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x00'.
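
The message suggests that the first byte the unpickler read from the file was \x00. One way to narrow this down is to look at the file header before calling torch.load. The sketch below is only a diagnostic under stated assumptions: `classify_header` and `inspect_file` are hypothetical names, and the check only tells whether the file starts like a zip archive (the format torch.save has written by default since PyTorch 1.6) or like a pickle stream (the legacy format). It does not identify the cause of the error.

```python
def classify_header(head):
    """Classify the first bytes of a file that is meant to be a torch checkpoint."""
    if head[:4] == b"PK\x03\x04":
        return "zip"      # zip-based checkpoint
    if head[:1] == b"\x80":
        return "pickle"   # pickle protocol 2+ stream, as in legacy checkpoints
    return "unknown"      # not a header that torch.load normally expects


def inspect_file(path):
    """Read the first four bytes of `path` and classify them."""
    with open(path, "rb") as f:
        head = f.read(4)
    return classify_header(head), head


if __name__ == "__main__":
    import sys
    kind, head = inspect_file(sys.argv[1])
    print(kind, head)
```

If the result is "unknown", the file is probably not a checkpoint saved with torch.save, for example because the download is incomplete or the file is in a different weight format.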

WongKinYiu / ScaledYOLOv4

YOLOv4-CSP: Questions about Training #267