WongKinYiu / ScaledYOLOv4

Scaled-YOLOv4: Scaling Cross Stage Partial Network
GNU General Public License v3.0

Error in training _pickle.UnpicklingError: STACK_GLOBAL requires str #309

Open ghost opened 3 years ago

ghost commented 3 years ago

    train(hyp, opt, device, tb_writer)
  File "train.py", line 151, in train
    world_size=opt.world_size)
  File "/content/ScaledYOLOv4/utils/datasets.py", line 60, in create_dataloader
    pad=pad)
  File "/content/ScaledYOLOv4/utils/datasets.py", line 337, in __init__
    cache = torch.load(cache_path)  # load
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: STACK_GLOBAL requires str
CPU times: user 105 ms, sys: 15.2 ms, total: 120 ms

amkonshin commented 3 years ago

Most likely you need to check the path you pass to --cfg when running train.py. It should look like --cfg models/yolov4-p5.yaml

ghost commented 3 years ago

I have checked everything but am still facing the same issue.

WongKinYiu commented 3 years ago

Please delete your .cache files and run again.
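
A minimal sketch of that cleanup, assuming the label caches are files ending in .cache somewhere under your dataset directory. The function name delete_label_caches and the dataset_root argument are illustrative and not part of this repo. It defaults to a dry run so you can check what would be removed first.

```python
from pathlib import Path


def delete_label_caches(dataset_root, dry_run=True):
    """List every *.cache file under dataset_root; delete them unless dry_run."""
    found = sorted(p for p in Path(dataset_root).rglob("*.cache") if p.is_file())
    for cache_file in found:
        if not dry_run:
            cache_file.unlink()  # the training script rebuilds the cache on the next run
    return found
```

Call delete_label_caches("path/to/your/dataset") to preview the matches, then pass dry_run=False to actually delete them.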

ghost commented 3 years ago

I have 20 labeled images; one sample is below. I have trained Scaled-YOLOv4, but my validation precision, recall, and mAP are all 0. What should I do?

WongKinYiu commented 3 years ago

Train at your original image resolution and recalculate your anchors. If k-means shows that most anchors should be < 12, add a P2 prediction layer.

ghost commented 3 years ago

How do I calculate anchors?

ghost commented 3 years ago

And where do I add the P2 prediction layer?

WongKinYiu commented 3 years ago

You could use k-means to calculate anchors, and add the P2 prediction layer before the P3 prediction layer.
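
For readers new to this, here is a rough sketch of k-means anchor calculation. It is not code from this repo: it assumes you have already collected your ground-truth box sizes as a list of (width, height) tuples in pixels at the training resolution, and it uses 1 - IoU as the clustering distance, which is one common variant for YOLO-style anchors. The names iou_wh and kmeans_anchors are hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), both placed at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors, using 1 - IoU as the distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # start from k randomly chosen boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for anchor, members in zip(anchors, clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchor)  # leave an empty cluster's anchor unchanged
        if new_anchors == anchors:
            break  # assignments stopped changing
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest area first
```

If most of the returned anchors come out very small, that is the situation in which the advice above about adding a P2 prediction layer applies.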

ghost commented 3 years ago

Does this repo calculate k-means?

ghost commented 3 years ago

Could you please share it here? I am new to object detection.

WongKinYiu commented 3 years ago

Tutorial and examples:
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3_5l.cfg

ghost commented 3 years ago

Okay, so this means this repo does not have this feature.

ghost commented 3 years ago

Could you please show how to change the model files in your repo?

amkonshin commented 3 years ago

I've had the same issue:

Using SyncBatchNorm()
Traceback (most recent call last):
  File "train.py", line 443, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 151, in train
    world_size=opt.world_size)
  File "/models/ScaledYOLOv4/utils/datasets.py", line 60, in create_dataloader
    pad=pad)
  File "/models/ScaledYOLOv4/utils/datasets.py", line 337, in __init__
    cache = torch.load(cache_path)  # load
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 580, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 750, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: STACK_GLOBAL requires str
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'train.py', '--local_rank=0', '--batch-size', '16', '--img', '896', '896', '--data', 'data/coco128.yaml', '--cfg', 'models/yolov4-p5.yaml', '--weights', '', '--sync-bn', '--device', '0,1', '--name', 'yolov4-p5']' returned non-zero exit status 1.

I ran train.py from a freshly cloned repo on coco128. What could it be?

Hezhexi2002 commented 3 years ago

I hit this problem too when I try to train on my custom data with PyTorch_YOLOv4. How can I fix it?

skro123 commented 2 years ago

Maybe it’s the cache under the dataset path. Please try deleting the two files dataset/train/labels.cache and dataset/val/labels.cache
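
If you would rather not delete the files by hand each time, one option is to guard the load. This is only a sketch, not how the repo does it: load_cache_or_discard is a hypothetical helper, and loader is whatever callable reads the cache (the traceback in this thread shows torch.load being used for that). When the file cannot be unpickled, the helper deletes it and returns None so the caller can rebuild the cache.

```python
import os
import pickle


def load_cache_or_discard(cache_path, loader):
    """Return the loaded cache, or None if it is missing or cannot be unpickled."""
    if not os.path.isfile(cache_path):
        return None
    try:
        return loader(cache_path)
    except (pickle.UnpicklingError, EOFError):
        # A stale cache, e.g. one written by another project or version, may fail to load.
        os.remove(cache_path)
        return None
```

With PyTorch installed, this could be called as load_cache_or_discard("dataset/train/labels.cache", torch.load), treating a None result as "no cache yet".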

wyctorfogos commented 2 years ago

It worked for me using the YOLOv7 project.

Mascobot commented 2 years ago

Deleting dataset/train/labels.cache and dataset/val/labels.cache also worked for me on Yolov7.

grwal commented 2 years ago

Is there a 'dataset' path in the original YOLOv7? I can't find a dataset folder there.

jpsaturnino commented 1 year ago

It works, thanks!

Creamd0 commented 1 year ago

Awesome.