dnth / yolov5-deepsparse-blogpost

By the end of this post, you will learn how to: Train a SOTA YOLOv5 model on your own data. Sparsify the model using SparseML quantization aware training, sparse transfer learning, and one-shot quantization. Export the sparsified model and run it using the DeepSparse engine at insane speeds. P/S: The end result - YOLOv5 on CPU at 180+ FPS using on
https://dicksonneoh.com/portfolio/supercharging_yolov5_180_fps_cpu/

Error with training and export #2

Closed santoshmedisetty closed 2 years ago

santoshmedisetty commented 2 years ago

Hi, I trained a YOLOv5-Nano model with the pruned and quantized recipe on my custom data. As soon as the last epoch completes, I get the error below. Could this be related to a package installation? I did not get any error with the 'yolov5.transfer_learn_pruned_quantized.md' recipe.

I'm using PyTorch 1.9.0.

Below is my training command.

python3 train.py --cfg ./models_v5.0/yolov5n.yaml \
    --data ../aris_and_video_data3/data.yaml \
    --hyp data/hyps/hyp.scratch.yaml \
    --weights yolov5n.pt \
    --img 640 \
    --batch-size 16 \
    --optimizer SGD \
    --recipe ../recipes/yolov5.transfer_learn_pruned_quantized.md \
    --project yolov5-deepsparse \
    --name yolov5n-sgd-pruned-quantized3 \
    --device 0

Below is the error message after the last epoch.

Traceback (most recent call last):
  File "train.py", line 745, in <module>
    main(opt)
  File "train.py", line 641, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 514, in train
    model=load_checkpoint(type_='ensemble', weights=best, device=device)[0],
  File "/home/santosh/deepsparse_fishcount/fish-video-count-pipeline-PROD/yolov5_deepsparse_blogpost/yolov5_train/export.py", line 529, in load_checkpoint
    state_dict = load_state_dict(model, state_dict, run_mode=not ensemble_type, exclude_anchors=exclude_anchors)
  File "/home/santosh/deepsparse_fishcount/fish-video-count-pipeline-PROD/yolov5_deepsparse_blogpost/yolov5_train/export.py", line 553, in load_state_dict
    model.load_state_dict(state_dict, strict=not run_mode) # load
  File "/home/santosh/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
    Missing key(s) in state_dict:
    "model.0.conv.quant.activation_post_process.scale",
    "model.0.conv.quant.activation_post_process.zero_point",
    "model.0.conv.quant.activation_post_process.fake_quant_enabled",
    "model.0.conv.quant.activation_post_process.observer_enabled",
    "model.0.conv.quant.activation_post_process.scale",
    "model.0.conv.quant.activation_post_process.zero_point",
    "model.0.conv.quant.activation_post_process.activation_post_process.min_val",
    "model.0.conv.quant.activation_post_process.activation_post_process.max_val",
    "model.0.conv.module.weight",
    "model.0.conv.module.bias",
    "model.0.conv.module.weight_fake_quant.scale",
    "model.0.conv.module.weight_fake_quant.zero_point",
    "model.0.conv.module.weight_fake_quant.fake_quant_enabled",
    "model.0.conv.module.weight_fake_quant.observer_enabled",
    "model.0.conv.module.weight_fake_quant.scale",
    "model.0.conv.module.weight_fake_quant.zero_point",
    "model.0.conv.module.weight_fake_quant.activation_post_process.min_val",
    "model.0.conv.module.weight_fake_quant.activation_post_process.max_val",
    "model.0.conv.module.activation_post_process.scale",
    "model.0.conv.module.activation_post_process.zero_point",
    "model.0.conv.module.activation_post_process.fake_quant_enabled", .....

I was able to paste only a portion of the error.
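The keys listed in the error sit under names such as `quant`, `weight_fake_quant`, and `activation_post_process`, which look like the wrappers that quantization-aware training adds to a model. A minimal sketch of one way to narrow this down is to compare the keys the model expects with the keys the checkpoint holds. The helper names (`diff_keys`, `looks_quantized`), the marker list, and the checkpoint keys below are hypothetical and chosen for illustration. In practice the two inputs would come from `model.state_dict().keys()` and the loaded checkpoint's state dict.

```python
# Substrings that suggest a key belongs to a quantization wrapper.
# This marker list is an assumption, not taken from SparseML or PyTorch.
QAT_MARKERS = ("fake_quant", "activation_post_process", ".quant.")


def diff_keys(expected, found):
    """Return (missing, unexpected) key lists, each sorted."""
    expected, found = set(expected), set(found)
    return sorted(expected - found), sorted(found - expected)


def looks_quantized(key):
    """True if the key name contains one of the assumed QAT markers."""
    return any(marker in key for marker in QAT_MARKERS)


# Three keys copied from the error message above.
expected_keys = [
    "model.0.conv.module.weight",
    "model.0.conv.module.weight_fake_quant.scale",
    "model.0.conv.quant.activation_post_process.zero_point",
]
# Made-up checkpoint keys for a non-quantized layer (illustration only).
checkpoint_keys = ["model.0.conv.weight", "model.0.conv.bias"]

missing, unexpected = diff_keys(expected_keys, checkpoint_keys)
qat_missing = [key for key in missing if looks_quantized(key)]

print(f"{len(missing)} missing, {len(qat_missing)} quantization-related")
print("unexpected:", unexpected)
```

If most of the missing keys are quantization-related while the checkpoint holds only plain layer names, the model was likely built in its quantized form but the saved weights were not, or the other way round.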

dnth commented 2 years ago

Hi @santoshmedisetty, I wasn't able to reproduce the error above. Are you training using the YOLOv5 repo from Ultralytics?

dnth commented 2 years ago

I've added commands to train a YOLOv5-Nano in my Colab notebook. Check it out to see if it works for your dataset.

https://colab.research.google.com/github/dnth/yolov5-deepsparse-blogpost/blob/master/notebooks/deepsparse_blogpost.ipynb

santoshmedisetty commented 2 years ago

Hi @dnth, no, I'm using the YOLOv5 code from your repository. I got the error with the 'recipes/yolov5.transfer_learn_pruned_quantized.md' recipe. When I changed the recipe to 'recipes/yolov5.transfer_learn_pruned.md', I did not get any error.

My training command looks almost the same as yours. Did I miss anything?

dnth commented 2 years ago

That's strange. Have you tried with the Colab notebook? Does it give the same error?

santoshmedisetty commented 2 years ago

I did not get an error with the Colab notebook. This might be an error due to some package. I'll check.

dnth commented 2 years ago

> I did not get an error with the Colab notebook. This might be an error due to some package. I'll check.

Keep me updated here :)

santoshmedisetty commented 2 years ago

Hi @dnth, there seemed to be an issue with some packages. When I reinstalled all the requirements, it worked fine. Thank you.
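Since the Colab notebook ran cleanly and a reinstall fixed the local setup, one way to spot this kind of mismatch is to print the installed package versions in both environments and compare them. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) is shown here. The package list is an assumption about what matters for this workflow, not something taken from the repository's requirements file.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of relevant packages; adjust to match requirements.txt.
PACKAGES = ["torch", "torchvision", "sparseml", "deepsparse", "onnx"]

report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this both locally and in the notebook, then comparing the two outputs, shows which packages differ.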