VainF / Torch-Pruning

[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
https://arxiv.org/abs/2301.12900
MIT License

Yolov7 pruned model does not detect anything? #186

Open · aidevmin opened this issue 1 year ago

aidevmin commented 1 year ago

@VainF Thanks for the amazing repo. I tried to run inference with the pruned yolov7 model on one image, but the pruned model did not detect anything (the output image has no bboxes): https://github.com/VainF/Torch-Pruning/blob/master/benchmarks/prunability/readme.md#3-yolo-v7

python yolov7_detect_pruned.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg

I saw that in yolov7_detect_pruned.py you already set ignored_layers:

    ################################################################################
    # Pruning
    import torch
    import torch_pruning as tp  # imports shown here for completeness

    example_inputs = torch.randn(1, 3, 224, 224).to(device)  # dummy input used to trace layer dependencies
    imp = tp.importance.MagnitudeImportance(p=2)  # L2-norm magnitude importance

    # Keep the detection head out of the pruning plan.
    ignored_layers = []
    from models.yolo import Detect
    for m in model.modules():
        if isinstance(m, Detect):
            ignored_layers.append(m)
    print(ignored_layers)

    iterative_steps = 1  # number of pruning iterations (1 = one-shot pruning)
    pruner = tp.pruner.MagnitudePruner(
        model,
        example_inputs,
        importance=imp,
        iterative_steps=iterative_steps,
        ch_sparsity=0.5,  # remove 50% of channels, e.g. ResNet18 {64, 128, 256, 512} => {32, 64, 128, 256}
        ignored_layers=ignored_layers,
    )
    base_macs, base_nparams = tp.utils.count_ops_and_params(model, example_inputs)

    pruner.step()  # apply the pruning
    pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(model, example_inputs)
    print(model)
    print("Before Pruning: MACs=%f G, #Params=%f G" % (base_macs / 1e9, base_nparams / 1e9))
    print("After Pruning: MACs=%f G, #Params=%f G" % (pruned_macs / 1e9, pruned_nparams / 1e9))
    ####################################################################################

But in the log I saw this for the Detect module before pruning:

(105): Detect(
      (m): ModuleList(
        (0): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 255, kernel_size=(1, 1), stride=(1, 1))
      )
    )

and after pruning:

(105): Detect(
      (m): ModuleList(
        (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
      )
    )

So the in_channels of the Detect convs were still halved (256 -> 128, 512 -> 256, 1024 -> 512) even though Detect is in ignored_layers. Could you @VainF check it again? Thanks

VainF commented 1 year ago

Hello @aidevmin. It requires post-training. Please finetune the pruned model on COCO for a few epochs with a small learning rate.

aidevmin commented 1 year ago

Thanks @VainF for the quick response. Does that mean I need to run yolov7_train_pruned.py? I saw that file too, but its input is yolov7_training.pt, not yolov7.pt.

I will try running yolov7_train_pruned.py.

VainF commented 1 year ago
  1. mAP: The performance of pruned yolov7 has not been checked.
  2. Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one.
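
A minimal sketch of this save/load flow, assuming tp is torch_pruning and that build_yolov7() stands in for however you construct the original, unpruned model (a hypothetical helper, not part of the repo):

    import torch
    import torch_pruning as tp

    # Save: serializes the weights plus the pruned layer attributes
    # (e.g. conv.in_channels / conv.out_channels).
    state_dict = tp.state_dict(pruned_model)
    torch.save(state_dict, "yolov7_pruned.pth")

    # Load: start from an *unpruned* model definition; Torch-Pruning
    # resizes its layers to match before copying the weights in.
    model = build_yolov7()  # hypothetical builder for the unpruned architecture
    loaded = torch.load("yolov7_pruned.pth", map_location="cpu")
    tp.load_state_dict(model, state_dict=loaded)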
aidevmin commented 1 year ago

Thanks a lot. I will try it.

aidevmin commented 1 year ago
  1. mAP: The performance of pruned yolov7 has not been checked.
  2. Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one.

I noticed that the model obtained after pruning and finetuning with yolov7_train_pruned.py cannot be re-parameterized.

aidevmin commented 1 year ago
  1. mAP: The performance of pruned yolov7 has not been checked.
  2. Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one.

@VainF thanks, it works. But it seems that tp.state_dict saves the whole model (weights and architecture), because the pruned model file is larger than the original .pt (weights only). Is there any way to save only the weights and still make sure the model is loaded with the pruned architecture properly?

AymenBOUGUERRA commented 1 year ago

I noticed that the model obtained after pruning and finetuning with yolov7_train_pruned.py cannot be re-parameterized.

@aidevmin, I don't think re-parameterization is needed, as exporting to ONNX will apply all of the necessary optimizations to the model as well as export the correct weights (EMA if applicable).

I can think of two simple ways to work around these problems:

mAP: The performance of pruned yolov7 has not been checked.

@VainF I have retrained the pruned yolov7 on COCO for 300 epochs and there is a noticeable degradation of about 5-6 points across all metrics, which is amazing considering that we eliminated 75% of the network. (I am currently trying to apply knowledge distillation so that the pruned model (student) can learn from the baseline model (teacher) during its retraining in your script, as well as excluding some of the most sensitive layers from being pruned.)

Note: the pruning must be done with 0.5 sparsity to get a good acceleration in TensorRT; other values will actually hinder the engine's speed.

Save & Load: Please try tp.state_dict & tp.load_state_dict. This allows us to save the attributes like conv.in_channels into a .pth and re-load the pruned model using an unpruned one. @VainF thanks, it works. But it seems that tp.state_dict saves the whole model (weights and architecture), because the pruned model file is larger than the original .pt (weights only). Is there any way to save only the weights and still make sure the model is loaded with the pruned architecture properly?

@aidevmin The solution that I implemented for saving while still being able to reload the model is to create a yaml configuration file for the pruned model at the moment of pruning; loading the pruned model with this configuration works perfectly. There are also EMA issues that need to be addressed.
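
For illustration, a minimal sketch of recording the pruned architecture at pruning time; this version only dumps each Conv2d's channel counts to a yaml file rather than patching the yolov7 cfg, so it is a simplification of the approach described above:

    import yaml
    import torch.nn as nn

    def dump_pruned_channels(model, out_path="pruned.yaml"):
        # Record the in/out channels of every conv after pruning so the
        # pruned architecture can be rebuilt or verified later.
        channels = {
            name: {"in": m.in_channels, "out": m.out_channels}
            for name, m in model.named_modules()
            if isinstance(m, nn.Conv2d)
        }
        with open(out_path, "w") as f:
            yaml.safe_dump(channels, f)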

I may submit a PR with these adjustments in the following weeks if I find the time. Don't hesitate to ask for clarifications.

aidevmin commented 1 year ago

@AymenBOUGUERRA Thanks for the detailed response.

I don't think re-parameterization is needed, as exporting to ONNX will apply all of the necessary optimizations to the model as well as export the correct weights (EMA if applicable).

Yes, I agree with you. I can successfully export the .pt to ONNX without re-parameterization.

One interesting thing I found: after removing just 1% of the channels, the inference time of the pruned model (TRT engine) is larger than before pruning. It is surprising. I will investigate more and let you know.

AymenBOUGUERRA commented 1 year ago

@aidevmin Hello again,

Yes, I agree with you. I can successfully export the .pt to ONNX without re-parameterization.

Be careful to always load the model using the provided function in the repo, such as attempt_load(). The reason is that un-re-parameterized checkpoints contain two sets of weights, "model" and "ema"; the EMA weights are about 2 to 5 points better than the default weights in terms of mAP and other metrics, and the provided function will try to load EMA first, falling back to "model" if EMA is not present.
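
A minimal sketch of that loading path, followed by an illustrative ONNX export (paths, image size, and opset are assumptions, not from this thread):

    import torch
    from models.experimental import attempt_load  # yolov7 repo helper

    # attempt_load() prefers the EMA weights stored in the checkpoint and
    # falls back to the plain "model" weights if EMA is absent.
    model = attempt_load("yolov7_pruned.pt", map_location="cpu")
    model.eval()

    # Illustrative ONNX export of the loaded (pruned) model.
    dummy = torch.randn(1, 3, 640, 640)
    torch.onnx.export(model, dummy, "yolov7_pruned.onnx",
                      opset_version=12,
                      input_names=["images"], output_names=["output"])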

One interesting thing I found: after removing just 1% of the channels, the inference time of the pruned model (TRT engine) is larger than before pruning. It is surprising. I will investigate more and let you know.

I have already encountered this issue; here are the results of my investigation:

[screenshot: benchmark results]

Furthermore, it seems that the pruning ratio must be of the form 1 - 1/2^n with 0 < n < 5 in order to get a speedup in TensorRT, and using such aggressive pruning ratios will require you to not only finetune the model but retrain it from scratch, as the feature maps are utterly destroyed.
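
Concretely, those admissible ratios work out as follows:

    # The only sparsities that gave a TensorRT speedup in the tests above.
    for n in range(1, 5):
        print(n, 1 - 1 / 2 ** n)  # 0.5, 0.75, 0.875, 0.9375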

Don't hesitate to ask if you have any questions or need clarification.

aidevmin commented 1 year ago

@AymenBOUGUERRA Thank you so much for the information.

Furthermore, it seems that the pruning ratio must be of the form 1 - 1/2^n with 0 < n < 5 in order to get a speedup in TensorRT, and using such aggressive pruning ratios will require you to not only finetune the model but retrain it from scratch, as the feature maps are utterly destroyed.

Maybe we need to find another way, such as KD, to keep reasonable accuracy. I don't know why some pruned models are slower than the original model.

AymenBOUGUERRA commented 1 year ago

@aidevmin

Maybe we need to find another way, such as KD, to keep reasonable accuracy. I don't know why some pruned models are slower than the original model.

I have been trying to use KD to reduce the overall degradation from the pruning, including loss distillation and feature distillation. Even though the models converge faster, they are unable to surpass the pruned models without KD in terms of accuracy, even though the KD itself works (I tried teaching the model without annotations or ground truth, only by imitation, and it converges). This, in my opinion, can only mean that the pruned models are simply too small to reach the accuracies of their full-size counterparts. I also tried this implementation: https://github.com/wonbeomjang/yolov5-knowledge-distillation
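
For reference, a generic sketch of the logit-distillation loss mentioned above (Hinton-style KD, not the linked repo's exact code); student_out and teacher_out are raw logits and T is the softmax temperature:

    import torch.nn.functional as F

    def distillation_loss(student_out, teacher_out, T=4.0):
        # KL divergence between the softened teacher and student
        # distributions, scaled by T^2 as in Hinton et al.
        return F.kl_div(
            F.log_softmax(student_out / T, dim=-1),
            F.softmax(teacher_out / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)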

I also tried excluding layers from being pruned; this is an alternative to lowering the channel sparsity from 0.5. So instead of using 0.1 sparsity, you can keep 0.5 and exclude x layers to reduce both the accuracy loss and the number of parameters pruned. Here are some of my tests:

[screenshots: test results for different numbers of excluded layers]

So, excluding 15 layers from pruning reduces the mAP loss from 6 points to 4 points and still gives us a considerable speedup.

I haven't tried excluding more, but I am pretty sure that the graph of pruning loss as a function of the number of excluded layers will follow a logarithmic curve starting at -6 and eventually approaching 0.
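
For reference, a minimal sketch of the exclusion idea in Torch-Pruning terms, assuming model.model is yolov7's module list (the indices are purely illustrative):

    from models.yolo import Detect

    skip_indices = {11, 28}  # illustrative layer indices to protect
    ignored_layers = []
    for i, m in enumerate(model.model):
        # Protect both the hand-picked layers and the detection head.
        if i in skip_indices or isinstance(m, Detect):
            ignored_layers.append(m)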

aidevmin commented 10 months ago

@AymenBOUGUERRA

The solution that I implemented for saving while still being able to reload the model is to create a yaml configuration file for the pruned model at the moment of pruning; loading the pruned model with this configuration works perfectly.

Could you share the source code to create a yaml configuration file for the pruned model at the moment of pruning? Thanks a lot.

AymenBOUGUERRA commented 10 months ago

train_yolov7_prune.txt

@aidevmin Create a .py file in the same place as the standard yolov7 train script and copy the whole code into it.

You can use --ignored_layers "11 28" to ignore layers 11 and 28 so they don't get pruned.

The argument --original_configuration will load the original yaml file used for this configuration in order to calculate and create the new pruned yaml configuration (it loads the yolov7 config by default, but it should be changed if you are working with yolov7x, for example).

When you use this script to prune and train a model, the pruned model's yaml file will be generated in the same folder where your training is saved, typically runs/train/1/pruned.yaml.

You can then use --cfg_pruned "runs/train/1/pruned.yaml" to reload the pruned model with its normal weights and EMA weights and continue your training.

If you wish to load the model for inference, you do not need this script, because the attempt_load() function in the repo handles the pruned model correctly.
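
A hypothetical invocation combining the flags described above (paths are illustrative, and the usual yolov7 training arguments are omitted):

    # Prune and train, excluding layers 11 and 28 and deriving the pruned
    # yaml from the stock yolov7 configuration:
    python train_yolov7_prune.py --original_configuration cfg/training/yolov7.yaml --ignored_layers "11 28"

    # Later, reload the pruned model (normal + EMA weights) and continue training:
    python train_yolov7_prune.py --cfg_pruned "runs/train/1/pruned.yaml"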

AFallDay commented 9 months ago

Hello @VainF, I'm a beginner and I'm also having the problem that nothing is detected after pruning yolov5s. Is there fine-tuning code for yolov5?

VainF commented 9 months ago

Hi @AFallDay, I'm sorry there is no finetuning code for yolov5 in this project. You can try the official training code from yolov5.