VainF / Torch-Pruning

[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
https://arxiv.org/abs/2301.12900
MIT License

Pruning yolov8 failed #147

Closed Hyunseok-Kim0 closed 1 year ago

Hyunseok-Kim0 commented 1 year ago

Hello, I am trying to apply filter pruning to the yolov8 model. I saw there is sample code for yolov7 in https://github.com/VainF/Torch-Pruning/blob/master/benchmarks/prunability/yolov7_train_pruned.py. Since yolov8 has a very similar structure to yolov7, I thought it would be possible to prune it with minimal modification. However, the pruning failed due to a strange problem near the Concat layer. I used the code below, run under the yolov8 root, to prune the model.

import torch

from ultralytics import YOLO
import torch_pruning as tp

from ultralytics.nn.modules import Detect

def prune():
    # load trained yolov8x model
    model = YOLO('yolov8x.pt')

    for name, param in model.model.named_parameters():
        param.requires_grad = True

    # pruning
    model.model.eval()
    example_inputs = torch.randn(1, 3, 640, 640).to(model.device)
    imp = tp.importance.MagnitudeImportance(p=2)  # L2 norm pruning

    ignored_layers = []
    unwrapped_parameters = []

    modules_list = list(model.model.modules())
    for i, m in enumerate(modules_list):
        if isinstance(m, (Detect,)):
            ignored_layers.append(m)

    iterative_steps = 1  # progressive pruning
    pruner = tp.pruner.MagnitudePruner(
        model.model,
        example_inputs,
        importance=imp,
        iterative_steps=iterative_steps,
        ch_sparsity=0.5,  # remove 50% channels
        ignored_layers=ignored_layers,
        unwrapped_parameters=unwrapped_parameters
    )
    base_macs, base_nparams = tp.utils.count_ops_and_params(model.model, example_inputs)
    pruner.step()

    pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(pruner.model, example_inputs)
    print(model.model)
    print("Before Pruning: MACs=%f G, #Params=%f G" % (base_macs / 1e9, base_nparams / 1e9))
    print("After Pruning: MACs=%f G, #Params=%f G" % (pruned_macs / 1e9, pruned_nparams / 1e9))

    # fine-tuning, TBD

if __name__ == "__main__":
    prune()

The following message is the stack trace produced when pruning fails.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_pruning/importance.py", line 88, in __call__
    w = layer.weight.data[idxs]
IndexError: index 640 is out of bounds for dimension 0 with size 640

The layer in the error message is a batchnorm layer whose layer.weight.data has shape (640,). However, idxs has shape (1280,) and contains out-of-range values. Other layers around the Concat show similar errors, which means idxs has a much larger shape or larger values than the layer's weight length. I tried to figure out why this problem happens, but am stuck right now. I guess there is a problem in graph construction, such as _ConcatIndexMapping or something similar, for yolov8. It would be nice if you could help or give some advice to solve this problem.
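A minimal debugging sketch for inspecting the index mapping (this assumes the documented DepGraph API of torch_pruning; target_conv and the 'model.6.cv2.conv' name are hypothetical placeholders for a layer near the failing batchnorm):

import torch
import torch_pruning as tp

# build the same dependency graph the pruner uses internally
DG = tp.DependencyGraph().build_dependency(model.model, example_inputs=example_inputs)

# pick a conv near the failing BN and print its pruning group; the printout
# lists every coupled layer together with the mapped channel indices
target_conv = dict(model.model.named_modules())['model.6.cv2.conv']  # hypothetical name
group = DG.get_pruning_group(target_conv, tp.prune_conv_out_channels, idxs=[0, 1])
print(group)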

VainF commented 1 year ago

Could you please try tp.importance.RandomImportance? I'm not sure if this is caused by DepGraph or the importance module.

Hyunseok-Kim0 commented 1 year ago

The error occurred in the C2f module of yolov8 when tp.importance.RandomImportance is used.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch_pruning/utils/op_counter.py", line 26, in count_ops_and_params
    _ = flops_model(example_inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/workspace/Projects/ultralytics/ultralytics/nn/tasks.py", line 203, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/workspace/Projects/ultralytics/ultralytics/nn/tasks.py", line 58, in _forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Projects/ultralytics/ultralytics/nn/modules.py", line 193, in forward
    y.extend(m(y[-1]) for m in self.m)
  File "/workspace/Projects/ultralytics/ultralytics/nn/modules.py", line 193, in <genexpr>
    y.extend(m(y[-1]) for m in self.m)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Projects/ultralytics/ultralytics/nn/modules.py", line 131, in forward
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Projects/ultralytics/ultralytics/nn/modules.py", line 34, in forward
    return self.act(self.bn(self.conv(x)))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [160, 320, 3, 3], expected input[1, 160, 32, 32] to have 320 channels, but got 160 channels instead

VainF commented 1 year ago

Thank you. I will try it!

Hyunseok-Kim0 commented 1 year ago

It was possible to execute pruner.step() using commit 0d7a99b after I modified the C2f module, with tp.importance.MagnitudeImportance. However, the most recent version did not work.

Here is the error message of the most recent version (commit 69902e8):

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/Projects/Torch-Pruning/torch_pruning/importance.py", line 80, in __call__
    local_norm = local_norm[idxs]
IndexError: index 2880 is out of bounds for dimension 0 with size 1600

Here is the successful output (commit 0d7a99b + modified C2f module)

(Collapsed output: the full printout of the pruned DetectionModel. In short, every C2f is replaced by C2f_v2 and the channel counts are halved throughout, e.g. the stem becomes Conv2d(3, 40, ...) instead of Conv2d(3, 80, ...), while the ignored Detect head keeps its internal channels.)

Before Pruning: MACs=87.149033 G, #Params=0.068154 G
After Pruning: MACs=29.102651 G, #Params=0.020711 G

It looks like pruning is working properly. The model mAP decreased from 0.414 to 0.378 with ch_sparsity 0.01.

ducanhluu commented 1 year ago

@Hyunseok-Kim0 Hello, how did you modify the C2f module to make it work? In that case, were you still able to retrieve the pretrained weights?

@VainF I encountered the same problem with the C2f module; it seems to me that it does not prune the cv1 layer in the C2f module. Cf. https://user-images.githubusercontent.com/27466624/222874205-3873bdac-7135-4ecc-8ab2-ca18b8e13fdf.jpg. When I grouped the convs of the C2f module together and excluded these groups from the pruning, it works.

Here is how I group these convs, where bottleneck_index is one of [2, 4, 6, 8, 12, 15, 18, 21]:

def get_model_groups(model, bottleneck_index):
    return [
        [
            f"model.{i}.m.{n}.cv2.conv"
            for n in range(len(model.module[i].m if hasattr(model, "module") else model[i].m))
        ]
        + [f"model.{i}.cv1.conv"]
        for i in bottleneck_index
    ]
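
A hedged sketch of how these name groups could then be excluded (the lookup via named_modules is plain PyTorch; whether the "model.{i}..." prefixes line up depends on which module the names were generated against):

det_model = model.model  # the underlying DetectionModel
named = dict(det_model.named_modules())
name_groups = get_model_groups(det_model.model, [2, 4, 6, 8, 12, 15, 18, 21])
ignored_layers = [named[n] for group in name_groups for n in group if n in named]
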
VainF commented 1 year ago

Thank you! Maybe I just introduced new bugs in the latest commit. Will fix it.


BTW, @Hyunseok-Kim0 could you please share your solution with other guys? I think it would be very helpful!

Hyunseok-Kim0 commented 1 year ago

Here is the modified C2f module. I found this in https://github.com/tianyic/only_train_once/issues/5.

import torch
import torch.nn as nn
from ultralytics.nn.modules import Conv, Bottleneck

class C2f_v2(nn.Module):
    # CSP Bottleneck with 2 convolutions
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv0 = Conv(c1, self.c, 1, 1)
        self.cv1 = Conv(c1, self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        # y = list(self.cv1(x).chunk(2, 1))
        y = [self.cv0(x), self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

@ducanhluu Here is the code I used for migrating the pretrained C2f weights.

def infer_shortcut(bottleneck):
    c1 = bottleneck.cv1.conv.in_channels
    c2 = bottleneck.cv2.conv.out_channels
    return c1 == c2 and hasattr(bottleneck, 'add') and bottleneck.add

def transfer_weights(c2f, c2f_v2):
    c2f_v2.cv2 = c2f.cv2
    c2f_v2.m = c2f.m

    state_dict = c2f.state_dict()
    state_dict_v2 = c2f_v2.state_dict()

    # Transfer cv1 weights from C2f to cv0 and cv1 in C2f_v2
    old_weight = state_dict['cv1.conv.weight']
    half_channels = old_weight.shape[0] // 2
    state_dict_v2['cv0.conv.weight'] = old_weight[:half_channels]
    state_dict_v2['cv1.conv.weight'] = old_weight[half_channels:]

    # Transfer cv1 batchnorm weights and buffers from C2f to cv0 and cv1 in C2f_v2
    for bn_key in ['weight', 'bias', 'running_mean', 'running_var']:
        old_bn = state_dict[f'cv1.bn.{bn_key}']
        state_dict_v2[f'cv0.bn.{bn_key}'] = old_bn[:half_channels]
        state_dict_v2[f'cv1.bn.{bn_key}'] = old_bn[half_channels:]

    # Transfer remaining weights and buffers
    for key in state_dict:
        if not key.startswith('cv1.'):
            state_dict_v2[key] = state_dict[key]

    # Transfer all non-method attributes
    for attr_name in dir(c2f):
        attr_value = getattr(c2f, attr_name)
        if not callable(attr_value) and '_' not in attr_name:
            setattr(c2f_v2, attr_name, attr_value)

    c2f_v2.load_state_dict(state_dict_v2)

def replace_c2f_with_c2f_v2(module):
    for name, child_module in module.named_children():
        if isinstance(child_module, C2f):
            # Replace C2f with C2f_v2 while preserving its parameters
            shortcut = infer_shortcut(child_module.m[0])
            c2f_v2 = C2f_v2(child_module.cv1.conv.in_channels, child_module.cv2.conv.out_channels,
                            n=len(child_module.m), shortcut=shortcut,
                            g=child_module.m[0].cv2.conv.groups,
                            e=child_module.c / child_module.cv2.conv.out_channels)
            transfer_weights(child_module, c2f_v2)
            setattr(module, name, c2f_v2)
        else:
            replace_c2f_with_c2f_v2(child_module)

ducanhluu commented 1 year ago

@Hyunseok-Kim0 thank you for sharing. With your patch, I can now prune the entire yolov8 as well.

There was a mistake in my previous message. The problem was actually at the split layer. By explicitly splitting the convs beforehand, the patch resolves the issue.

VainF commented 1 year ago

Hey guys! I think I found the bug. When a concat module and a split module are directly connected, the index mapping system fails to compute the correct idxs. I'm going to rewrite the concat & split tracing. Many thanks for this issue!
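
A minimal toy repro of this concat-then-split pattern (a sketch: the CatSplit module and all sizes here are illustrative, and whether a given commit reproduces the exact failure may vary):

import torch
import torch.nn as nn
import torch_pruning as tp

class CatSplit(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_a = nn.Conv2d(3, 16, 1)
        self.conv_b = nn.Conv2d(3, 16, 1)
        self.head = nn.Conv2d(16, 8, 1)

    def forward(self, x):
        y = torch.cat([self.conv_a(x), self.conv_b(x)], dim=1)  # concat ...
        y1, y2 = y.chunk(2, dim=1)                              # ... directly followed by a split
        return self.head(y1 + y2)

model = CatSplit()
pruner = tp.pruner.MagnitudePruner(
    model, torch.randn(1, 3, 32, 32),
    importance=tp.importance.MagnitudeImportance(p=2),
    ch_sparsity=0.5,
)
pruner.step()  # on affected commits, this is where the bad idxs surface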

ghost commented 1 year ago

The result is different when I use the c2f_v2 module instead of c2f, without pruning. I just ran the code as follows:

model = YOLO('runs/detect/train/weights/best.pt')

for name, param in model.model.named_parameters():
    param.requires_grad = True

replace_c2f_with_c2f_v2(model.model)

success = model.export(format="onnx", imgsz=(192,640), simplify=True)

The ONNX result is different from the torch result. The precision is very low.

ghost commented 1 year ago

Do I need to re-train the yolov8 model using c2f_v2 instead of c2f?

VainF commented 1 year ago

The result is different when I use the c2f_v2 module instead of c2f, without pruning. I just ran the code as follows:

model = YOLO('runs/detect/train/weights/best.pt')

for name, param in model.model.named_parameters():
    param.requires_grad = True

replace_c2f_with_c2f_v2(model.model)

success = model.export(format="onnx", imgsz=(192,640), simplify=True)

The ONNX result is different from the torch result. The precision is very low.

There may be some issues with the weights copy. Will check it and get back to you.

VainF commented 1 year ago

Hi @xiaofulee, I found that the official YOLOv8 relies on the following function to set up BN.eps and BN.momentum. By default, a freshly constructed BatchNorm uses eps=1e-5, which is incompatible with the official YOLO weights (BN.eps=0.001). Could you please try the latest commit?

def initialize_weights(model):
    # Initialize model weights to random values
    for m in model.modules():
        t = type(m)
        if t is nn.Conv2d:
            pass  # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif t is nn.BatchNorm2d:
            m.eps = 1e-3
            m.momentum = 0.03
        elif t in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
            m.inplace = True

The updated pipeline:

for name, param in model.model.named_parameters():
    param.requires_grad = True

replace_c2f_with_c2f_v2(model.model)
initialize_weights(model.model)  # set BN.eps, momentum, ReLU.inplace

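As a sanity check, a sketch like the following can verify that the replacement alone does not change the outputs (the [0] indexing assumes the eval-mode output layout of DetectionModel; without initialize_weights, the BN.eps mismatch above makes the comparison fail):

import torch
from copy import deepcopy

reference = deepcopy(model.model).eval()   # keep an untouched copy
replace_c2f_with_c2f_v2(model.model)
initialize_weights(model.model)            # align BN.eps/momentum with the pretrained weights
model.model.eval()

x = torch.randn(1, 3, 640, 640)
with torch.no_grad():
    out_ref = reference(x)[0]
    out_new = model.model(x)[0]
print(torch.allclose(out_ref, out_new, atol=1e-5))  # expected: True
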
ghost commented 1 year ago

I updated the code but it makes no difference.

The mAP of the baseline is 79.1%.

When ch_sparsity is set to 0.1, the mAP is just 26.2%. The AP of the class "face" is 0%.

Here is the test code:

test.txt

VainF commented 1 year ago

It requires post-training.

Does the unpruned model work properly after the module replacement?

ghost commented 1 year ago

It requires post-training.

Does the unpruned model work properly after the module replacement?

Yes. It works.

I will study how to do the post-training. Thank you.

VainF commented 1 year ago

I'm not a yolo expert, but these lines may be helpful for post-training:

pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(pruner.model, example_inputs)
print(model.model)
print("Before Pruning: MACs=%f G, #Params=%f M" % (base_macs / 1e9, base_nparams / 1e6))
print("After Pruning: MACs=%f G, #Params=%f M" % (pruned_macs / 1e9, pruned_nparams / 1e6))

# post-training
model.train(data='coco128.yaml', epochs=100, imgsz=640)

Reference: https://docs.ultralytics.com/modes/train/

Please replace the coco128 toy set with a full coco dataset and use a smaller learning rate (original_lr x 0.1) for post-training.
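
For instance, a hedged one-liner (lr0 is the Ultralytics initial-learning-rate argument, whose usual default is 0.01, so 0.001 is roughly original_lr x 0.1):

model.train(data='coco.yaml', epochs=100, imgsz=640, lr0=0.001)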

ghost commented 1 year ago

I'm not a yolo expert, but these lines may be helpful for post-training:

pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(pruner.model, example_inputs)
print(model.model)
print("Before Pruning: MACs=%f G, #Params=%f M" % (base_macs / 1e9, base_nparams / 1e6))
print("After Pruning: MACs=%f G, #Params=%f M" % (pruned_macs / 1e9, pruned_nparams / 1e6))

# post-training
model.train(data='coco128.yaml', epochs=100, imgsz=640)

Reference: https://docs.ultralytics.com/modes/train/

Please replace the coco128 toy set with a full coco dataset and use a smaller learning rate (original_lr x 0.1) for post-training.

Thanks a lot. It's great work.

Hyunseok-Kim0 commented 1 year ago

Is there any progress on fixing the bug? I still see an error message with the most recent version of the code.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-bc2d9c71db3c> in <cell line: 2>()
      1 # output with current version
----> 2 prune()

4 frames
/usr/local/lib/python3.9/dist-packages/torch_pruning/dependency.py in _update_concat_index_mapping(self, cat_node)
    935         offsets = [0]
    936         for ch in chs:
--> 937             offsets.append(offsets[-1] + ch)
    938         cat_node.module.offsets = offsets
    939 

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Here is the gist for reproduction of bug of current version. https://colab.research.google.com/gist/Hyunseok-Kim0/92905bfa852f9c151c35c2ec9167595e/yolov8-pruning.ipynb

VainF commented 1 year ago

Hi @Hyunseok-Kim0, this error was fixed. No error with your example. Thank you!

Ghustwb commented 1 year ago

Hi, @VainF @Hyunseok-Kim0 @ducanhluu @xiaofulee Based on your information, I tried to prune yolov8-n with the latest code [d3562c0]. I can see that the MACs and params are reduced, and after post-training the mAP also comes back. But when I deploy the pruned model, the inference time is the same as with the original model. Is there any other operation that needs to be done? Thanks

Hyunseok-Kim0 commented 1 year ago

@Ghustwb In my case, both the inference and training time are reduced after pruning. Can you show your pruned-model loading code and inference code?

Ghustwb commented 1 year ago

@Hyunseok-Kim0, I just ran yolov8_pruning.py with a modified yaml and size.
I tested 3 deployment paths: pytorch model, onnx model, and TensorRT model. For all of them, the inference time before pruning and after pruning is the same. And just now I found a strange thing: the model structure after pruning is actually the same as the model structure before pruning. Is there any bug in the file yolov8_pruning.py?

model.train(data='coco128.yaml', epochs=100, imgsz=640)

Does this operation in yolov8 reconstruct all params based on the yaml?

Hyunseok-Kim0 commented 1 year ago

@Ghustwb When you load the pruned yolov8 model, try to load the pt file directly, like model = YOLO('pruned.pt'), without a yaml file, or update the model in the yaml file. You'd better check how the yolov8 model calls the trainer and loads the model. I modified the yolov8 source so that it does not load a new model when a pruned model is given. If you do not want to modify the yolov8 source code, I think you have to save the pruned model and load it again.
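
A hedged sketch of the save-and-reload route (the checkpoint layout with a 'model' entry mirrors what Ultralytics writes, but verify against your version; see also the C2f_v2 AttributeError discussed further below):

from copy import deepcopy
import torch
from ultralytics import YOLO

ckpt = {'model': deepcopy(model.model)}  # Ultralytics checkpoints keep the full module under 'model'
torch.save(ckpt, 'pruned.pt')
pruned = YOLO('pruned.pt')  # the pickle stores the pruned structure, so C2f_v2 must be importable at load time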

jhxiang commented 1 year ago

I am pruning YOLOv8 according to the tutorial, but I am encountering an issue which might be caused by replacing a module. How can I resolve this?

Ghustwb commented 1 year ago

@Ghustwb When you load the pruned yolov8 model, try to load the pt file directly, like model = YOLO('pruned.pt'), without a yaml file, or update the model in the yaml file. You'd better check how the yolov8 model calls the trainer and loads the model. I modified the yolov8 source so that it does not load a new model when a pruned model is given. If you do not want to modify the yolov8 source code, I think you have to save the pruned model and load it again.

@Hyunseok-Kim0 Thanks for your information, I will try it again. Also, can you open a PR for yolov8_pruning.py?

Hyunseok-Kim0 commented 1 year ago

I see the corresponding part has already been changed in the sample code. However, I think some parts in yolov8 also need to be changed together for better performance. I will try to make a PR tomorrow with those changes.

Ghustwb commented 1 year ago

Thank you @Hyunseok-Kim0, I am waiting for your PR. BTW, can you share your performance results after yolov8 pruning, such as mAP or inference time?

lucasjinreal commented 1 year ago

@Hyunseok-Kim0 Please share the AP comparison before and after pruning. Thanks!

lizhangxing commented 1 year ago

I encountered an error in the code during execution, stating that the tensors are not on the same device.

idiot499 commented 1 year ago

@Hyunseok-Kim0
Hi, I would like to ask how to run inference with the pruned YOLOv8 on the test dataset. Has the structure changed? I wrote in the script main.py of YOLOv8:

model=YOLO("./runs/detect/step_15_finetune/weights/best.pt")
model.val(data='./ultralytics/datasets/custom.yaml')

But the error is reported as:

File "/opt/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/serialization.py", line 1039, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'C2f_v2' on <module '__main__' from 'main.py'>

Can you help me?

lizhangxing commented 1 year ago

I encountered an error in the code during execution, stating that the tensors are not on the same device. @Hyunseok-Kim0 can you explain it to me? Thank you.

ajithkumarmcw commented 1 year ago

@Hyunseok-Kim0 Hi, I would like to ask how to run inference with the pruned YOLOv8 on the test dataset. Has the structure changed? I wrote in the script main.py of YOLOv8:

model=YOLO("./runs/detect/step_15_finetune/weights/best.pt")
model.val(data='./ultralytics/datasets/custom.yaml')

But the error is reported as:

File "/opt/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/serialization.py", line 1039, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'C2f_v2' on <module '__main__' from 'main.py'>

Can you help me?

You can validate inside this code itself: https://github.com/VainF/Torch-Pruning/blob/master/benchmarks/prunability/yolov8_pruning.py. Just remove all the code under def prune() and use the snippet below:

def prune(args):
    # load trained yolov8 model
    model = YOLO(args.model)

    results = model.val(data='./ultralytics/datasets/custom.yaml')

apanand14 commented 1 year ago

I'm not a yolo expert, but these lines may be helpful for post-training:

pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(pruner.model, example_inputs)
print(model.model)
print("Before Pruning: MACs=%f G, #Params=%f M" % (base_macs / 1e9, base_nparams / 1e6))
print("After Pruning: MACs=%f G, #Params=%f M" % (pruned_macs / 1e9, pruned_nparams / 1e6))

# post-training
model.train(data='coco128.yaml', epochs=100, imgsz=640)

Reference: https://docs.ultralytics.com/modes/train/

Please replace the coco128 toy set with a full coco dataset and use a smaller learning rate (original_lr x 0.1) for post-training.

First of all, thank you for the great work. I appreciate it. My question is: I would like to prune the yolov8n-seg model. I followed the steps mentioned above in the issue and started training. It runs without any error, but my concern is that the model size was around 6 MB (without pruning) and now the model size is around 32 MB. I don't know how. Am I making a mistake, or does this method not work for yolov8 nano segmentation models? Thank you in advance for your answer!

VainF commented 1 year ago

Hi @apanand14. Pruning alters the model's structure, making the original definition in your .py file incompatible. To handle this, we save the entire model using torch.save(model, PATH) to a ".pth" file. This might increase the file size, but it doesn't affect the actual model size during training or inference. To check the real size, you can export the model to ONNX.
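
For instance, a hedged way to see the difference (paths are illustrative):

import os
import torch

torch.save(model.model, 'pruned_full.pth')  # pickles the whole module, so the file can look large
print('serialized size: %.1f MB' % (os.path.getsize('pruned_full.pth') / 1e6))
model.export(format='onnx')                 # the ONNX file reflects the actual pruned weights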

apanand14 commented 1 year ago

@VainF Thank you for your response. Yes, after exporting to ONNX the model size remains almost the same. One more thing: I export to NCNN later. I hope that it will work without any modifications, or should I consider something before exporting to NCNN? Thank you.

VainF commented 1 year ago

@VainF Thank you for your response. Yes, after exporting to ONNX the model size remains almost the same. One more thing: I export to NCNN later. I hope that it will work without any modifications, or should I consider something before exporting to NCNN? Thank you.

If the original model can be exported to ONNX and NCNN without any issues, the same pipeline should also be applicable to the pruned model.
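
For reference, a hedged sketch of the usual ONNX-to-NCNN path (the onnx2ncnn converter and its argument order are from the NCNN project; verify against your install, and note that newer Ultralytics versions may also export NCNN directly):

model.export(format='onnx', simplify=True)  # writes e.g. pruned.onnx next to the weights
# then, from a shell (onnx2ncnn ships with NCNN):
#   onnx2ncnn pruned.onnx pruned.param pruned.bin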

apanand14 commented 1 year ago

Hi, I'm facing an error during pruning itself. I'm putting my code and the error below. Please let me know if I'm making any mistake and correct me. Thank you in advance.

from ultralytics import YOLO
import torch
import torch.nn as nn
import gc
from ultralytics.nn.modules import replace_c2f_with_c2f_v2, initialize_weights
import torch_pruning as tp

def run():
    model = YOLO('yolov8n-seg.pt')

    for name, param in model.model.named_parameters():
        param.requires_grad = True

    replace_c2f_with_c2f_v2(model.model)
    initialize_weights(model.model)  # set BN.eps, momentum, ReLU.inplace
    example_inputs = torch.randn(1, 3, 640, 640)
    imp = tp.importance.MagnitudeImportance(p=2)  # L2 norm pruning

    ignored_layers = []
    unwrapped_parameters = []
    iterative_steps = 1  # progressive pruning

    pruner = tp.pruner.MagnitudePruner(
        model.model,
        example_inputs,
        importance=imp,
        iterative_steps=iterative_steps,
        ch_sparsity=0.5,  # remove 50% channels
        ignored_layers=ignored_layers,
        unwrapped_parameters=unwrapped_parameters
    )
    pruner.step()

    base_macs, base_nparams = tp.utils.count_ops_and_params(model.model, example_inputs)
    pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(pruner.model, example_inputs)

    print(model.model)
    print("Before Pruning: MACs=%f G, #Params=%f M" % (base_macs / 1e9, base_nparams / 1e6))
    print("After Pruning: MACs=%f G, #Params=%f M" % (pruned_macs / 1e9, pruned_nparams / 1e6))

    # post-training
    model.train(data='custom_data.yaml', epochs=100, batch=8, workers=4, optimizer='Adam', lr0=0.001)

if __name__ == '__main__':
    run()

Error:

Traceback (most recent call last):
  File "C:\yolov8\train.py", line 63, in <module>
    run()
  File "C:\yolov8\train.py", line 42, in run
    base_macs, base_nparams = tp.utils.count_ops_and_params(model.model, example_inputs)
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\torch_pruning\utils\op_counter.py", line 28, in count_ops_and_params
    _ = flops_model(example_inputs)
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\ultralytics\nn\tasks.py", line 178, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\ultralytics\nn\tasks.py", line 57, in _forward_once
    x = m(x)  # run
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\ultralytics\nn\modules.py", line 531, in forward
    x = self.detect(self, x)
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\ultralytics\nn\modules.py", line 499, in forward
    box, cls = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).split((self.reg_max * 4, self.nc), 1)
  File "C:\Users\anaconda3\envs\yolov8\lib\site-packages\ultralytics\nn\modules.py", line 499, in <listcomp>
    box, cls = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).split((self.reg_max * 4, self.nc), 1)
RuntimeError: shape '[1, 144, -1]' is invalid for input of size 716800

aicmaodyu commented 1 year ago

I encountered an error in the code during execution, stating that the tensors are not on the same device. @Hyunseok-Kim0 can you explain it to me? Thank you.

That's exactly my problem. How did you solve it?

ArthurRyan0803 commented 11 months ago

Hey guys! I think I found the bug. When a concat module and a split module are directly connected, the index mapping system fails to compute the correct idxs. I'm going to rewrite the concat & split tracing. Many thanks for this issue!

This bug still exists. https://github.com/VainF/Torch-Pruning/issues/219

Zzzxi commented 10 months ago

I encountered an error in the code during execution, stating that the tensors are not on the same device. @Hyunseok-Kim0 can you explain it to me? Thank you.

That's exactly my problem. How did you solve it?

Hello, have you solved it yet?

Prime-Rogue commented 10 months ago

This code is adapted from Issue #147, implemented by @Hyunseok-Kim0.

import argparse
import math
import os
from copy import deepcopy
from datetime import datetime
from pathlib import Path
from typing import List, Union

import numpy as np
import torch
import torch.nn as nn
from matplotlib import pyplot as plt
from ultralytics import YOLO, __version__
from ultralytics.nn.modules import Detect, C2f, Conv, Bottleneck
from ultralytics.nn.tasks import attempt_load_one_weight
from ultralytics.engine.model import Model
from ultralytics.engine.trainer import BaseTrainer
from ultralytics.utils import yaml_load, LOGGER, RANK, DEFAULT_CFG_DICT, DEFAULT_CFG_KEYS
from ultralytics.utils.checks import check_yaml
from ultralytics.utils.torch_utils import initialize_weights, de_parallel

import torch_pruning as tp

def save_pruning_performance_graph(x, y1, y2, y3):
    """
    Draw performance change graph

    Parameters
    ----------
    x : List
        Parameter numbers of all pruning steps
    y1 : List
        mAPs after fine-tuning of all pruning steps
    y2 : List
        MACs of all pruning steps
    y3 : List
        mAPs after pruning (not fine-tuned) of all pruning steps

    Returns
    -------

    """
    try:
        plt.style.use("ggplot")
    except:
        pass

    x, y1, y2, y3 = np.array(x), np.array(y1), np.array(y2), np.array(y3)
    y2_ratio = y2 / y2[0]

    # create the figure and the axis object
    fig, ax = plt.subplots(figsize=(8, 6))

    # plot the pruned mAP and recovered mAP
    ax.set_xlabel('Pruning Ratio')
    ax.set_ylabel('mAP')
    ax.plot(x, y1, label='recovered mAP')
    ax.scatter(x, y1)
    ax.plot(x, y3, color='tab:gray', label='pruned mAP')
    ax.scatter(x, y3, color='tab:gray')

    # create a second axis that shares the same x-axis
    ax2 = ax.twinx()

    # plot the second set of data
    ax2.set_ylabel('MACs')
    ax2.plot(x, y2_ratio, color='tab:orange', label='MACs')
    ax2.scatter(x, y2_ratio, color='tab:orange')

    # add a legend
    lines, labels = ax.get_legend_handles_labels()
    lines2, labels2 = ax2.get_legend_handles_labels()
    ax2.legend(lines + lines2, labels + labels2, loc='best')

    ax.set_xlim(105, -5)
    ax.set_ylim(0, max(y1) + 0.05)
    ax2.set_ylim(0.05, 1.05)

    # calculate the highest and lowest points for each set of data
    max_y1_idx = np.argmax(y1)
    min_y1_idx = np.argmin(y1)
    max_y2_idx = np.argmax(y2)
    min_y2_idx = np.argmin(y2)
    max_y1 = y1[max_y1_idx]
    min_y1 = y1[min_y1_idx]
    max_y2 = y2_ratio[max_y2_idx]
    min_y2 = y2_ratio[min_y2_idx]

    # add text for the highest and lowest values near the points
    ax.text(x[max_y1_idx], max_y1 - 0.05, f'max mAP = {max_y1:.2f}', fontsize=10)
    ax.text(x[min_y1_idx], min_y1 + 0.02, f'min mAP = {min_y1:.2f}', fontsize=10)
    ax2.text(x[max_y2_idx], max_y2 - 0.05, f'max MACs = {max_y2 * y2[0] / 1e9:.2f}G', fontsize=10)
    ax2.text(x[min_y2_idx], min_y2 + 0.02, f'min MACs = {min_y2 * y2[0] / 1e9:.2f}G', fontsize=10)

    plt.title('Comparison of mAP and MACs with Pruning Ratio')
    plt.savefig('pruning_perf_change.png')

def infer_shortcut(bottleneck):
    c1 = bottleneck.cv1.conv.in_channels
    c2 = bottleneck.cv2.conv.out_channels
    return c1 == c2 and hasattr(bottleneck, 'add') and bottleneck.add

class C2f_v2(nn.Module):
    # CSP Bottleneck with 2 convolutions
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv0 = Conv(c1, self.c, 1, 1)
        self.cv1 = Conv(c1, self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        # y = list(self.cv1(x).chunk(2, 1))
        y = [self.cv0(x), self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

def transfer_weights(c2f, c2f_v2):
    c2f_v2.cv2 = c2f.cv2
    c2f_v2.m = c2f.m

    state_dict = c2f.state_dict()
    state_dict_v2 = c2f_v2.state_dict()

    # Transfer cv1 weights from C2f to cv0 and cv1 in C2f_v2
    old_weight = state_dict['cv1.conv.weight']
    half_channels = old_weight.shape[0] // 2
    state_dict_v2['cv0.conv.weight'] = old_weight[:half_channels]
    state_dict_v2['cv1.conv.weight'] = old_weight[half_channels:]

    # Transfer cv1 batchnorm weights and buffers from C2f to cv0 and cv1 in C2f_v2
    for bn_key in ['weight', 'bias', 'running_mean', 'running_var']:
        old_bn = state_dict[f'cv1.bn.{bn_key}']
        state_dict_v2[f'cv0.bn.{bn_key}'] = old_bn[:half_channels]
        state_dict_v2[f'cv1.bn.{bn_key}'] = old_bn[half_channels:]

    # Transfer remaining weights and buffers
    for key in state_dict:
        if not key.startswith('cv1.'):
            state_dict_v2[key] = state_dict[key]

    # Transfer all non-method attributes
    for attr_name in dir(c2f):
        attr_value = getattr(c2f, attr_name)
        if not callable(attr_value) and '_' not in attr_name:
            setattr(c2f_v2, attr_name, attr_value)

    c2f_v2.load_state_dict(state_dict_v2)

def replace_c2f_with_c2f_v2(module):
    for name, child_module in module.named_children():
        if isinstance(child_module, C2f):
            # Replace C2f with C2f_v2 while preserving its parameters
            shortcut = infer_shortcut(child_module.m[0])
            c2f_v2 = C2f_v2(child_module.cv1.conv.in_channels, child_module.cv2.conv.out_channels,
                            n=len(child_module.m), shortcut=shortcut,
                            g=child_module.m[0].cv2.conv.groups,
                            e=child_module.c / child_module.cv2.conv.out_channels)
            transfer_weights(child_module, c2f_v2)
            setattr(module, name, c2f_v2)
        else:
            replace_c2f_with_c2f_v2(child_module)

def save_model_v2(self: BaseTrainer):
    """
    Disabled half precision saving. originated from ultralytics/yolo/engine/trainer.py
    """
    ckpt = {
        'epoch': self.epoch,
        'best_fitness': self.best_fitness,
        'model': deepcopy(de_parallel(self.model)),
        'ema': deepcopy(self.ema.ema),
        'updates': self.ema.updates,
        'optimizer': self.optimizer.state_dict(),
        'train_args': vars(self.args),  # save as dict
        'date': datetime.now().isoformat(),
        'version': __version__}

    # Save last, best and delete
    torch.save(ckpt, self.last)
    if self.best_fitness == self.fitness:
        torch.save(ckpt, self.best)
    if (self.epoch > 0) and (self.save_period > 0) and (self.epoch % self.save_period == 0):
        torch.save(ckpt, self.wdir / f'epoch{self.epoch}.pt')
    del ckpt

def final_eval_v2(self: BaseTrainer):
    """
    originated from ultralytics/yolo/engine/trainer.py
    """
    for f in self.last, self.best:
        if f.exists():
            strip_optimizer_v2(f)  # strip optimizers
            if f is self.best:
                LOGGER.info(f'\nValidating {f}...')
                self.metrics = self.validator(model=f)
                self.metrics.pop('fitness', None)
                self.run_callbacks('on_fit_epoch_end')

def strip_optimizer_v2(f: Union[str, Path] = 'best.pt', s: str = '') -> None:
    """
    Disabled half precision saving. originated from ultralytics/yolo/utils/torch_utils.py
    """
    x = torch.load(f, map_location=torch.device('cpu'))
    args = {**DEFAULT_CFG_DICT, **x['train_args']}  # combine model args with default args, preferring model args
    if x.get('ema'):
        x['model'] = x['ema']  # replace model with ema
    for k in 'optimizer', 'ema', 'updates':  # keys
        x[k] = None
    for p in x['model'].parameters():
        p.requires_grad = False
    x['train_args'] = {k: v for k, v in args.items() if k in DEFAULT_CFG_KEYS}  # strip non-default keys

    x['model'].args = x['train_args']

    torch.save(x, s or f)
    mb = os.path.getsize(s or f) / 1E6  # filesize
    LOGGER.info(f"Optimizer stripped from {f},{f' saved as {s},' if s else ''} {mb:.1f}MB")

def train_v2(self: YOLO, pruning=False, **kwargs):
    """
    Disabled loading new model when pruning flag is set. originated from ultralytics/yolo/engine/model.py
    """
    self._check_is_pytorch_model()
    if self.session:  # Ultralytics HUB session
        if any(kwargs):
            LOGGER.warning('WARNING ⚠️ using HUB training arguments, ignoring local training arguments.')
        kwargs = self.session.train_args
    overrides = self.overrides.copy()
    overrides.update(kwargs)
    if kwargs.get('cfg'):
        LOGGER.info(f"cfg file passed. Overriding default params with {kwargs['cfg']}.")
        overrides = yaml_load(check_yaml(kwargs['cfg']))
    overrides['mode'] = 'train'
    if not overrides.get('data'):
        raise AttributeError("Dataset required but missing, i.e. pass 'data=coco128.yaml'")
    if overrides.get('resume'):
        overrides['resume'] = self.ckpt_path

    self.task = overrides.get('task') or self.task
    self.trainer = Model.task_map[self.task][1](overrides=overrides, _callbacks=self.callbacks)

    if not pruning:
        if not overrides.get('resume'):  # manually set model only if not resuming
            self.trainer.model = self.trainer.get_model(weights=self.model if self.ckpt else None, cfg=self.model.yaml)
            self.model = self.trainer.model

    else:
        # pruning mode
        self.trainer.pruning = True
        self.trainer.model = self.model

        # replace some functions to disable half precision saving
        self.trainer.save_model = save_model_v2.__get__(self.trainer)
        self.trainer.final_eval = final_eval_v2.__get__(self.trainer)

    self.trainer.hub_session = self.session  # attach optional HUB session
    self.trainer.train()
    # Update model and cfg after training
    if RANK in (-1, 0):
        self.model, _ = attempt_load_one_weight(str(self.trainer.best))
        self.overrides = self.model.args
        self.metrics = getattr(self.trainer.validator, 'metrics', None)

def prune(args):
    # load trained yolov8 model
    model = YOLO(args.model)
    model.__setattr__("train_v2", train_v2.__get__(model))
    pruning_cfg = yaml_load(check_yaml(args.cfg))
    batch_size = pruning_cfg['batch']

    # use coco128 dataset for 10 epochs fine-tuning each pruning iteration step
    # this part is only for sample code, number of epochs should be included in config file
    # pruning_cfg['data'] = "coco128.yaml"
    # pruning_cfg['epochs'] = 10

    model.model.train()
    replace_c2f_with_c2f_v2(model.model)
    initialize_weights(model.model)  # set BN.eps, momentum, ReLU.inplace

    for name, param in model.model.named_parameters():
        param.requires_grad = True

    example_inputs = torch.randn(1, 3, pruning_cfg["imgsz"], pruning_cfg["imgsz"]).to(model.device)
    macs_list, nparams_list, map_list, pruned_map_list = [], [], [], []
    base_macs, base_nparams = tp.utils.count_ops_and_params(model.model, example_inputs)

    # do validation before pruning model
    pruning_cfg['name'] = f"baseline_val"
    pruning_cfg['batch'] = 1
    validation_model = deepcopy(model)
    metric = validation_model.val(**pruning_cfg)
    init_map = metric.box.map
    macs_list.append(base_macs)
    nparams_list.append(100)
    map_list.append(init_map)
    pruned_map_list.append(init_map)
    print(f"Before Pruning: MACs={base_macs / 1e9: .5f} G, #Params={base_nparams / 1e6: .5f} M, mAP={init_map: .5f}")

    # prune same ratio of filter based on initial size
    pruning_ratio = 1 - math.pow((1 - args.target_prune_rate), 1 / args.iterative_steps)

    for i in range(args.iterative_steps):

        model.model.train()
        for name, param in model.model.named_parameters():
            param.requires_grad = True

        ignored_layers = []
        unwrapped_parameters = []
        for m in model.model.modules():
            if isinstance(m, (Detect,)):
                ignored_layers.append(m)

        example_inputs = example_inputs.to(model.device)
        pruner = tp.pruner.GroupNormPruner(
            model.model,
            example_inputs,
            importance=tp.importance.GroupNormImportance(),  # L2 norm pruning,
            iterative_steps=1,
            pruning_ratio=pruning_ratio,
            ignored_layers=ignored_layers,
            unwrapped_parameters=unwrapped_parameters
        )

        # Test regularization
        # output = model.model(example_inputs)
        # (output[0].sum() + sum([o.sum() for o in output[1]])).backward()
        # pruner.regularize(model.model)

        pruner.step()
        # pre fine-tuning validation
        pruning_cfg['name'] = f"step_{i}_pre_val"
        pruning_cfg['batch'] = 1
        validation_model.model = deepcopy(model.model)
        metric = validation_model.val(**pruning_cfg)
        pruned_map = metric.box.map
        pruned_macs, pruned_nparams = tp.utils.count_ops_and_params(pruner.model, example_inputs.to(model.device))
        current_speed_up = float(macs_list[0]) / pruned_macs
        print(f"After pruning iter {i + 1}: MACs={pruned_macs / 1e9} G, #Params={pruned_nparams / 1e6} M, "
              f"mAP={pruned_map}, speed up={current_speed_up}")

        # fine-tuning
        for name, param in model.model.named_parameters():
            param.requires_grad = True
        pruning_cfg['name'] = f"step_{i}_finetune"
        pruning_cfg['batch'] = batch_size  # restore batch size
        model.train_v2(pruning=True, **pruning_cfg)

        # post fine-tuning validation
        pruning_cfg['name'] = f"step_{i}_post_val"
        pruning_cfg['batch'] = 1
        validation_model = YOLO(model.trainer.best)
        metric = validation_model.val(**pruning_cfg)
        current_map = metric.box.map
        print(f"After fine tuning mAP={current_map}")

        macs_list.append(pruned_macs)
        nparams_list.append(pruned_nparams / base_nparams * 100)
        pruned_map_list.append(pruned_map)
        map_list.append(current_map)

        # remove pruner after single iteration
        del pruner

        save_pruning_performance_graph(nparams_list, map_list, macs_list, pruned_map_list)

        if init_map - current_map > args.max_map_drop:
            print("Pruning early stop")
            break

    model.export(format='onnx')

if name == "main": parser = argparse.ArgumentParser() parser.add_argument('--model', default='last.pt', help='Pretrained pruning target model file') parser.add_argument('--cfg', default='F:\PycharmProjects\yolov8\ultralytics\cfg\default.yaml', help='Pruning config file.' ' This file should have same format with ultralytics/yolo/cfg/default.yaml') parser.add_argument('--iterative-steps', default=16, type=int, help='Total pruning iteration step') parser.add_argument('--target-prune-rate', default=0.5, type=float, help='Target pruning rate') parser.add_argument('--max-map-drop', default=0.2, type=float, help='Allowed maximum map drop after fine-tuning')

args = parser.parse_args()

prune(args)

I used the yolov8 example pruning script, modified for my own weights and model structure, but it raised an error: IndexError: index 768 is out of bounds for dimension 0 with size 384

What could be the reason for this?

yunlongwang-leopard commented 9 months ago

I used the official code, but the pruning does not work. Is it due to the coco128 dataset and only 10 epochs?

Reaidu commented 9 months ago

before prune: image 1/1 D:\ObjectDection\Yolov8\ultralytics-8.0.132\bus.jpg: 640x480 (no detections), 17.9ms
after prune: image 1/1 D:\ObjectDection\Yolov8\ultralytics-8.0.132\bus.jpg: 640x480 (no detections), 36.8ms

May I ask why the inference time has doubled? Thanks! Supplementary: using a custom dataset with a pruning rate of 0.5, the time increased by 0.4ms. @Ghustwb https://github.com/VainF/Torch-Pruning/issues/147#issuecomment-1521406684

minhhotboy9x commented 6 months ago

Hey guys! I think I found the bug. When a concat module and a split module are directly connected, the index mapping system fails to compute the correct idxs. I'm going to rewrite the concat & split tracing. Many thanks for this issue!

I tried to prune yolov8 with the original C2f module, but it still raises an IndexError.

xiaoshuomin commented 4 months ago

@apanand14 Hi, I'm hitting the same error. How did you solve it?

ajithkumarmcw commented 4 months ago

The pruning script is old. It will work only up to the version mentioned in the readme. If someone could provide an updated pruning script, it would be great.

chbw818 commented 3 months ago

Hello, when I ran the code https://github.com/VainF/Torch-Pruning/blob/master/examples/yolov8/yolov8_pruning.py I met the error as follows:

Traceback (most recent call last):
  File "d:/ultralytics-main/prune_v8.py", line 17, in <module>
    from ultralytics.engine.model import TASK_MAP
ImportError: cannot import name 'TASK_MAP' from 'ultralytics.engine.model' (d:\ultralytics-main\ultralytics\engine\model.py)

It seems to be caused by the version of the yolov8 code, but I trained my model with the newest version of yolov8. I wonder how to solve it.

chbw818 commented 3 months ago

Hello, when I ran the code https://github.com/VainF/Torch-Pruning/blob/master/examples/yolov8/yolov8_pruning.py I met the error as follows: ImportError: cannot import name 'TASK_MAP' from 'ultralytics.engine.model' (d:\ultralytics-main\ultralytics\engine\model.py). It seems to be caused by the version of the yolov8 code, but I trained my model with the newest version of yolov8. I wonder how to solve it.

The problem has been solved; the same fix can be found in other issues. First, use this line to replace line 250 in yolov8_pruning.py:

self.trainer = self.task_map[self.task]['trainer'](overrides=overrides, _callbacks=self.callbacks)

Next, fix the loss function in ultralytics/ultralytics/utils/loss.py like this:

def bbox_decode(self, anchor_points, pred_dist):
    """Decode predicted object bounding box coordinates from anchor points and distribution."""
    if self.use_dfl:
        b, a, c = pred_dist.shape  # batch, anchors, channels
        mydevice = torch.device('cuda:0')
        self.proj = self.proj.to(mydevice)
        pred_dist = pred_dist.view(b, a, 4, c // 4).softmax(3).matmul(self.proj.type(pred_dist.dtype))
        # pred_dist = pred_dist.view(b, a, c // 4, 4).transpose(2,3).softmax(3).matmul(self.proj.type(pred_dist.dtype))
        # pred_dist = (pred_dist.view(b, a, c // 4, 4).softmax(2) * self.proj.type(pred_dist.dtype).view(1, 1, -1, 1)).sum(2)
    return dist2bbox(pred_dist, anchor_points, xywh=False)

ajithkumarmcw commented 3 months ago

@chbw818 Could you post the entire updated code? If possible, it would be helpful for many people.