VainF / Torch-Pruning

[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
https://arxiv.org/abs/2301.12900
MIT License

Can't reproduce the benchmark results #394

Closed: Alejandro-Casanova closed this issue 1 week ago

Alejandro-Casanova commented 1 week ago

I just downloaded the pre-trained models as indicated and ran:

python main.py --mode prune --model vgg19 --batch-size 128 --restore models/cifar100_vgg19.pth --dataset cifar100  --method group_norm --speed-up 8.84 --global-pruning 

(also as indicated)

The console output is:

[06/28 14:31:17 cifar100-global-group_norm-vgg19]: mode: prune
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: model: vgg19
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: verbose: False
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: dataset: cifar100
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: dataroot: data
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: batch_size: 128
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: total_epochs: 100
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: lr_decay_milestones: 60,80
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: lr_decay_gamma: 0.1
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: lr: 0.01
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: restore: models/cifar100_vgg19.pth
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: output_dir: run/cifar100/prune/cifar100-global-group_norm-vgg19
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: finetune: False
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: method: group_norm
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: speed_up: 8.84
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: max_pruning_ratio: 1.0
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: soft_keeping_ratio: 0.0
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: reg: 0.0005
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: delta_reg: 0.0001
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: weight_decay: 0.0005
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: seed: None
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: global_pruning: True
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: sl_total_epochs: 100
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: sl_lr: 0.01
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: sl_lr_decay_milestones: 60,80
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: sl_reg_warmup: 0
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: sl_restore: None
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: iterative_steps: 400
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: logger: <Logger cifar100-global-group_norm-vgg19 (DEBUG)>
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: device: cuda
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: num_classes: 100
[06/28 14:31:17 cifar100-global-group_norm-vgg19]: Loading model from models/cifar100_vgg19.pth
[06/28 14:31:18 cifar100-global-group_norm-vgg19]: Pruning...
[06/28 14:31:27 cifar100-global-group_norm-vgg19]: VGG(
  (block0): Sequential(
    (0): Conv2d(3, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(2, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (block1): Sequential(
    (0): Conv2d(1, 15, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(15, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(15, 31, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(31, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (block2): Sequential(
    (0): Conv2d(31, 77, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(77, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(77, 133, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(133, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): Conv2d(133, 183, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(183, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): Conv2d(183, 229, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): BatchNorm2d(229, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (block3): Sequential(
    (0): Conv2d(229, 202, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(202, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(202, 131, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(131, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): Conv2d(131, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): Conv2d(4, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): BatchNorm2d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (block4): Sequential(
    (0): Conv2d(3, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(1, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): Conv2d(1, 13, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(13, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): Conv2d(13, 36, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (10): BatchNorm2d(36, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (pool0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool4): AdaptiveAvgPool2d(output_size=(1, 1))
  (classifier): Linear(in_features=36, out_features=100, bias=True)
)
[06/28 14:31:27 cifar100-global-group_norm-vgg19]: Params: 20.09 M => 1.38 M (6.89%)
[06/28 14:31:27 cifar100-global-group_norm-vgg19]: FLOPs: 512.73 M => 57.48 M (11.21%, 8.92X )
[06/28 14:31:27 cifar100-global-group_norm-vgg19]: Acc: 0.7350 => 0.0100
[06/28 14:31:27 cifar100-global-group_norm-vgg19]: Val Loss: 1.2668 => 4.6495

The last lines show that accuracy dropped from 73.5% to 1%, whereas the benchmark results state a final accuracy of 70.39% for the pruned model.

What did I do wrong...?

Alejandro-Casanova commented 1 week ago

Never mind, I just had to fine-tune the pruned model afterwards. The accuracy logged right after pruning (1% here) is measured before any fine-tuning; note the finetune: False entry in the log above.
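
For future readers, here is a minimal sketch of the prune-then-finetune workflow that the benchmark performs. It assumes Torch-Pruning's v1.x API (tp.pruner.MetaPruner, tp.importance.GroupNormImportance, tp.utils.count_ops_and_params; argument names may differ between releases), and the torchvision vgg19_bn stand-in, the momentum value, and the data pipeline are illustrative assumptions rather than the benchmark's exact code:

import torch
import torch.nn as nn
import torch_pruning as tp
from torchvision import datasets, transforms
from torchvision.models import vgg19_bn

# Stand-in for the benchmark's CIFAR-100 VGG19 checkpoint (the real run
# loads models/cifar100_vgg19.pth instead).
model = vgg19_bn(num_classes=100)
example_inputs = torch.randn(1, 3, 32, 32)

base_macs, base_params = tp.utils.count_ops_and_params(model, example_inputs)

# --method group_norm with --global-pruning
importance = tp.importance.GroupNormImportance(p=2)
pruner = tp.pruner.MetaPruner(
    model,
    example_inputs,
    importance=importance,
    global_pruning=True,
    pruning_ratio=1.0,                      # upper bound; we stop early at the speed-up target
    iterative_steps=400,                    # matches iterative_steps: 400 in the log
    ignored_layers=[model.classifier[-1]],  # keep the 100-class output layer intact
)

# Prune progressively until the FLOPs reduction reaches --speed-up 8.84.
for _ in range(400):
    pruner.step()
    macs, _ = tp.utils.count_ops_and_params(model, example_inputs)
    if base_macs / macs >= 8.84:
        break

# Accuracy collapses right after pruning (0.7350 => 0.0100 above), so the
# model must be fine-tuned. A standard loop with the hyperparameters from
# the log (lr 0.01, 100 epochs, milestones 60,80, gamma 0.1, wd 5e-4):
train_loader = torch.utils.data.DataLoader(
    datasets.CIFAR100("data", train=True, download=True,
                      transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 80], gamma=0.1)
criterion = nn.CrossEntropyLoss()
for epoch in range(100):
    for images, labels in train_loader:
        optimizer.zero_grad()
        criterion(model(images), labels).backward()
        optimizer.step()
    scheduler.step()

Alternatively, re-running main.py with fine-tuning enabled (presumably via the finetune option shown in the log) should recover the reported 70.39%.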