yumath opened this issue 1 year ago
Hi @yumath , thanks for your interest in our work!
What are your training settings (batch size, patch size, #epochs, learning rate) of both the regularized training and finetuning stages?
We used the same training settings as in your README: https://github.com/MingSun-Tse/ASSL#run
Thanks for your reply. Yes, I finetuned for 5000 epochs after pruning, with the hyper-parameters below:
```
--model LEDSR --scale 2 --patch_size 96 \
--ext sep --data_url data/ASSL/ --data_train DF2K --data_test Set5 --data_range 1-3550 \
--chop --save_results --n_resblocks 16 --n_feats 256 \
--method ASSL --wn --stage_pr [0-1000:0.80859375] --skip_layers *mean*,*tail* \
--same_pruned_wg_layers model.head.0,model.body.16,*body.2 --reg_upper_limit 0.5 \
--reg_granularity_prune 0.0001 --update_reg_interval 20 --stabilize_reg_interval 43150 \
--pre_train ckpt/LEDSRx2_B16C256_8u128bs.pt --same_pruned_wg_criterion reg \
--save ASSL_pruning/LEDSR_F256R16BIX2_DF2K_ASSL0.80859375_RGP0.0001_RUL0.5_Pretrain
```
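As a side note on the `--stage_pr` value: 0.80859375 appears to be exactly 1 − 49/256, i.e. pruning the 256-feat LEDSR down to a 49-channel model (the B16C49 mentioned below). A quick arithmetic check (my own sketch, not code from the ASSL repo):

```python
# Sketch: check that --stage_pr 0.80859375 corresponds to pruning
# 256 feature channels down to 49 (my own arithmetic, not repo code).
n_feats_full = 256  # --n_feats of the unpruned LEDSR
n_feats_kept = 49   # channel count of the B16C49 target model

stage_pr = 1 - n_feats_kept / n_feats_full
print(stage_pr)  # 0.80859375, matching --stage_pr [0-1000:0.80859375]
```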
Hi @yumath , okay, I see. The problem may be that this script is not the "sufficient finetuning" I meant. Although this script includes a finetuning stage, as you may have noticed, it is very short and uses a small batch size and patch size. For the best performance, heavier finetuning is recommended (following the practice of network-pruning works in classification). So after you have the pruned weights (using the scripts in the README), try the following scripts for the heavier finetuning:
```
# 2X:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --model LEDSR --n_resblocks 16 --n_feats 256 --scale 2 --patch_size 128 --batch_size 256 --ext bin --lr 8e-4 --chop --save_results --pre_train Experiments/
# 3X:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --model LEDSR --n_resblocks 16 --n_feats 256 --scale 3 --patch_size 192 --batch_size 256 --ext bin --lr 8e-4 --chop --save_results --pre_train Experiments/
# 4X:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --model LEDSR --n_resblocks 16 --n_feats 256 --scale 4 --patch_size 256 --batch_size 256 --ext bin --lr 8e-4 --chop --save_results --pre_train Experiments/
```
Let me know if you have more questions. Tx!
@MingSun-Tse Thank you very much, I'll try it as soon as possible!
Hi @MingSun-Tse , thanks for your reply. I tried the heavier finetuning as you suggested and reproduced the same results as in your paper. But now I wonder: what is the point of network pruning? I can reproduce the SOTA result in your paper simply by applying the heavier finetuning to a checkpoint trained from scratch.

| PSNR x2 | Set5 | Set14 | B100 | Urban100 | Manga109 |
|---|---|---|---|---|---|
| Reported in your GASSL TPAMI 2023 w/o `--self-ensemble` | 38.08 | 33.75 | 32.24 | 32.29 | 38.92 |
| Heavier finetune of a B16C49 trained from scratch | 38.07 | 33.71 | 32.24 | 32.26 | 38.95 |
Hi @yumath , thanks for the further feedback!
One potential problem with the comparison you presented is that GASSL uses a non-uniform layerwise pruning ratio (i.e., its #channels is not C49), which actually gives lower FLOPs/Params than the C49 model (see Tab. 3 in the TPAMI paper). So the comparison you presented may not be fair.
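To illustrate why a non-uniform layerwise channel allocation can have fewer parameters than a uniform C49 model at a similar channel budget, here is a toy sketch. The non-uniform channel counts below are made up for illustration only; they are NOT GASSL's actual layerwise ratios (see Tab. 3 of the TPAMI paper for those):

```python
# Toy sketch: parameter count of a chain of 3x3 convs under uniform vs
# non-uniform channel counts. Channel numbers are hypothetical, chosen
# only to show that the per-layer products, not the average width,
# determine FLOPs/Params.

def conv_params(cin, cout, k=3):
    # Parameters of one k x k conv layer (bias ignored).
    return cin * cout * k * k

def chain_params(channels):
    # Total parameters of a chain of convs with the given channel widths.
    return sum(conv_params(cin, cout) for cin, cout in zip(channels, channels[1:]))

uniform = [49] * 5                 # uniform C49: every layer 49 channels
nonuniform = [64, 40, 40, 40, 64]  # similar average width, shifted across layers

print(chain_params(uniform), chain_params(nonuniform))  # 86436 74880
```

So two models with roughly the same average width can differ noticeably in FLOPs/Params, which is why comparing against a uniform C49 baseline can be misleading.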
The x2 scale is quite small (too easy), so different methods show quite close performance. It would be better to also have x3 and x4 results.
Regarding the meaning of filter pruning, there is an ongoing discussion in the pruning community. I guess what you found is similar to the argument in this work.
Hi @MingSun-Tse @yulunzhang , we are all interested in this work, but I also have reproduction problems. Using this code with the default training settings, I cannot reproduce the results in the paper or match the released checkpoint. There are similar questions about implementation details here: https://github.com/MingSun-Tse/ASSL/issues/3#issue-1261458753 , https://github.com/MingSun-Tse/ASSL/issues/3#issuecomment-1202097388 , and https://github.com/MingSun-Tse/ASSL/issues/3#issuecomment-1503240059
So, is something wrong with the default training settings? Please disclose more training details about the default ASSL setup, as in https://github.com/MingSun-Tse/ASSL/issues/3#issuecomment-1148625155. Thanks very much!