WoosukKwon / retraining-free-pruning

[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
https://arxiv.org/abs/2204.09656

How to use the final calculated mask to improve the speed of the model #19

Open yynngu opened 4 months ago

yynngu commented 4 months ago

How can I speed up the model after computing the required mask with your method? Multiplying the mask directly onto the corresponding structures of the model doesn't seem to change inference speed much.
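A minimal, framework-free sketch of why this happens, assuming the mask marks whole neurons of a feed-forward layer: multiplying by a mask leaves the weight matrices at their original size, so the same number of multiplications is performed. A speedup only becomes possible once the masked rows and columns are physically removed. All names here (`ffn_masked`, `prune_ffn`, `mult_count`) are hypothetical and not part of this repository; biases are omitted for brevity.

```python
import random

def matvec(W, x):
    """Dense matrix-vector product; W is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def ffn_masked(W1, W2, mask, x):
    # Masking only zeroes activations: W1 and W2 keep their full shape,
    # so the cost is identical to the unmasked layer.
    h = relu(matvec(W1, x))
    h = [a * m for a, m in zip(h, mask)]
    return matvec(W2, h)

def prune_ffn(W1, W2, mask):
    # Physically drop masked neurons: rows of W1 and matching columns of W2.
    keep = [i for i, m in enumerate(mask) if m]
    W1_small = [W1[i] for i in keep]
    W2_small = [[row[i] for i in keep] for row in W2]
    return W1_small, W2_small

def ffn_pruned(W1_small, W2_small, x):
    return matvec(W2_small, relu(matvec(W1_small, x)))

def mult_count(W1, W2):
    # Multiplications needed for one input vector.
    return sum(len(r) for r in W1) + sum(len(r) for r in W2)

random.seed(0)
d_model, d_ff = 4, 8                      # toy sizes, chosen for illustration
W1 = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
W2 = [[random.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)]
mask = [1, 0, 1, 0, 0, 1, 0, 0]           # hypothetical neuron mask
x = [random.uniform(-1, 1) for _ in range(d_model)]

W1_small, W2_small = prune_ffn(W1, W2, mask)
y_masked = ffn_masked(W1, W2, mask, x)
y_pruned = ffn_pruned(W1_small, W2_small, x)

print("same output:", all(abs(a - b) < 1e-12 for a, b in zip(y_masked, y_pruned)))
print("multiplications, masked:", mult_count(W1, W2))
print("multiplications, pruned:", mult_count(W1_small, W2_small))
```

With real framework tensors the same idea would amount to slicing the weight matrices of each pruned layer (and dropping the projections of masked attention heads) and rebuilding smaller layers from them. Whether that turns into a wall-clock gain depends on the hardware and on how much is removed.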