boringresearch / bert_pruning

Investigate BERT pruning using different methods

Paper #1

Open xihajun opened 1 year ago

xihajun commented 1 year ago

https://github.com/WoosukKwon/retraining-free-pruning

SQuAD v2.0

| Metric | Constraint | Seed | MAC (%) | Pruning Time (s) | Test Accuracy (%) | SQuAD v1.1 |
|--------|------------|------|---------|------------------|-------------------|------------|
| MAC    | 0.1        | 1    | 10.00   | 647.13           | 3.0021            | ---        |
| MAC    | 0.2        | -    | 20.00   | -                | 2.6137            | ---        |
| MAC    | 0.3        | -    | 30.00   | -                | 29.6993           | ---        |
| MAC    | 0.4        | -    | 40.00   | -                | 55.4981           | 70.29      |
| MAC    | 0.5        | 1    | 50.00   | 573.83           | 78.6466           | 89.29      |
| MAC    | 0.6        | -    | 60.00   | -                | 82.3729           | 90.97      |
| MAC    | 0.7        | 1    | 70.00   | 567.95           | 83.5277           | 91.94      |
| MAC    | 0.8        | 1    | 80.00   | 559.19           | 84.7552           | 92.59      |
| MAC    | 0.9        | 1    | 90.00   | 440.01           | 92.9893           | 93.00      |

SQuAD

| Constraint | Pruned Model MAC | Pruning Time (s) | Test Accuracy | SQuAD v2.0 |
|------------|------------------|------------------|---------------|------------|
| 0.1        | 10.00%           | 155.97           | 3.0021        | ---        |
| 0.2        | 10.00%           | 155.69           | 3.0021        | ---        |
| 0.3        | 20.00%           | 168.12           | 3.0938        | ---        |
| 0.4        | 40.00%           | 193.25           | 68.9950       | ---        |
| 0.5        | 50.00%           | 209.05           | 89.4326       | ---        |
| 0.6        | 60.00%           | 426.46           | 91.2639       | ---        |
| 0.7        | 70.00%           | 417.69           | 92.1652      | 81.76      |
| 0.8        | 80.00%           | 435.84           | 92.5481      | 84.11      |
| 0.9        | 90.00%           | 440.01           | 92.9893      | 85.97      |
xihajun commented 1 year ago

Test Accuracy

https://github.com/WoosukKwon/retraining-free-pruning/blob/806ac5a6ff53b4978e5330d45a3de692493e4d0b/evaluate/nlp.py#L10

https://github.com/WoosukKwon/retraining-free-pruning/blob/806ac5a6ff53b4978e5330d45a3de692493e4d0b/main.py#L218-L223
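For reference, the "Test Accuracy" numbers in these tables are SQuAD-style metrics. A minimal, self-contained sketch of the standard SQuAD v1.1 exact-match and token-level F1 computation (the same normalization as the official evaluation script: lowercase, strip punctuation and articles, collapse whitespace) looks like this; it is an illustration of the metric, not the repo's own code:

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, strip punctuation and articles, collapse whitespace,
    mirroring the official SQuAD evaluation script's normalization."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-overlap F1 between the normalized prediction and answer."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)
```

Over a dataset, both metrics are averaged across questions (taking the max over the reference answers for each question) and reported as percentages.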

xihajun commented 1 year ago

Report on NN Pruning and Retraining-Free Method for QA Model

In this report, we present the results of our experiments on two different methods for compressing and optimizing a QA model: NN pruning and a retraining-free method. We used the SQuAD dataset to evaluate the performance of the compressed models.

NN Pruning

We first applied NN pruning to the BERT-large model finetuned on the SQuAD dataset, using the Hugging Face library. For comparison, we also evaluated the original model downloaded from the mlperf website. The results are summarized in the following table:

| Model | Sparsity | EM | F1 |
|-------|----------|----|----|
| bert-large finetuned from huggingface | N/A | 78.88% | 86.84% |
| original model from mlperf | N/A | 75.83% | 84.23% |
| finetuned model from mlperf (59.9% sparsity) | 59.9% | 69.22% | 78.70% [^1] |

[^1]: about 6% drop
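To make the sparsity figure concrete: 59.9% sparsity means 59.9% of the weights are zero. A minimal numpy sketch of unstructured magnitude pruning illustrates this; note that nn_pruning itself uses structured (block/movement) pruning, which is a different and stronger method, so this is only an illustration of the sparsity metric:

```python
import numpy as np

def magnitude_prune(weights, target_sparsity):
    """Zero out the smallest-magnitude entries so that roughly
    `target_sparsity` of the weights become zero. This is plain
    unstructured magnitude pruning, shown only to make the sparsity
    numbers concrete (not nn_pruning's structured method)."""
    flat = np.abs(weights).ravel()
    k = int(target_sparsity * flat.size)  # number of weights to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def sparsity(weights):
    """Fraction of exactly-zero entries."""
    return float(np.mean(weights == 0.0))
```

For example, pruning a random weight matrix with `target_sparsity=0.599` leaves about 59.9% of its entries zero, matching how the table's sparsity column is defined.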

We observed that even the original model downloaded from the mlperf website did not perform as well as the model finetuned using the Hugging Face library on the SQuAD dataset. We also observed that increasing the sparsity of the model degraded performance: pruning to 59.9% sparsity cost about 6 points of F1 (78.70% vs. 84.23%) relative to the original mlperf model. We suspect the remaining gap could be due to differences in how we preprocessed the SQuAD data or differences in the implementation of the model.

Retraining-Free Method

We applied the retraining-free method to the original BERT-large model downloaded from the mlperf website. The F1 score at each MAC constraint is:

| Constraint | F1 |
|------------|------|
| 10%        | 6.94 |
| 20%        | 8.09 |
| 30%        | 46.95 |
| 40%        | 80.81 |
| 50%        | 88.57 |
| 60%        | 89.72 |
| 70%        | 90.49 |
| 80%        | 90.74 |
| 90%        | 90.85 |
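The constraint here is the fraction of the dense model's multiply-accumulate operations (MACs) that the pruned model is allowed to keep. A simplified sketch of how a transformer layer's MACs scale with the retained attention heads and FFN neurons is below; it counts only the main matmuls (Q/K/V/output projections, attention score and context products, and the two FFN projections) and ignores layernorm, softmax, and biases, so the repo's exact accounting may differ:

```python
def layer_mac(seq_len, hidden, num_heads, head_dim, ffn_neurons):
    """Approximate MAC count of one transformer encoder layer.
    Simplified accounting: main matmuls only (no layernorm/softmax/bias)."""
    attn_proj = 4 * seq_len * hidden * num_heads * head_dim    # Q, K, V, O projections
    attn_scores = 2 * seq_len * seq_len * num_heads * head_dim  # QK^T and attn*V
    ffn = 2 * seq_len * hidden * ffn_neurons                    # up- and down-projection
    return attn_proj + attn_scores + ffn

def mac_ratio(seq_len, hidden, num_heads, head_dim, ffn_neurons,
              kept_heads, kept_neurons):
    """Fraction of the dense layer's MACs kept after structured pruning
    of whole heads and FFN neurons."""
    dense = layer_mac(seq_len, hidden, num_heads, head_dim, ffn_neurons)
    pruned = layer_mac(seq_len, hidden, kept_heads, head_dim, kept_neurons)
    return pruned / dense
```

Since every term is linear in the head and neuron counts, keeping half the heads and half the FFN neurons gives a MAC ratio of exactly 0.5, which is what a 50% constraint in the table corresponds to.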

We observed that the retraining-free method compressed the model gracefully: at a 70% MAC constraint (a 30% reduction in compute) the F1 score stayed above 90%, and even at a 50% constraint it remained near 88.6%. This suggests that the retraining-free method could be a useful approach for compressing large language models like BERT without sacrificing much performance.

Overall, our experiments demonstrate that NN pruning and retraining-free methods can be effective for compressing and optimizing QA models. However, the performance of the compressed models depends on several factors, such as the sparsity level, the implementation of the compression method, and the quality of the training data.

xihajun commented 1 year ago
| Constraint | SQuAD Test Accuracy | SQuAD v2 Test Accuracy |
|------------|---------------------|------------------------|
| 100.00%    | ---                 | ---                    |
| 95.00%     | ---                 | 45.34                  |
| 90.00%     | 90.8533             | ---                    |
| 80.00%     | 90.7409             | ---                    |
| 75.00%     | 90.5938             | ---                    |
| 70.00%     | 90.4981             | 45.16                  |
| 65.00%     | 90.2348             | 45.08                  |
| 60.00%     | 89.7231             | ---                    |
| 50.00%     | 88.5696             | ---                    |
| 40.00%     | 80.8123             | ---                    |
xihajun commented 1 year ago


NN Pruning (SQuAD) - BERT Large (MLPerf)

| Model | Sparsity | Exact match | F1 |
|-------|----------|-------------|----|
| bert-large finetuned for squad from huggingface | N/A | 78.88% | 86.84% |
| original model from mlperf | N/A | 75.83% | 84.23% |
| finetuned model from mlperf (59.9% sparsity) | 59.9% | 69.22% | 78.70% [^1] |

[^1]: about 6% drop