Open xihajun opened 1 year ago
In this report, we present the results of our experiments on two methods for compressing and optimizing a QA model: NN pruning and a retraining-free pruning method. We used the SQuAD dataset to evaluate the compressed models.
We first applied NN pruning to the BERT-large model finetuned on the SQuAD dataset, using the Hugging Face library. For comparison, we also evaluated the original model downloaded from the mlperf website. The results are summarized in the following table:
Model | Sparsity | EM | F1 |
---|---|---|---|
bert-large finetuned from huggingface | N/A | 78.88% | 86.84% |
original model from mlperf | N/A | 75.83% | 84.23% |
finetuned model from mlperf | 59.9% | 69.22% | 78.70% [^1] |
[^1]: about 6% drop
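The EM and F1 columns follow the standard SQuAD v1.1 evaluation convention: predictions and references are normalized (lowercased, punctuation and articles stripped) before computing exact match and token-level F1. A minimal sketch of the per-example metrics:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, articles, and extra whitespace (SQuAD convention)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, truth: str) -> float:
    """1.0 if normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction: str, truth: str) -> float:
    """Token-overlap F1 between normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))          # 1.0 after normalization
print(round(f1_score("in the city of Paris", "Paris"), 2))      # 0.4
```

The dataset-level EM/F1 reported in the table are these per-example scores averaged over all questions (taking the max over the reference answers for each question).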
We observed that even the original model downloaded from the mlperf website did not perform as well as the model finetuned with the Hugging Face library on SQuAD. Increasing sparsity reduced performance further: at 59.9% sparsity, the F1 score dropped by about 6 points relative to the original mlperf model. We suspect the baseline gap stems from differences in how we preprocessed the SQuAD data or in the model implementation.
We then applied the retraining-free method to the original BERT-large model downloaded from the mlperf website. The results are summarized below:
Constraint | SQuAD F1 |
---|---|
10% | 6.94 |
20% | 8.09 |
30% | 46.95 |
40% | 80.81 |
50% | 88.57 |
60% | 89.72 |
70% | 90.49 |
80% | 90.74 |
90% | 90.85 |
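In the table above, the constraint appears to be the fraction of the model retained, so a lower constraint means heavier pruning. The retraining-free-pruning repo searches for structured masks (attention heads and FFN filters) under such a constraint; as a rough illustration only (an assumption for this sketch, not the repo's actual method), the same idea can be shown with unstructured magnitude pruning: keep the largest-magnitude weights up to the target density and zero the rest.

```python
import numpy as np

def prune_to_density(weights: np.ndarray, density: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries, keeping a `density` fraction of weights."""
    k = int(round(density * weights.size))
    if k == 0:
        return np.zeros_like(weights)
    # Threshold at the k-th largest absolute value.
    threshold = np.sort(np.abs(weights).ravel())[-k]
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024))
pruned = prune_to_density(w, 0.30)          # 30% constraint -> ~70% sparsity
sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
print(f"sparsity: {sparsity:.3f}")          # ≈ 0.700
```

Measuring sparsity this way (fraction of exact zeros in the state dict) is also a quick sanity check that a downloaded "pruned" checkpoint actually has the advertised sparsity.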
We observed that the retraining-free method compressed the model with little accuracy loss at moderate constraints: at a 70% constraint the F1 score stayed above 90, and even at 50% it remained at 88.57. Below a 40% constraint, however, performance dropped sharply. This suggests that the retraining-free method could be a useful approach for compressing large language models like BERT without sacrificing much performance.
Overall, our experiments demonstrate that both NN pruning and retraining-free methods can be effective for compressing and optimizing QA models. However, the performance of the compressed models depends on several factors: the sparsity level, the implementation of the compression method, and the quality of the training data.
Constraint | SQuAD Test Accuracy | SQuAD v2 Test Accuracy |
---|---|---|
100.00% | n/a | n/a |
95.00% | n/a | 45.34 |
90.00% | 90.8533 | n/a |
80.00% | 90.7409 | n/a |
75.00% | 90.5938 | n/a |
70.00% | 90.4981 | 45.16 |
65.00% | 90.2348 | 45.08 |
60.00% | 89.7231 | n/a |
50.00% | 88.5696 | n/a |
40.00% | 80.8123 | n/a |
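Given the measured trade-off above, choosing an operating point reduces to picking the smallest constraint whose F1 meets a target. A small sketch using the (constraint, SQuAD F1) pairs transcribed from the table (F1 rounded to two decimals):

```python
# (constraint, SQuAD F1) pairs from the table above
results = [
    (0.40, 80.81), (0.50, 88.57), (0.60, 89.72), (0.65, 90.23),
    (0.70, 90.50), (0.75, 90.59), (0.80, 90.74), (0.90, 90.85),
]

def smallest_constraint(results, f1_target: float):
    """Return the smallest constraint whose F1 meets the target, or None if infeasible."""
    feasible = [c for c, f1 in results if f1 >= f1_target]
    return min(feasible) if feasible else None

print(smallest_constraint(results, 90.0))   # 0.65
print(smallest_constraint(results, 88.0))   # 0.5
```

For example, if a product requires F1 ≥ 90, a 65% constraint is the most aggressive setting these measurements support.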
Retraining-free pruning code: https://github.com/WoosukKwon/retraining-free-pruning