Open xihajun opened 1 year ago
In this report, we present the results of our experiments on two methods for compressing and optimizing a QA model: NN pruning and a retraining-free pruning method. We used the SQuAD dataset to evaluate the compressed models.
We first applied NN pruning to the BERT-large model finetuned on the SQuAD dataset, using the Hugging Face library. For comparison, we also evaluated the original model downloaded from the mlperf website. The results are summarized in the following table:
Model | Sparsity | EM | F1 |
---|---|---|---|
bert-large finetuned from huggingface | N/A | 78.88% | 86.84% |
original model from mlperf | N/A | 75.83% | 84.23% |
finetuned model from mlperf | 59.9% | 69.22% | 78.70% [^1] |
[^1]: about 6% drop
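The EM and F1 columns follow the standard SQuAD v1.1 evaluation convention: predictions and references are normalized (lowercased, punctuation and articles stripped) before computing exact match and token-level F1. A minimal sketch of the per-example metrics:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, articles, and extra whitespace (SQuAD convention)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, truth: str) -> float:
    """1.0 if normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction: str, truth: str) -> float:
    """Token-overlap F1 between normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))          # 1.0 after normalization
print(round(f1_score("in the city of Paris", "Paris"), 2))      # 0.4
```

The dataset-level EM/F1 reported in the table are these per-example scores averaged over all questions (taking the max over the reference answers for each question).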
We observed that even the original model downloaded from the mlperf website did not perform as well as the model finetuned with the Hugging Face library on SQuAD. Increasing sparsity reduced performance further: at 59.9% sparsity, the F1 score dropped by about 6 points relative to the original mlperf model. We suspect the baseline gap stems from differences in how we preprocessed the SQuAD data or in the model implementation.
We then applied the retraining-free method to the original BERT-large model downloaded from the mlperf website. The results are summarized below:
Constraint | SQuAD F1 |
---|---|
10% | 6.94 |
20% | 8.09 |
30% | 46.95 |
40% | 80.81 |
50% | 88.57 |
60% | 89.72 |
70% | 90.49 |
80% | 90.74 |
90% | 90.85 |
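In the table above, the constraint appears to be the fraction of the model retained, so a lower constraint means heavier pruning. The retraining-free-pruning repo searches for structured masks (attention heads and FFN filters) under such a constraint; as a rough illustration only (an assumption for this sketch, not the repo's actual method), the same idea can be shown with unstructured magnitude pruning: keep the largest-magnitude weights up to the target density and zero the rest.

```python
import numpy as np

def prune_to_density(weights: np.ndarray, density: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries, keeping a `density` fraction of weights."""
    k = int(round(density * weights.size))
    if k == 0:
        return np.zeros_like(weights)
    # Threshold at the k-th largest absolute value.
    threshold = np.sort(np.abs(weights).ravel())[-k]
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024))
pruned = prune_to_density(w, 0.30)          # 30% constraint -> ~70% sparsity
sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
print(f"sparsity: {sparsity:.3f}")          # ≈ 0.700
```

Measuring sparsity this way (fraction of exact zeros in the state dict) is also a quick sanity check that a downloaded "pruned" checkpoint actually has the advertised sparsity.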
We observed that the retraining-free method compressed the model with little accuracy loss at moderate constraints: at a 70% constraint the F1 score stayed above 90, and even at 50% it remained at 88.57. Below a 40% constraint, however, performance dropped sharply. This suggests that the retraining-free method could be a useful approach for compressing large language models like BERT without sacrificing much performance.
Overall, our experiments demonstrate that both NN pruning and retraining-free methods can be effective for compressing and optimizing QA models. However, the performance of the compressed models depends on several factors: the sparsity level, the implementation of the compression method, and the quality of the training data.
Constraint | SQuAD Test Accuracy | SQuAD v2 Test Accuracy |
---|---|---|
100.00% | n/a | n/a |
95.00% | n/a | 45.34 |
90.00% | 90.8533 | n/a |
80.00% | 90.7409 | n/a |
75.00% | 90.5938 | n/a |
70.00% | 90.4981 | 45.16 |
65.00% | 90.2348 | 45.08 |
60.00% | 89.7231 | n/a |
50.00% | 88.5696 | n/a |
40.00% | 80.8123 | n/a |
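Given the measured trade-off above, choosing an operating point reduces to picking the smallest constraint whose F1 meets a target. A small sketch using the (constraint, SQuAD F1) pairs transcribed from the table (F1 rounded to two decimals):

```python
# (constraint, SQuAD F1) pairs from the table above
results = [
    (0.40, 80.81), (0.50, 88.57), (0.60, 89.72), (0.65, 90.23),
    (0.70, 90.50), (0.75, 90.59), (0.80, 90.74), (0.90, 90.85),
]

def smallest_constraint(results, f1_target: float):
    """Return the smallest constraint whose F1 meets the target, or None if infeasible."""
    feasible = [c for c, f1 in results if f1 >= f1_target]
    return min(feasible) if feasible else None

print(smallest_constraint(results, 90.0))   # 0.65
print(smallest_constraint(results, 88.0))   # 0.5
```

For example, if a product requires F1 ≥ 90, a 65% constraint is the most aggressive setting these measurements support.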
Retraining-free pruning code: https://github.com/WoosukKwon/retraining-free-pruning