To overcome the overparameterization problem in Pre-trained Language Models (PLMs), pruning is widely used as a simple and straightforward compression method that directly removes unimportant weights. Previous first-order methods successfully compress PLMs to extremely high sparsity with little performance drop. These methods, such as movement pruning, use first-order information to prune PLMs while fine-tuning the remaining weights. In this work, we argue that fine-tuning is redundant for first-order pruning, since first-order pruning alone is sufficient to converge PLMs to downstream tasks. Motivated by this, we propose Static Model Pruning (SMP), which uses only first-order pruning to adapt PLMs to downstream tasks while achieving the target sparsity level. In addition, we design a new masking function and training objective to further improve SMP. Extensive experiments at various sparsity levels show that SMP improves significantly over first-order and zero-order methods. Unlike previous first-order methods, SMP is also applicable to low sparsity, where it outperforms zero-order methods. Meanwhile, SMP is more parameter-efficient than other methods because it does not require fine-tuning.

Translation (by gpt-3.5-turbo)

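To overcome the overparameterization problem of Pre-trained Language Models (PLMs), pruning is widely used as a simple and straightforward compression method that directly removes unimportant weights. Previous first-order methods, such as movement pruning, have succeeded in compressing PLMs to very high sparsity while fine-tuning the remaining weights. In this work, we argue that fine-tuning is unnecessary for first-order pruning, and that first-order pruning alone is sufficient to converge PLMs to downstream tasks. Based on this motivation, we propose Static Model Pruning (SMP). SMP uses only first-order pruning to adapt PLMs to downstream tasks while achieving the target sparsity level. We also design a new masking function and training objective to improve SMP further. Extensive experiments at various sparsity levels show that SMP improves substantially over first-order and zero-order methods. Unlike previous first-order methods, SMP is also applicable to low sparsity, where it outperforms zero-order methods. In addition, because SMP does not require fine-tuning, it is more parameter-efficient than other methods.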
Summary (by gpt-3.5-turbo)
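This work proposes Static Model Pruning (SMP), a simple and straightforward compression method that uses first-order pruning to address the overparameterization of Pre-trained Language Models (PLMs). SMP uses only first-order pruning to adapt PLMs to downstream tasks and requires no fine-tuning, which makes it more efficient than other methods. Extensive experiments show that SMP improves substantially over first-order and zero-order methods. SMP is also applicable to low sparsity, where it outperforms zero-order methods.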
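
The core idea above, learning only a pruning mask from first-order (gradient) information while the pre-trained weights stay frozen, can be illustrated with a toy sketch. This is not SMP itself: the note does not describe the paper's masking function or training objective, so the sketch assumes movement-pruning-style importance scores with a top-k mask and a straight-through gradient. The names `topk_mask` and `learn_mask`, the linear model, and the data are all hypothetical.

```python
def topk_mask(scores, keep):
    """Return a 0/1 mask that keeps the `keep` highest-scoring weights."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1.0 if i in kept else 0.0 for i in range(len(scores))]


def learn_mask(weights, data, keep, lr=0.1, epochs=5):
    """Learn importance scores for frozen weights; the weights are never updated."""
    scores = [0.0] * len(weights)  # assumption: zero-initialised scores
    for _ in range(epochs):
        for x, y in data:
            mask = topk_mask(scores, keep)
            # Forward pass of a toy linear model using only the unpruned weights.
            y_hat = sum(w * m * xi for w, m, xi in zip(weights, mask, x))
            err = y_hat - y  # derivative of 0.5 * (y_hat - y)**2 w.r.t. y_hat
            # Straight-through estimator: the mask is treated as identity in the
            # backward pass, so dL/ds_i = err * x_i * w_i for every weight,
            # including the currently pruned ones.
            for i, (w, xi) in enumerate(zip(weights, x)):
                scores[i] -= lr * err * xi * w
    return topk_mask(scores, keep), scores


# Hypothetical "pre-trained" weights; the toy task only needs the last two.
weights = [0.5, -1.0, 2.0, 3.0]
data = [
    ([1.0, 0.0, 0.0, 0.0], 0.0),
    ([0.0, 1.0, 0.0, 0.0], 0.0),
    ([0.0, 0.0, 1.0, 0.0], 2.0),
    ([0.0, 0.0, 0.0, 1.0], 3.0),
]
mask, scores = learn_mask(weights, data, keep=2)
print(mask, scores)
```

In this sketch the only trainable state is `scores`; adapting to the task happens entirely through which weights the mask keeps, which is the sense in which no fine-tuning of the weights is needed.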

AkihikoWatanabe / paper_notes

Pruning Pre-trained Language Models Without Fine-Tuning, ACL'23 #812
