Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers #23

Open 5g4s opened 1 year ago

5g4s commented 1 year ago

https://arxiv.org/abs/2002.11794

5g4s commented 1 year ago

We show that large models are more robust to compression techniques such as quantization and pruning than small models. Heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
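
For concreteness, here is a minimal sketch of the two compression techniques mentioned above (my own illustration, not code from the paper), using PyTorch's built-in pruning and dynamic-quantization utilities; the toy Transformer, the 50% sparsity level, and the int8 setting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def compress(model: nn.Module, sparsity: float = 0.5) -> nn.Module:
    # Magnitude pruning: zero out the smallest-magnitude weights of every
    # Linear layer, then make the pruning permanent.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=sparsity)
            prune.remove(module, "weight")
    # Post-training dynamic quantization: store Linear weights as int8 and
    # quantize activations on the fly at inference time.
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)


# Usage: compress a (here, toy) Transformer encoder. The paper's claim is that
# applying this aggressively to a large model still beats a lightly compressed
# small model.
model = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6)
compressed = compress(model, sparsity=0.5)
```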

5g4s commented 1 year ago

As models become increasingly large, they contain small subnetworks that, on their own, achieve high accuracy.
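
A rough sketch of what extracting such a subnetwork could look like (my own illustration, not from the paper): keep only the top fraction of largest-magnitude weights in each weight matrix and zero out the rest. The 10% keep ratio and the magnitude criterion are assumptions for illustration.

```python
import torch
import torch.nn as nn


def extract_subnetwork(model: nn.Module, keep: float = 0.1) -> dict:
    """Return {parameter name: binary mask} marking the surviving weights."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and norm parameters
            continue
        k = max(1, int(keep * param.numel()))
        threshold = param.detach().abs().flatten().topk(k).values.min()
        masks[name] = (param.detach().abs() >= threshold).float()
        param.data.mul_(masks[name])  # zero out everything outside the subnetwork
    return masks
```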