Open GeneZC opened 3 days ago
Alongside the development of small language models, compressed language models have also played a crucial role.
Typical representatives (in chronological order) include, among others:
1. Sheared-LLaMA (https://arxiv.org/abs/2310.06694), a language model pruned from LLaMA.
2. MiniMA (https://arxiv.org/abs/2311.07052), a language model distilled from LLaMA.
3. Gemma-2-2B (https://arxiv.org/abs/2408.00118), a distilled language model.
4. Minitron-4B (https://arxiv.org/abs/2408.11796), a language model distilled from LLaMA.
I believe including a discussion of the small language models above would make this survey even stronger :)
Thanks for your valuable suggestion! We will continue to update our survey and include discussions on compressed language models in future versions.