bigscience-workshop / multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language
https://arxiv.org/abs/2212.09535
Apache License 2.0

Control Extra Params (use Adapter 16x reduction size as control) #25

Open yongzx opened 2 years ago

yongzx commented 2 years ago

The following parameter counts are for BLOOM-1.3B with embedding-and-MADX-adapters (replace strategy) and the default bottleneck reduction factor of 16.

Total frozen parameters: 1208602624
Total trainable parameters: 24979456
Total emb parameters: 20488192
Total MAD-X adapter parameters: 4491264
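
As a quick sanity check on the counts above, here is a minimal Python sketch of the arithmetic. It uses only the four numbers reported in this comment; the variable names are made up for illustration and do not come from the repository's code.

```python
# Parameter counts copied from the comment above (BLOOM-1.3B, reduction factor 16).
frozen = 1_208_602_624
trainable = 24_979_456
emb = 20_488_192
adapter = 4_491_264

# The trainable count should be exactly embeddings + MAD-X adapters.
assert emb + adapter == trainable

total = frozen + trainable
trainable_share = trainable / total    # fraction of all parameters that are trained
adapter_share = adapter / trainable    # fraction of trainable parameters in adapters

print(f"total parameters:         {total:,}")
print(f"trainable share of total: {trainable_share:.2%}")
print(f"adapter share of trainable: {adapter_share:.2%}")
```

The adapter count is the budget a control run would need to match when adding extra parameters some other way.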