duggurd closed this 1 year ago
Try a transformer model smaller than 66 million parameters. A pre-trained model is probably the better choice.
BERT-tiny
About 4 million parameters.
BERT-mini
About 11 million parameters.
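A rough sketch of where these parameter counts come from. It assumes the standard BERT encoder layout with a pooler, a 30522-token vocabulary, and the usual sizes for the two small variants (2 layers with hidden size 128 for BERT-tiny, 4 layers with hidden size 256 for BERT-mini). The helper name `bert_param_count` is made up for illustration.

```python
def bert_param_count(layers, hidden, vocab_size=30522, max_positions=512, type_vocab=2):
    """Estimate the parameter count of a BERT encoder with pooler (hypothetical helper)."""
    intermediate = 4 * hidden   # feed-forward width, assumed to be 4x the hidden size
    layer_norm = 2 * hidden     # scale and bias of one LayerNorm

    # Word, position and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value and output projections (weights + biases), plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (hidden -> intermediate -> hidden), plus LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )

    # Dense pooler applied to the first token.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


tiny = bert_param_count(layers=2, hidden=128)
mini = bert_param_count(layers=4, hidden=256)
print(f"BERT-tiny: {tiny / 1e6:.1f}M, BERT-mini: {mini / 1e6:.1f}M")
```

Under these assumptions most of the parameters in both models sit in the word embeddings, so a smaller vocabulary would reduce the size further than removing a layer would.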
From scratch
Create our own smaller model with BertConfig/DistilBertConfig and train it from scratch.
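A minimal sketch of the from-scratch option, assuming the Hugging Face `transformers` library with PyTorch installed. The sizes below are illustrative (they mirror a BERT-tiny-like shape) and are not values taken from this issue.

```python
from transformers import BertConfig, BertModel

# Illustrative sizes only (assumption): 2 layers, hidden size 128.
config = BertConfig(
    vocab_size=30522,
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,   # hidden_size must be divisible by this
    intermediate_size=512,
)

# Building the model from a config gives randomly initialised weights,
# so it has to be trained before it is useful.
model = BertModel(config)

num_params = model.num_parameters()
print(f"{num_params / 1e6:.1f}M parameters")
```

The same pattern should apply with `DistilBertConfig` and `DistilBertModel`, although the argument names differ (for example `dim` and `n_layers` instead of `hidden_size` and `num_hidden_layers`).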