NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Electra-small pretraining #1086

Open 73minerva opened 2 years ago

73minerva commented 2 years ago

Hey,

I want to pretrain and benchmark the small and base versions of Electra for Arabic and Persian. As mentioned in the run_pretraining Python file, only the "base" and "large" model_size values are supported, although fine-tuning does appear to support Electra-small. Is pretraining support for Electra-small in your future plans? If not, would you accept a PR?

sharathts commented 2 years ago

The small model might work if you can come up with a config for it based on the model parameters.
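To illustrate what such a config might look like: the values below are the small-model hyperparameters reported in the ELECTRA paper (Clark et al., 2020), not keys confirmed to match what this repo's run_pretraining script expects, so treat the dict as a hypothetical starting point.

```python
# Hypothetical ELECTRA-small discriminator hyperparameters, based on the
# ELECTRA paper. Key names are illustrative; the actual config keys used by
# run_pretraining.py in this repo may differ.
electra_small_config = {
    "model_size": "small",
    "embedding_size": 128,           # factorized embeddings, smaller than hidden size
    "hidden_size": 256,
    "num_hidden_layers": 12,
    "num_attention_heads": 4,
    "intermediate_size": 1024,       # feed-forward inner dimension
    "generator_size_multiplier": 0.25,  # generator width relative to discriminator
    "max_seq_length": 128,
}

def head_dim(cfg):
    """Per-head attention dimension; hidden_size must divide evenly by head count."""
    assert cfg["hidden_size"] % cfg["num_attention_heads"] == 0
    return cfg["hidden_size"] // cfg["num_attention_heads"]

print(head_dim(electra_small_config))  # 256 / 4 = 64
```

A quick sanity check like `head_dim` above is worth running before training, since a hidden size that does not divide evenly across attention heads will fail at model construction time.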

A PR would definitely be appreciated. Feel free to make one.