facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

pre-train with ELECTRA rather than BERT #476

Closed: Joseph-Vineland closed this issue 1 year ago

Joseph-Vineland commented 1 year ago

Is the code used for pre-training the ESM2 model available? Out of interest, I want to see if I can modify it so that the ELECTRA method is used for pre-training rather than BERT-style masked language modeling. (ELECTRA is faster and more sample-efficient at learning.)

https://arxiv.org/abs/2003.10555 https://github.com/Joseph-Vineland/ProteinELECTRA
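
For reference, the core of ELECTRA is replaced-token detection: a small generator fills in masked positions, and a discriminator learns to spot which tokens were replaced, so every position in the sequence yields a training signal rather than only the ~15% that were masked. Here is a minimal sketch of that objective on a toy amino-acid vocabulary; all module names, sizes, and hyperparameters are illustrative (not from ESM or the official ELECTRA code), except the discriminator loss weight of 50, which follows the paper:

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK_IDX = len(AMINO_ACIDS)          # extra token used to mask positions
VOCAB = len(AMINO_ACIDS) + 1

class TinyEncoder(nn.Module):
    """Toy transformer encoder shared by generator and discriminator."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
    def forward(self, tokens):
        return self.encoder(self.embed(tokens))

class Generator(TinyEncoder):
    def __init__(self, dim=64):
        super().__init__(dim)
        self.head = nn.Linear(dim, VOCAB)   # predicts tokens at masked sites
    def forward(self, tokens):
        return self.head(super().forward(tokens))

class Discriminator(TinyEncoder):
    def __init__(self, dim=64):
        super().__init__(dim)
        self.head = nn.Linear(dim, 1)       # original vs. replaced, per position
    def forward(self, tokens):
        return self.head(super().forward(tokens)).squeeze(-1)

def electra_step(gen, disc, tokens, mask_prob=0.15):
    # 1) mask a subset of positions, exactly as in BERT-style MLM
    mask = torch.rand(tokens.shape) < mask_prob
    masked = tokens.masked_fill(mask, MASK_IDX)
    # 2) the generator is trained with an ordinary MLM loss on masked sites
    logits = gen(masked)
    gen_loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
    # 3) sample replacements from the generator; sampling is detached, so no
    #    gradient flows from the discriminator into the generator (as in ELECTRA)
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=logits[mask]).sample()
    corrupted = tokens.clone()
    corrupted[mask] = sampled
    # 4) the discriminator gets a loss at EVERY position, not just the masked
    #    ones -- this is the source of ELECTRA's sample efficiency
    is_replaced = (corrupted != tokens).float()
    disc_loss = nn.functional.binary_cross_entropy_with_logits(
        disc(corrupted), is_replaced)
    return gen_loss + 50.0 * disc_loss      # lambda = 50, as in the ELECTRA paper

# toy usage on random sequences
gen, disc = Generator(), Discriminator()
tokens = torch.randint(0, len(AMINO_ACIDS), (2, 32))
loss = electra_step(gen, disc, tokens)
loss.backward()
```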

If you do more pre-training in the future, or pre-train ESM2 again, the ELECTRA pre-training method may be worth considering.

tomsercu commented 1 year ago

Thanks for the suggestion! We have definitely considered ELECTRA and other pre-training objectives in the past, but don't think we have actually trained any ELECTRA models yet. We'll update this repo with any new versions of ESM, MLM objective or otherwise!

Joseph-Vineland commented 1 year ago

Thank you. Just out of curiosity, is the code used for pre-training the ESM2 model available? I would like to have a look if possible.

tomsercu commented 1 year ago

We haven't open-sourced it, but it should be straightforward to reproduce with vanilla fairseq and the config information stored in our model checkpoints.
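
For anyone wanting to follow this route, a minimal sketch of pulling the training configuration out of a released checkpoint might look like the following. The file name is one of the public ESM-2 checkpoints; the `cfg`/`args` keys follow the usual fairseq checkpoint conventions and may differ between releases:

```python
import torch

# ESM-2 weights are distributed as PyTorch checkpoint files
# (the same ones loaded by esm.pretrained.esm2_t33_650M_UR50D()).
# weights_only=False because the checkpoint stores non-tensor
# config objects alongside the state dict.
ckpt = torch.load("esm2_t33_650M_UR50D.pt",
                  map_location="cpu", weights_only=False)

# fairseq-style checkpoints keep the training configuration under
# "cfg" (newer) or "args" (older); print whichever is present.
config = ckpt.get("cfg") or ckpt.get("args")
print(config)
```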