facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Provide pre-training code? #30

Closed: Jacoberts closed this issue 3 years ago

Jacoberts commented 3 years ago

Hi there!

I'm trying to compare ESM to UniRep, the embedding from the Church lab, for variant function prediction. There are a few proteins our lab would eventually like to optimize, and ESM has some advantages over UniRep. I need to "evolutionarily fine-tune" ESM, as the Church lab does for UniRep: refine the global model's weights by continuing training on a small neighborhood (~100k sequences) around the target protein.

Could y'all provide any of the code you used in the pre-training task? E.g., your implementations of noising/masking, your loss function, or your gradient descent function?

Thank you, I think ESM is super cool! Best, Jacob

Jacoberts commented 3 years ago

Just saw the closed issue #11, which is very similar! If you're still not planning on providing any of your fairseq code, then I understand closing this out as duplicate. I'd really appreciate if you could provide the code, though!

joshim5 commented 3 years ago

Hi @Jacoberts, thanks for your interest! In our experiments, we didn't see much of an improvement from evolutionary fine-tuning. For example, see Figure 15 in the appendix of our recent paper at ICLR 2021. We don't have plans to release any pre-training code at this time, but I would encourage you to try using ESM even without evolutionary fine-tuning. You may be surprised by the results!

Jacoberts commented 3 years ago

Hi @joshim5, thanks for your reply! I'm finding that ESM underperforms UniRep and eUniRep on the prediction task defined by Alley et al. Honestly, your results make sense to me: I wouldn't expect evotuning to do much. But the Church lab saw a phenomenal increase in recall on the generalization set from evotuning! I think I'll try to whip up a fairseq config for ESM and see if eESM does any better.


gokceneraslan commented 3 years ago

> Could y'all provide any of the code you used in the pre-training task? E.g., your implementations of noising/masking, your loss function, or your gradient descent function?

@joshim5 Do you mind commenting on this part of the issue too? Thanks :)

joshim5 commented 3 years ago

@gokceneraslan these details are listed in our pre-print. See page 21 "Pre-training task." If you find anything missing, feel free to start a new discussion and we're happy to clarify any details.
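
For anyone landing here looking for the gist: below is a rough PyTorch sketch of the BERT-style masking and cross-entropy loss described in the pre-print (15% of positions selected; of those, 80% replaced with the mask token, 10% replaced with a random token, 10% left unchanged). The `model` call, token indices, and helper names are illustrative placeholders, not the released implementation.

```python
import torch
import torch.nn.functional as F

def mask_tokens(tokens, mask_idx, vocab_size, special_mask, p=0.15):
    """BERT-style noising: returns (noised_tokens, targets).

    `tokens` is a LongTensor of token indices; `special_mask` is a bool tensor
    marking BOS/EOS/padding positions that must never be selected.
    Targets are -100 everywhere except at the selected positions.
    """
    tokens = tokens.clone()
    targets = tokens.clone()

    # Select ~15% of the non-special positions for the MLM objective.
    probs = torch.full(tokens.shape, p) * (~special_mask)
    selected = torch.bernoulli(probs).bool()
    targets[~selected] = -100  # ignored by F.cross_entropy below

    # Of the selected positions: 80% become the mask token...
    to_mask = torch.bernoulli(torch.full(tokens.shape, 0.8)).bool() & selected
    tokens[to_mask] = mask_idx

    # ...10% become a random token (special tokens not excluded here for brevity)...
    to_random = torch.bernoulli(torch.full(tokens.shape, 0.5)).bool() & selected & ~to_mask
    tokens[to_random] = torch.randint(vocab_size, tokens.shape)[to_random]

    # ...and the remaining 10% are left unchanged but still contribute to the loss.
    return tokens, targets

# Loss for one batch (model and inputs are placeholders):
# noised, targets = mask_tokens(batch_tokens, mask_idx, vocab_size, special_mask)
# logits = model(noised)  # [batch, seq_len, vocab_size]
# loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-100)
```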

gokceneraslan commented 3 years ago

@joshim5 Thanks for the reply; sorry, I missed the explanation that there is no plan to release any pre-training code at this time.

I hugely appreciate the quality of the released code, but not releasing the training code (which is obviously non-trivial for a model of this complexity) seriously hinders the reproducibility of the paper and is bad practice, especially in the compbio domain. For those wondering how it can be done, here is a good example: https://github.com/kundajelab/bpnet-manuscript by @Avsecz.

hussius commented 2 years ago

@Jacoberts In case you managed to whip up a fairseq config, I'd be very grateful if you could share it!

michaelalb commented 2 years ago

@Jacoberts, or anyone else who has created pre-training code with fairseq or any other framework and can share it: it would be a big help.

hussius commented 2 years ago

Since ESM-1b is now available on Hugging Face (https://huggingface.co/facebook/esm-1b), you should be able to use the Hugging Face tooling for evolutionary fine-tuning/pre-training.
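
For example, a minimal masked-LM fine-tuning sketch with the Hugging Face `Trainer` might look like the following. The model id, the sequence file name, and the hyperparameters are assumptions to adapt, not a tested recipe.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "facebook/esm-1b"  # assumed id; check the model page linked above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# One protein sequence per line, e.g. the ~100k-sequence neighborhood around the target.
dataset = load_dataset("text", data_files={"train": "neighborhood_sequences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1022)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# BERT-style masking: 15% of tokens are selected for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="esm1b-evotuned",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    num_train_epochs=1,
    fp16=True,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```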

ulupo commented 2 years ago

Along these lines, I have the following question (tangentially relevant to #143): it isn't 100% clear to me, having read the MSA Transformer paper, whether the initial token embedding weights were also randomly initialised and learnt as part of the overall MLM pre-training, or whether pre-computed embeddings (trained separately in some other way) were fed to the model at pre-training time. I imagine the former was the case, but would appreciate the clarification. Thanks!

tomsercu commented 2 years ago

> initial token embedding weights were also randomly initialised

Correct, this is how it was done; there is no change w.r.t. fairseq's TransformerSentenceEncoder self.embed_tokens.

ulupo commented 2 years ago

Thanks @tomsercu, really appreciate the fast replies.

ulupo commented 2 years ago

@tomsercu just one last thing: You probably meant to also quote "and learnt as part of the overall MLM pre-training", right?

tomsercu commented 2 years ago

Yes, they're just regular model weights of the MSA Transformer, all being trained.
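
To make that concrete, the token embedding is an ordinary learned embedding matrix trained under the same MLM objective as everything else. A small sketch mirroring (not copying) fairseq's TransformerSentenceEncoder.embed_tokens; the sizes and initialization below are illustrative, not the real config:

```python
import torch.nn as nn

vocab_size, embed_dim, padding_idx = 33, 768, 1  # illustrative values only
embed_tokens = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_idx)
nn.init.normal_(embed_tokens.weight, mean=0.0, std=embed_dim ** -0.5)  # random init

# During pre-training, embed_tokens.weight receives gradients from the MLM loss
# like any other parameter; no separately pre-computed embeddings are fed in.
```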