facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Randomly Initialized ESM's #143

Open nickbhat opened 3 years ago

nickbhat commented 3 years ago

Hello friends,

Requesting a very minimal feature addition: could we get an esm.pretrained.esm_random constructor (or similarly named), maybe with 4 or 5 seeds for completeness? It's not hard to randomly initialize one's own ESM models, but a built-in version would be a nice quality-of-life improvement. It would also ensure some consistency across these "random ESM" baselines :)

Thanks! Nick

tomsercu commented 2 years ago

Hi Nick! Sorry for the delay here. I think you'd still want to initialize with esm.pretrained.model_name() and then call reset_to_random_weights(seed), which in turn applies self.apply(init_function). Do you want to submit a PR for this? You can borrow from the fairseq RoBERTa initializer.
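
For illustration, here is a minimal sketch of the interface proposed above. The name reset_to_random_weights is the one suggested in this comment and does not exist in the released package; the per-module init_fn (a BERT-style initializer) is sketched in a later comment.

```python
import torch
import torch.nn as nn
from typing import Callable


def reset_to_random_weights(model: nn.Module,
                            init_fn: Callable[[nn.Module], None],
                            seed: int = 0) -> nn.Module:
    """Re-initialize `model` in place, discarding its pretrained weights.

    `init_fn` is a per-module initializer (e.g. a BERT-style one modeled on
    fairseq's RoBERTa initializer); fixing the torch seed first makes the
    resulting random baseline reproducible.
    """
    torch.manual_seed(seed)
    model.apply(init_fn)
    return model
```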

ulupo commented 2 years ago

I'm interested in this and would like to help with a PR if @nickbhat is not available.

From https://github.com/pytorch/fairseq/blob/8fce12ddd4a0414de5a726af6193cee4893f0a13/fairseq/models/roberta/model.py#L49, it seems that you just follow BERT's random weight initialization: everything in the Linear, Embedding, MultiheadAttention, RowSelfAttention, and ColumnSelfAttention modules (except for biases and padding-related weights) should be initialized with i.i.d. Gaussian entries with mean 0 and std 0.02; see https://github.com/pytorch/fairseq/blob/8fce12ddd4a0414de5a726af6193cee4893f0a13/fairseq/modules/transformer_sentence_encoder.py#L21-L51. @tomsercu, can you confirm that this is true of all your models, including the MSA Transformer?
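
A sketch of what such an initializer could look like, modeled on fairseq's init_bert_params (the function name here is made up; the std of 0.02 and the bias/padding handling follow the fairseq code linked above):

```python
import torch.nn as nn


def init_bert_like_params(module: nn.Module) -> None:
    """BERT-style initialization in the spirit of fairseq's `init_bert_params`.

    Weights are drawn i.i.d. from N(0, 0.02**2); biases are zeroed and the
    padding row of embeddings is reset to zero. Because `Module.apply`
    recurses into every submodule, the `nn.Linear` branch also reaches the
    q/k/v/output projections inside MultiheadAttention, RowSelfAttention and
    ColumnSelfAttention, assuming those projections are plain `nn.Linear`
    layers as in the esm code.
    """
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.bias is not None:
            module.bias.data.zero_()
    if isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()
```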

nickbhat commented 2 years ago

@ulupo feel free to take this issue! Thanks for helping out.

tomsercu commented 2 years ago

@ulupo Yes, that's how we initialize, and that would indeed be the way to reset the weights as well. Thanks for helping out!
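
Putting the two sketches above together, a hypothetical way to build the seeded random baselines requested in the original post might look like this (the model name is just an example; any esm.pretrained constructor would work the same way):

```python
import esm

# Build a handful of reproducible randomly initialized baselines.
randomized_baselines = []
for seed in range(5):
    model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
    reset_to_random_weights(model, init_bert_like_params, seed=seed)
    randomized_baselines.append(model)
```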

ulupo commented 2 years ago

@tomsercu thanks a lot for the clarification!