nickbhat opened this issue 3 years ago (status: open)
Hi Nick! Sorry for the delay here.
I think you'd still want to initialize as `esm.pretrained.model_name()` and then call `reset_to_random_weights(seed)`, which in turn applies `self.apply(init_function)`.
Do you want to submit a PR for this? You can borrow from the fairseq RoBERTa initializer.
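For concreteness, a minimal sketch of that flow (untested against the repo): `reset_to_random_weights` is a proposed name rather than an existing `esm` function, and it assumes fairseq is installed so its `init_bert_params` (the RoBERTa/BERT initializer mentioned above) can be reused.

```python
# Minimal sketch only: reset_to_random_weights is a proposed helper, not an
# existing esm function, and this assumes fairseq is installed so its
# BERT/RoBERTa initializer can be borrowed.
import torch
import esm
from fairseq.modules.transformer_sentence_encoder import init_bert_params

def reset_to_random_weights(model, seed=0):
    """Overwrite a pretrained ESM model's weights in place with a seeded random init."""
    torch.manual_seed(seed)
    model.apply(init_bert_params)  # recursively re-initializes every submodule
    return model

# Initialize via esm.pretrained as usual, then discard the learned weights.
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
reset_to_random_weights(model, seed=0)
```

Keeping the seed explicit would also make "random ESM" baselines reproducible across papers.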
I'm interested in this and may like to help with a PR if @nickbhat is not available.
From https://github.com/pytorch/fairseq/blob/8fce12ddd4a0414de5a726af6193cee4893f0a13/fairseq/models/roberta/model.py#L49, it seems that you just follow BERT's random weight initialization, and thus that everything in the `Linear`, `Embedding`, `MultiheadAttention`, `RowSelfAttention`, and `ColumnSelfAttention` modules (except for biases and padding-related weights) should be initialized with i.i.d. Gaussian entries with mean 0 and std 0.02; see https://github.com/pytorch/fairseq/blob/8fce12ddd4a0414de5a726af6193cee4893f0a13/fairseq/modules/transformer_sentence_encoder.py#L21-L51. @tomsercu can you confirm that this is true of all your models, including MSA Transformer?
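For reference, here is a rough re-implementation of the linked `init_bert_params` logic (the fairseq source is authoritative; this is just a sketch). It assumes the attention modules' query/key/value/output projections are plain `nn.Linear` layers, so `model.apply(...)` reaches them by recursion without any attention-specific branch.

```python
# Rough re-implementation of the linked init_bert_params, for reference only;
# the fairseq source is authoritative. Assumes the attention modules'
# q/k/v/output projections are plain nn.Linear layers (as in esm), so that
# model.apply(...) reaches them by recursion.
import torch.nn as nn

def bert_style_init(module):
    if isinstance(module, nn.Linear):
        # i.i.d. Gaussian weights with mean 0 and std 0.02; biases set to zero.
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.bias is not None:
            module.bias.data.zero_()
    if isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.padding_idx is not None:
            # Keep the padding token's embedding row at zero.
            module.weight.data[module.padding_idx].zero_()
```

Calling `model.apply(bert_style_init)` then covers the `Linear`/`Embedding` layers inside `MultiheadAttention`, `RowSelfAttention`, and `ColumnSelfAttention` as well, since `apply` visits every submodule.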
@ulupo feel free to take this issue! Thanks for helping out.
@ulupo yes that's how we initialize, and indeed that would be the way to reset weights as well. Thanks for helping out!
@tomsercu thanks a lot for the clarification!
Hello friends,
Requesting a very minimal feature addition. Could we get an `esm.pretrained.esm_random` (or similarly named), maybe with 4 or 5 seeds for completeness? It's not hard to randomly initialize one's own ESM models, but it would be a nice quality-of-life improvement. It would also ensure some consistency in these "random ESM" baselines :)

Thanks!
Nick
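For concreteness, a hypothetical sketch of what such an entry point could look like; `esm1b_random` and the seed handling are made-up names, not part of `esm.pretrained`, and the initializer is borrowed from fairseq purely for illustration.

```python
# Hypothetical sketch of the requested entry point; esm1b_random is a made-up
# name (not in esm.pretrained), and the initializer is borrowed from fairseq
# purely for illustration.
import torch
import esm
from fairseq.modules.transformer_sentence_encoder import init_bert_params

def esm1b_random(seed=0):
    """Return an ESM-1b-architecture model with seeded random weights, plus its alphabet."""
    model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
    torch.manual_seed(seed)
    model.apply(init_bert_params)
    return model, alphabet

model, alphabet = esm1b_random(seed=0)  # seeds 0..4 would give five fixed baselines
```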