Difference between different ESM1v models?

facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins

MIT License

Difference between different ESM1v models? #111

Closed: xinformatics closed this issue 3 years ago

xinformatics commented 3 years ago

Hi, I am trying to compare ESM1b and ESM1v embeddings for downstream prediction tasks. I see that the ESM1v model has 5 variants, such as esm1v_t33_650M_UR90S_1, esm1v_t33_650M_UR90S_2, etc. Could you please tell me how these models differ? Thanks

tomsercu commented 3 years ago

Hi, they were trained with different random seeds. We use them to ensemble predictions.
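
As a rough illustration of ensembling, here is a minimal sketch that averages per-variant scores across the seed models. The thread does not say how the predictions are combined, so the plain mean is an assumption. The helper name `ensemble_scores` is hypothetical and the numbers are made-up stand-ins for whatever score each model produces.

```python
import numpy as np

def ensemble_scores(per_model_scores):
    """Average variant scores across models trained with different seeds.

    per_model_scores: list of 1-D sequences, one per model, each holding a
    score for the same ordered list of variants (hypothetical input format).
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in per_model_scores])
    # Plain mean over the model axis; an assumption, not the authors' stated method.
    return stacked.mean(axis=0)

# Made-up scores for three variants from five seed models.
scores = [
    [-1.0, 0.5, -3.0],
    [-1.2, 0.3, -2.8],
    [-0.8, 0.6, -3.1],
    [-1.1, 0.4, -2.9],
    [-0.9, 0.7, -3.2],
]
ensembled = ensemble_scores(scores)
print(ensembled)
```

In practice each inner list would come from one of the five checkpoints scored on the same variants in the same order.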

xinformatics commented 3 years ago

Do you expect the embeddings from models trained with different seeds to be significantly different from each other?

tomsercu commented 3 years ago

Yes, the embeddings will be very different, since the random initialization will send each model in a totally different direction. But the LM probabilities will usually be rather close.
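
A toy sketch of why this can happen, under a strongly idealized assumption: two "models" that compute the same function but whose hidden spaces differ by a rotation. Real seed models do not differ by an exact rotation, so this only shows that different embeddings and matching output probabilities are compatible. All names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 8, 20

# Toy "model A": one embedding vector and an output projection to the vocabulary.
h_a = rng.normal(size=dim)
W_a = rng.normal(size=(vocab, dim))

# Toy "model B": same function, hidden space rotated by a random orthogonal matrix.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
h_b = Q @ h_a
W_b = W_a @ Q.T

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

p_a = softmax(W_a @ h_a)
p_b = softmax(W_b @ h_b)

# Coordinate-wise similarity of the two embeddings versus agreement of the outputs.
cosine = float(h_a @ h_b / (np.linalg.norm(h_a) * np.linalg.norm(h_b)))
print("embedding cosine similarity:", cosine)
print("max probability difference:", np.abs(p_a - p_b).max())
```

The practical consequence is that embeddings from different seeds should not be compared coordinate by coordinate, while output probabilities can reasonably be compared or averaged.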

xinformatics commented 3 years ago

Thank you so much for the help.

ptynecki commented 3 years ago

@xinformatics how are you planning to test ESM1v embeddings in your downstream prediction tasks? By averaging the vectors from all the models into one?

xinformatics commented 3 years ago

@ptynecki I am not using ESM1 or ESM1b. I am working only with the mean embeddings obtained from ESM1v. My downstream task is multi-input, multi-output, so I can't really fine-tune the model and hence have to use the embeddings.
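
For readers wondering what "mean embeddings" could look like in code, here is a minimal sketch using NumPy arrays as stand-ins for per-token representations. It assumes one special token before and one after the sequence, which should be checked against the tokenizer actually used. Since the earlier reply says embeddings from different seeds are very different, the sketch concatenates the per-model vectors rather than averaging them. That choice is an assumption, not something stated in the thread, and both function names are hypothetical.

```python
import numpy as np

def mean_embedding(token_reps, seq_len):
    """Mean-pool per-residue representations into one fixed-size vector.

    token_reps: array of shape (seq_len + 2, dim). This sketch assumes one
    special token before the sequence and one after it.
    """
    residues = np.asarray(token_reps, dtype=float)[1 : seq_len + 1]
    return residues.mean(axis=0)

def concat_across_models(per_model_reps, seq_len):
    """Concatenate the mean embedding from each model into one feature vector."""
    return np.concatenate([mean_embedding(r, seq_len) for r in per_model_reps])

# Random stand-ins for the representations of a 5-residue sequence from 5 models.
rng = np.random.default_rng(0)
seq_len, dim = 5, 4
reps = [rng.normal(size=(seq_len + 2, dim)) for _ in range(5)]

features = concat_across_models(reps, seq_len)
print(features.shape)
```

Using a single model's mean embedding is the simpler alternative; concatenation keeps each model's coordinates separate at the cost of a longer feature vector.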