facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

ESM2 Language Head can not correctly decode Embeddings #313

Closed Leo-T-Zang closed 1 year ago

Leo-T-Zang commented 1 year ago

Hi,

I am currently using the ESM2 Language Head to decode embeddings. I use protein sequences from UniRef50 (some zinc-finger proteins). These proteins are presumably in the training dataset of ESM2, so I believe it should work. However, the decoded results differ somewhat from the original protein sequences.

Is it normal that the language head cannot decode correctly? What am I supposed to do in this case?
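For reference, a minimal sketch of this kind of argmax "decoding" through the LM head, assuming the fair-esm API and the esm2_t33_650M_UR50D checkpoint (the exact model and sequences used here are not stated):

```python
# Sketch only: model choice and example sequence are assumptions, not from this issue.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

seq = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
_, _, tokens = batch_converter([("query", seq)])

with torch.no_grad():
    logits = model(tokens)["logits"]          # (1, L + 2, vocab): BOS + residues + EOS

pred_ids = logits.argmax(dim=-1)[0, 1:len(seq) + 1]   # keep only the residue positions
decoded = "".join(alphabet.get_tok(i) for i in pred_ids.tolist())

mismatches = sum(a != b for a, b in zip(seq, decoded))
print(decoded)
print(f"{mismatches}/{len(seq)} positions differ from the input")
```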

tomsercu commented 1 year ago

Even though MLM is the pretraining objective that allows the model to learn meaningful representations, the model does not become perfect at the pretext task (that would correspond to a perplexity of 1, i.e. all probability mass on the correct token). So we do not expect the model to be able to perfectly "decode" embeddings. If a sequence is well understood by the model (most, but not all, of the training data), then we expect every amino acid in the input sequence to also have high output probability, which corresponds to a low perplexity for that sequence.
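A short sketch of one way to measure this (again assuming the fair-esm API, the esm2_t33_650M_UR50D checkpoint, and an arbitrary example sequence): score each input residue under the unmasked LM-head distribution and turn the mean negative log-likelihood into a perplexity.

```python
# Sketch: per-sequence perplexity from the unmasked LM-head probabilities.
# Model choice and example sequence are assumptions, not from this issue.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

seq = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
_, _, tokens = batch_converter([("query", seq)])

with torch.no_grad():
    log_probs = torch.log_softmax(model(tokens)["logits"], dim=-1)  # (1, L + 2, vocab)

positions = torch.arange(1, len(seq) + 1)          # skip the BOS and EOS tokens
true_ids = tokens[0, positions]
token_ll = log_probs[0, positions, true_ids]       # log P(true residue at each position)

perplexity = torch.exp(-token_ll.mean()).item()
print(f"sequence perplexity: {perplexity:.3f}  (1.0 would mean a perfect 'decode')")
```

Masking each position before scoring it (a pseudo-perplexity) is stricter but needs one forward pass per residue; the unmasked score above corresponds to the "output probability of the input tokens" described in the comment.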