facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Embedding proteins in batches with MSA transformer? #56

Closed Haxxardoux closed 3 years ago

Haxxardoux commented 3 years ago

Passing batches of proteins to the other ESM transformers works fine, but the MSA Transformer raises an error:

[image: screenshot of the error]

Is there a supported way to encode batches of proteins in a single forward pass?

tomsercu commented 3 years ago

Hi Will, it seems like you're trying to input a single sequence, while the MSA Transformer expects an MSA (a set of aligned sequences) as input. See this notebook for an example.

If my guess about the issue is wrong, feel free to reopen with an MWE!
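To make the input format concrete, here is a minimal sketch of batching several MSAs in one forward pass. It assumes the `esm` package's MSA batch converter accepts a list of MSAs, each given as a list of `(label, aligned_sequence)` tuples, and pads them to a common depth and length. The helper names `check_msa_batch` and `embed_msa_batch` are hypothetical, and the choice of `esm_msa1b_t12_100M_UR50S` and layer 12 is an assumption, not something stated in this thread.

```python
from typing import List, Tuple

# One alignment: (label, aligned_sequence) pairs, all rows the same length.
MSA = List[Tuple[str, str]]


def check_msa_batch(msas: List[MSA]) -> Tuple[int, int, int]:
    """Validate a batch of MSAs and return (batch_size, max_depth, max_length).

    Hypothetical helper: it only checks that every MSA is non-empty and that
    the rows within each MSA have equal length (i.e. are aligned).
    """
    if not msas:
        raise ValueError("expected at least one MSA")
    for i, msa in enumerate(msas):
        if not msa:
            raise ValueError(f"MSA {i} is empty")
        lengths = {len(seq) for _, seq in msa}
        if len(lengths) != 1:
            raise ValueError(f"MSA {i} is not aligned: row lengths {sorted(lengths)}")
    max_depth = max(len(msa) for msa in msas)
    max_length = max(len(msa[0][1]) for msa in msas)
    return len(msas), max_depth, max_length


def embed_msa_batch(msas: List[MSA]):
    """Embed a batch of MSAs with the MSA Transformer (assumed model and layer)."""
    # Imported lazily so check_msa_batch works without torch/esm installed.
    import torch
    import esm

    check_msa_batch(msas)
    model, alphabet = esm.pretrained.esm_msa1b_t12_100M_UR50S()
    model.eval()
    batch_converter = alphabet.get_batch_converter()
    # A list of MSAs goes in; tokens come back with one axis per MSA,
    # one per aligned row, and one per position.
    _, _, tokens = batch_converter(msas)
    with torch.no_grad():
        out = model(tokens, repr_layers=[12])
    return out["representations"][12]
```

A list of plain `(label, sequence)` tuples is read as one MSA rather than as a batch of proteins, so a batch needs one more level of nesting, for example `[[("a", "MKT-A"), ("b", "MKTLA")], [("c", "GG"), ("d", "G-")]]`.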