facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

read_msa in examples/variant-prediction does not remove insertions #230

Closed Jacoberts closed 2 years ago

Jacoberts commented 2 years ago

Hi all! Very excited to use MSA Transformer. Our calls to predict.py were failing due to MSA length mismatches [1]. Comparing read_msa in predict.py with read_msa in contact_prediction.ipynb, I think the predict.py version is missing the remove_insertions step.

I'm gonna try adding that and seeing if it works!
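
For readers following along, here is a minimal sketch of what an insertion-stripping read_msa could look like. It is modeled on the idea described above, not copied from the esm repository. It assumes A3M-style input, where lowercase letters and "." mark insertions relative to the query. The small parse_fasta helper is a hypothetical stand-in for a real FASTA reader, so that the example needs only the standard library.

```python
import string
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Assumption: A3M-style input. Lowercase letters and "." mark insertions
# relative to the query sequence; "*" is dropped as well.
_DELETE_CHARS = dict.fromkeys(string.ascii_lowercase + ".*")
_TRANSLATION = str.maketrans(_DELETE_CHARS)


def remove_insertions(sequence: str) -> str:
    """Strip insertion characters so every row has the query's column count."""
    return sequence.translate(_TRANSLATION)


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical minimal FASTA/A3M reader yielding (header, sequence) pairs."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_msa(lines: Iterable[str], nseq: int) -> List[Tuple[str, str]]:
    """Read the first nseq records and remove insertions from each sequence."""
    return [
        (header, remove_insertions(seq))
        for header, seq in islice(parse_fasta(lines), nseq)
    ]


# Toy input: hit1 and hit2 carry insertions, so their raw lengths differ.
a3m_lines = [">query", "MKV-LA", ">hit1", "MKVgg-LA", ">hit2", "M-V.ALA"]
msa = read_msa(a3m_lines, nseq=3)
```

After stripping, all three toy sequences have six columns, which is the property the batch converter checks for.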

[1]

Transferred model to GPU
Traceback (most recent call last):
  File "/code/esm/examples/variant-prediction/predict.py", line 239, in <module>
    main(args)
  File "/code/esm/examples/variant-prediction/predict.py", line 165, in main
    batch_labels, batch_strs, batch_tokens = batch_converter(data)
  File "/opt/conda/lib/python3.7/site-packages/esm/data.py", line 325, in __call__
    "Received unaligned sequences for input to MSA, all sequence "
RuntimeError: Received unaligned sequences for input to MSA, all sequence lengths must be equal.

tomsercu commented 2 years ago

Ah, indeed: we expected an aligned MSA here. Thanks for catching this!
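
One way to catch this kind of input before it reaches the batch converter is to check sequence lengths up front and report which rows are off. The sketch below is an illustration only: find_length_mismatches is a hypothetical helper name, not part of the esm package, and it assumes the MSA is a list of (label, sequence) pairs.

```python
from typing import List, Sequence, Tuple


def find_length_mismatches(msa: Sequence[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Return (label, length) for each sequence whose length differs from the first row."""
    if not msa:
        return []
    expected = len(msa[0][1])
    return [(label, len(seq)) for label, seq in msa if len(seq) != expected]


# Toy input: hit1 still contains lowercase insertion characters.
raw_msa = [("query", "MKV-LA"), ("hit1", "MKVgg-LA"), ("hit2", "M-VALA")]
mismatches = find_length_mismatches(raw_msa)
```

An empty result means every row has the same length as the first one. A non-empty result names the offending rows, which is usually more helpful than the generic RuntimeError.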

tomsercu commented 2 years ago

Thanks so much for identifying and fixing this issue!