facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

ESM-MSA-1 unsupervised contact prediction Bug #46

Closed jas-preet closed 3 years ago

jas-preet commented 3 years ago

Bug overview: Unsupervised contact map prediction from ESM-MSA-1 seems bugged.

Bug description: When I generate the contact map, it does not seem able to predict any long-range or even medium-range contacts. I tested it on CASP14 targets and the performance seems much worse than expected based on the results reported in the manuscript.
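For readers who want to quantify "long-range" and "medium-range" performance, here is a minimal sketch of a top-L precision metric. It is not taken from the ESM repository. The function name `contact_precision` and its parameters are hypothetical, and the separation thresholds (12 to 23 residues for medium range, 24 or more for long range) are an assumption based on common CASP practice.

```python
def contact_precision(probs, contacts, min_sep=24, max_sep=None, top_k=None):
    """Precision of the top_k highest-scoring residue pairs (i, j), i < j,
    whose sequence separation j - i lies in [min_sep, max_sep].

    probs    -- L x L nested list of predicted contact probabilities
    contacts -- L x L nested list of booleans (true contact map)
    top_k    -- number of pairs to score; defaults to L ("top-L" convention)
    """
    n = len(probs)
    if top_k is None:
        top_k = n

    # Collect candidate pairs in the requested separation range.
    pairs = []
    for i in range(n):
        for j in range(i + min_sep, n):
            if max_sep is not None and j - i > max_sep:
                break
            pairs.append((probs[i][j], i, j))

    # Rank by predicted probability, highest first, and keep the top_k.
    pairs.sort(key=lambda p: p[0], reverse=True)
    selected = pairs[:top_k]
    if not selected:
        return 0.0

    hits = sum(1 for _, i, j in selected if contacts[i][j])
    return hits / len(selected)
```

Under these assumptions, `contact_precision(probs, contacts, min_sep=24)` would give long-range top-L precision and `contact_precision(probs, contacts, min_sep=12, max_sep=23)` the medium-range counterpart.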

Additional information: the following code was used to generate the output:

    import torch

    model, alphabet = torch.hub.load("facebookresearch/esm", "esm_msa1_t12_100M_UR50S")
    batch_converter = alphabet.get_batch_converter()
    batch_labels, batch_strs, batch_tokens = batch_converter(data)
    with torch.no_grad():
        contact = model.predict_contacts(batch_tokens)

tomsercu commented 3 years ago

Hi Jaspreet, thanks for trying out our new model! What is the content of data? The quality of contact prediction will definitely depend on the "hardness" of the target as well as the quality of the MSA you can construct for a given protein.

jas-preet commented 3 years ago

Hi, thanks. I appreciate the quick response.

The data in my case is:

    data = ("5LOS_A", "GPGSAPLPNPPMTPAQHYAQAIHHEGLARHHTTVAEDHRQTANLHDNRIKAAKARYNAGLDPNGLTSAQKHQIERDHHLSLAAQAERHAATHNREAAYHRLHSQTPAPGTKRSIDELD")

Also, this is what the predicted output looks like: (image)

I think I now realise what I did wrong: I should be using the MSA as the input instead of just the single sequence. Apologies for the inconvenience caused.

tomsercu commented 3 years ago

Aha yes that explains it 👍 the MSA transformer takes an MSA as input.
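As a rough illustration of the fix discussed above, here is a minimal sketch of feeding a full alignment to the model instead of a single sequence. The helpers `parse_a3m` and `predict_contacts_from_msa` and the `max_seqs` cap are hypothetical names introduced for this example. The sketch assumes the MSA is available as A3M/FASTA text, in which lowercase letters mark insertions that must be stripped so all rows have equal length, and that the MSA batch converter accepts a list of MSAs, each a list of `(label, sequence)` tuples.

```python
import string

# Characters dropped from each row: lowercase letters mark insertions in A3M,
# and "." / "*" are padding or stop symbols.
_DROP = str.maketrans("", "", string.ascii_lowercase + ".*")


def parse_a3m(text, max_seqs=64):
    """Parse A3M/FASTA text into a list of (label, aligned_sequence) tuples.

    max_seqs is a hypothetical cap that keeps only the first rows of the
    alignment (query first) to limit memory use.
    """
    msa, label, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                msa.append((label, "".join(chunks).translate(_DROP)))
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        msa.append((label, "".join(chunks).translate(_DROP)))

    msa = msa[:max_seqs]
    if len({len(seq) for _, seq in msa}) > 1:
        raise ValueError("aligned rows must all have the same length")
    return msa


def predict_contacts_from_msa(msa):
    """Run the MSA Transformer on one MSA and return its L x L contact map."""
    import torch  # imported lazily so the parser above works without torch

    model, alphabet = torch.hub.load(
        "facebookresearch/esm", "esm_msa1_t12_100M_UR50S"
    )
    model.eval()
    batch_converter = alphabet.get_batch_converter()
    # Wrap the single MSA in a list: a batch of one alignment.
    _, _, batch_tokens = batch_converter([msa])
    with torch.no_grad():
        return model.predict_contacts(batch_tokens)[0]
```

With an alignment file for 5LOS_A on disk (path not given in this thread), usage would look like `predict_contacts_from_msa(parse_a3m(open(path).read()))`. The first row of the alignment should be the query sequence.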