Closed: jas-preet closed this issue 3 years ago.
Hi Jaspreet, thanks for trying out our new model!
What is the content of `data`? The quality of contact prediction will depend both on the "hardness" of the target and on the quality of the MSA you can construct for a given protein.
Hi, thanks. I appreciate the quick response.
The `data` in my case is:
```python
data = ("5LOS_A", "GPGSAPLPNPPMTPAQHYAQAIHHEGLARHHTTVAEDHRQTANLHDNRIKAAKARYNAGLDPNGLTSAQKHQIERDHHLSLAAQAERHAATHNREAAYHRLHSQTPAPGTKRSIDELD")
```
Also, this is what the predicted output looks like: [predicted contact map image not preserved]
I think I realise what I did wrong: I should be using the MSA as the input instead of just the sequence. Apologies for the inconvenience.
Aha, yes, that explains it 👍 The MSA Transformer takes an MSA as input.
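Since the MSA Transformer expects an MSA rather than a single `(label, sequence)` pair, the alignment has to be read into a list of `(label, aligned_sequence)` tuples first. Below is a minimal sketch, assuming the MSA is stored in A3M format with the query as the first record. Insertions (lowercase letters) and the `.`/`*` characters are stripped so that every row is aligned to the query, similar in spirit to the preprocessing in the esm contact-prediction example. The function name `read_a3m` and its `max_seqs` parameter are hypothetical, not part of the esm API.

```python
import string

# A3M marks insertions relative to the query with lowercase letters; "." and "*"
# are also dropped. Uppercase residues and "-" (deletions) are kept.
_DELETE_INSERTIONS = str.maketrans("", "", string.ascii_lowercase + ".*")


def read_a3m(text, max_seqs=None):
    """Parse A3M text into a list of (label, aligned_sequence) tuples.

    Assumes the first record is the query. `max_seqs` (hypothetical parameter)
    keeps only the first N records to bound memory use.
    """
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if label is not None:
                records.append((label, "".join(chunks).translate(_DELETE_INSERTIONS)))
            label, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks).translate(_DELETE_INSERTIONS)))
    if max_seqs is not None:
        records = records[:max_seqs]
    # Every row of an MSA must have the query's length once insertions are removed.
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"rows differ in length after removing insertions: {sorted(lengths)}")
    return records
```

The returned list is a single MSA (query first) and is the kind of input the MSA Transformer's `batch_converter` expects, in place of the single-sequence `data` above. How many sequences to keep is a judgment call this thread does not settle; deeper MSAs cost more memory.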
**Bug overview**
Unsupervised contact map prediction from ESM-MSA-1 seems to be bugged.
**Bug description**
When I generate the contact map, the model does not seem to predict any long-range or even medium-range contacts. I tested it on CASP14 targets, and the performance is much worse than expected from the results reported in the manuscript.
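One way to make "worse than expected" concrete is to compute top-L precision per sequence-separation range. Below is a minimal sketch, assuming the usual CASP-style ranges (short 6 to 11, medium 12 to 23, long 24 or more residues apart) and a set of true contacts derived elsewhere (commonly C-beta distance under 8 Å). The function name `contact_precision` and its parameters are hypothetical. `pred` is an L x L nested list of probabilities, for example one map from `predict_contacts` converted with `.tolist()`, assuming the output has shape (batch, L, L).

```python
def contact_precision(pred, true_contacts, min_sep=24, max_sep=None, top_k=None):
    """Precision of the top-k predicted contacts within a separation range.

    pred: L x L nested list of contact probabilities (only i < j is used).
    true_contacts: set of (i, j) pairs with i < j that are true contacts.
    min_sep / max_sep: inclusive bounds on |i - j| (long range: min_sep=24).
    top_k: number of top-scoring pairs to evaluate; defaults to L ("top-L").
    """
    L = len(pred)
    k = L if top_k is None else top_k
    candidates = []
    for i in range(L):
        for j in range(i + min_sep, L):
            if max_sep is not None and j - i > max_sep:
                break  # j only grows, so the rest of this row is out of range
            candidates.append((pred[i][j], i, j))
    # Highest probability first; ties broken deterministically by (i, j).
    candidates.sort(reverse=True)
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for _, i, j in top)
    # Divides by the number of pairs actually evaluated, which is fewer than k
    # when the range holds fewer than k candidate pairs.
    return hits / len(top)
```

Medium-range precision would be `contact_precision(pred, true_contacts, min_sep=12, max_sep=23)`.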
**Additional information**
The following code was used to generate the output:
```python
import torch

model, alphabet = torch.hub.load("facebookresearch/esm", "esm_msa1_t12_100M_UR50S")
model.eval()  # disable dropout for deterministic predictions
batch_converter = alphabet.get_batch_converter()
# `data` is the single (label, sequence) pair shown above, not an MSA
batch_labels, batch_strs, batch_tokens = batch_converter(data)
with torch.no_grad():
    contact = model.predict_contacts(batch_tokens)
```