evolutionaryscale / esm


Function ESMProtein.to_pdb() stores every residue's pLDDT as 1.00 #7

Closed. mariaartlle closed this issue 5 days ago

mariaartlle commented 6 days ago

Hi,

First of all, congrats on this amazing work! I have been generating several sequences (~500) using the code provided in the example:

prompt = "HERPYACP_________________________________________________________________________________"
protein = ESMProtein(sequence=prompt)
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.5))
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("protein.pdb")

The problem is that every structure predicted from the generated sequences has an average pLDDT of 1.00. I find this very strange, especially since ESMFold reports different pLDDT values for the same sequences. Am I missing a generation parameter, or do I need to prompt differently?

Thank you!

carolynkim commented 6 days ago

Thanks for trying out our model! I am having a bit of trouble reproducing your issue. When I run:

from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

model: ESM3InferenceClient = ESM3.from_pretrained("esm3_sm_open_v1").to("cuda")
prompt = "HERPYACP_________________________________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Fill in the masked sequence positions first, then predict the structure.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.5))
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
print(protein.plddt)

This prints a tensor filled with floats that are less than 1.00. I see you saved to a pdb at the end of your code snippet. Could you give us a snippet of code that includes how you calculated the average pLDDT?

mariaartlle commented 6 days ago

Thank you for checking! I ran the code in your reply and you are right: the tensor I get contains varying pLDDT values:

print(protein.plddt)
tensor([0.8923, 0.9578, 0.9728, 0.9754, 0.9825, 0.9845, 0.9841, 0.9799, 0.9813,
        0.9822, 0.9838, 0.9838, 0.9848, 0.9840, 0.9809, 0.9800, 0.9817, 0.9770,
        0.9807, 0.9842, 0.9822, 0.9816, 0.9833, 0.9840, 0.9848, 0.9817, 0.9826,
        0.9819, 0.9829, 0.9850, 0.9867, 0.9830, 0.9869, 0.9884, 0.9883, 0.9870,
        0.9874, 0.9875, 0.9871, 0.9874, 0.9878, 0.9869, 0.9833, 0.9834, 0.9830,
        0.9801, 0.9818, 0.9877, 0.9856, 0.9854, 0.9870, 0.9880, 0.9876, 0.9852,
        0.9870, 0.9864, 0.9862, 0.9862, 0.9877, 0.9845, 0.9871, 0.9875, 0.9851,
        0.9814, 0.9805, 0.9816, 0.9850, 0.9851, 0.9868, 0.9870, 0.9842, 0.9840,
        0.9831, 0.9780, 0.9819, 0.9849, 0.9799, 0.9789, 0.9799, 0.9785, 0.9745,
        0.9705, 0.9665, 0.9655, 0.9537, 0.9500, 0.9281, 0.8999, 0.8744])

But when I call protein.to_pdb("protein.pdb"), as shown in the README example, the resulting file has a pLDDT of 1 for every residue: [image]

Is this a bug, or do I have to pass a parameter so that .to_pdb() includes the protein's pLDDT?
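For anyone wanting to check what actually ended up in the file, here is a minimal sketch that reads the B-factor column of a PDB file, which is where structure predictors conventionally store per-residue pLDDT. It relies only on the fixed-width PDB column layout; the helper names `residue_bfactors` and `mean_bfactor` are hypothetical and not part of the esm package.

```python
def residue_bfactors(pdb_text):
    """Return one B-factor per residue, in file order, from PDB-format text.

    Hypothetical helper: takes the B-factor of the first atom seen for each
    residue, assuming all atoms of a residue share the same value.
    """
    per_residue = {}  # dicts keep insertion order, so file order is preserved
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 66:
            continue
        # Fixed-width PDB columns (1-indexed): chain ID 22, residue number
        # 23-26, insertion code 27, temperature factor 61-66.
        key = (line[21], line[22:26], line[26])
        per_residue.setdefault(key, float(line[60:66]))
    return list(per_residue.values())


def mean_bfactor(pdb_text):
    """Average the per-residue B-factors; raises ValueError if none are found."""
    values = residue_bfactors(pdb_text)
    if not values:
        raise ValueError("no ATOM/HETATM records found")
    return sum(values) / len(values)
```

Calling `residue_bfactors(open("protein.pdb").read())` and comparing the result with `protein.plddt` should make it clear whether the in-memory values were written out or replaced by a constant.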

ebetica commented 5 days ago

This is a bug, we're working on it.

carolynkim commented 5 days ago

Thanks for finding this bug, and sorry about the initial misunderstanding! This should be fixed now -- please let us know if you still have issues.
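If you are stuck on a version from before the fix, one possible workaround is to patch the B-factor column of the written file yourself. The sketch below is not the fix that was applied in the library; `set_residue_bfactors` is a hypothetical helper that assumes one pLDDT value per residue, in the order residues appear in the file, and it writes the values in whatever scale you pass in.

```python
def set_residue_bfactors(pdb_text, plddt):
    """Return PDB text with each residue's B-factor replaced by its pLDDT.

    Hypothetical helper. `plddt` is a sequence of floats, one per residue,
    ordered as the residues appear in `pdb_text`.
    """
    index_of = {}  # residue key -> position in `plddt`
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            # Fixed-width PDB columns (1-indexed): chain ID 22, residue
            # number 23-26, insertion code 27, temperature factor 61-66.
            key = (line[21], line[22:26], line[26])
            idx = index_of.setdefault(key, len(index_of))
            if idx >= len(plddt):
                raise ValueError("more residues in the file than pLDDT values")
            line = line[:60] + f"{plddt[idx]:6.2f}" + line[66:]
        out.append(line)
    if len(index_of) != len(plddt):
        raise ValueError("residue count does not match number of pLDDT values")
    return "\n".join(out) + "\n"
```

Assuming `protein.plddt` is a 1-D tensor as printed earlier in this thread, `protein.plddt.tolist()` gives a plain list of floats that can be passed as `plddt`.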