facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Provide predicted aligned error (PAE) from the ESM Metagenomic Atlas prediction API #370

Closed: tomgoddard closed this issue 2 years ago

tomgoddard commented 2 years ago

Currently, predictions run with the ESM Metagenomic Atlas API, for example

curl -X POST --data "GENGEIPLEIRATTGAEVDTRAVTAVEMTEGTLGIFRLPEEDYTALENFRYNRVAGENWKPASTVIYVGGTYARLCAYAPYNSVEFKNSSLKTEAGLTMQTYAAEKDMRFAVSGGDEVWKKTPTANFELKRAYARLVLSVVRDATYPNTCKITKAKIEAFTGNIITANTVDISTGTEGSGTQTPQYIHTVTTGLKDGFAIGLPQQTFSGGVVLTLTVDGMEYSVTIPANKLSTFVRGTKYIVSLAVKGGKLTLMSDKILIDKDWAEVQTGTGGSGDDYDTSFN" https://api.esmatlas.com/foldSequence/v1/pdb/

described here

https://esmatlas.com/about#api

do not give any way to get the predicted aligned error (PAE). The PAE data is a key part of the prediction: it indicates how confidently the protein domains are positioned relative to each other, and so whether they are likely packed correctly. It would be very useful to provide both the predicted structure as a PDB file and the PAE as a JSON file. One way to achieve this would be a new REST API endpoint that returns a zip file containing the PDB and JSON files.
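
To make the request concrete, here is a minimal sketch of how a client might consume such a zip response. Everything about the zip is an assumption for illustration: no such endpoint exists today, the member names (`*.pdb`, `*.json`) are hypothetical, and the PAE JSON is assumed to be a plain N x N nested list of numbers. Only the `foldSequence/v1/pdb/` URL comes from the curl example above.

```python
import io
import json
import urllib.request
import zipfile
from typing import List, Tuple

# Existing endpoint, taken from the curl example above; it returns PDB text only.
FOLD_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"


def fold_sequence(sequence: str, url: str = FOLD_URL, timeout: float = 300.0) -> bytes:
    """POST a raw amino-acid sequence and return the response body as bytes."""
    request = urllib.request.Request(url, data=sequence.encode("ascii"), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def unpack_prediction(zip_bytes: bytes) -> Tuple[str, List[List[float]]]:
    """Split a HYPOTHETICAL zip response into PDB text and a PAE matrix.

    Assumes the archive holds exactly one .pdb member and one .json member,
    and that the JSON is a square nested list (N x N, one row per residue).
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        pdb_names = [n for n in names if n.endswith(".pdb")]
        pae_names = [n for n in names if n.endswith(".json")]
        if len(pdb_names) != 1 or len(pae_names) != 1:
            raise ValueError("expected one .pdb and one .json member, got %r" % names)
        pdb_text = archive.read(pdb_names[0]).decode("utf-8")
        pae = json.loads(archive.read(pae_names[0]))

    # Reject anything that is not a square matrix before handing it to a viewer.
    if not all(isinstance(row, list) and len(row) == len(pae) for row in pae):
        raise ValueError("PAE matrix is not square")
    return pdb_text, pae
```

With an endpoint like this, a client would call `unpack_prediction(fold_sequence(seq, url=zip_url))`, where `zip_url` is whatever address the new API were given. A single zip keeps the structure and its PAE from the same run together, at the cost of the client having to unpack it.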

tomgoddard commented 2 years ago

The PAE data can be viewed and interactively analyzed together with the protein structure in the ChimeraX visualization program that I develop. I've added an ESMFold prediction capability to ChimeraX using the Meta server and would like it to be able to get the PAE data.

https://www.rbvi.ucsf.edu/chimerax/data/esmfold-nov2022/esmfold.html

tomsercu commented 2 years ago

Hi Tom, thanks for your work on ChimeraX, we were impressed by the beautiful visualizations! I'll add this feature request to our internal tracker and prioritize accordingly.

tomgoddard commented 2 years ago

Thanks!