LPDI-EPFL / masif

MaSIF - Molecular surface interaction fingerprints. Geometric deep learning to decipher patterns in molecular surfaces.
Apache License 2.0

Masif_ppi_search output #26

Open abbynewbury opened 3 years ago

abbynewbury commented 3 years ago

Hi, I have run masif_ppi_search on the PDB 6M3M_A and got two output files: `p1_desc_flipped.npy` and `p1_desc_straight.npy`. I cannot find any documentation of what these two files mean or how they relate to finding binders of the input PDB. Any help would be appreciated, thanks!

jaekor91 commented 3 years ago

@abbynewbury `p1_desc_*.npy` contains a neural-network-produced descriptor for each vertex of the triangular surface mesh. The difference between the two versions is that, for one of them (the flipped one), the signs of selected input features are flipped before they are fed into the network, as described in the paper.
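A minimal sketch of loading the two files for inspection, assuming each `.npy` file holds a 2-D array with one row per mesh vertex and one column per descriptor dimension. The helper name `load_descriptors` and the 80-dimensional synthetic data are hypothetical, for illustration only; real output files would be read the same way with `np.load`.

```python
import os
import tempfile

import numpy as np


def load_descriptors(desc_dir):
    """Load the straight and flipped per-vertex descriptors from one output folder.

    Assumption (hypothetical helper): each file holds a 2-D array of shape
    (n_vertices, n_dims), one row per vertex of the surface mesh.
    """
    straight = np.load(os.path.join(desc_dir, "p1_desc_straight.npy"))
    flipped = np.load(os.path.join(desc_dir, "p1_desc_flipped.npy"))
    # Both files describe the same mesh, so row i in each should refer to vertex i.
    if straight.shape != flipped.shape:
        raise ValueError("straight and flipped descriptor shapes differ")
    return straight, flipped


if __name__ == "__main__":
    # Synthetic stand-in data; 500 vertices x 80 dims is an assumption, not real output.
    rng = np.random.default_rng(0)
    with tempfile.TemporaryDirectory() as d:
        np.save(os.path.join(d, "p1_desc_straight.npy"), rng.normal(size=(500, 80)))
        np.save(os.path.join(d, "p1_desc_flipped.npy"), rng.normal(size=(500, 80)))
        straight, flipped = load_descriptors(d)
        print(straight.shape, flipped.shape)  # (500, 80) (500, 80)
```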

Disclaimer: I am not an author of the pipeline. :)

abbynewbury commented 3 years ago

Thank you! I got that output by running `./data_prepare_one.sh 6M3M_A` and then `./compute_descriptors.sh 6M3M_A`.

Is there another script I need to run (maybe `eval_gif_descriptors`?) to get a final output of the predicted best binders? I am not sure what to do with the numpy files themselves.

pablogainza commented 3 years ago

Hi! Sorry for the very late reply!

It is exactly as jaekor91 pointed out. The flipped descriptors are computed after flipping the signs of selected input features; the straight descriptors are computed from the unflipped features. If you want complementary surfaces, you always compare flipped to straight. (Comparing straight to straight would make sense if you were trying to find similar surfaces, not complementary ones.)

So if you take a point in the center of your interface and take its descriptor from `p1_desc_flipped.npy`, you should compare it to the descriptors in the `p1_desc_straight.npy` files of the proteins you are searching. Flipped descriptors must always be compared to straight descriptors.
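The flipped-to-straight comparison can be sketched as a nearest-neighbour lookup. This assumes that descriptor similarity is measured by Euclidean distance (smaller means a better match) and that you already know the index of the vertex at the center of the target interface; the function `rank_candidates` and the synthetic data are hypothetical, not part of the MaSIF code base.

```python
import numpy as np


def rank_candidates(target_flipped, center_idx, candidates_straight):
    """Rank candidate proteins by how close their best patch is to the target.

    target_flipped: (n_target_vertices, n_dims) flipped descriptors of the target.
    center_idx: index of the vertex chosen at the center of the target interface.
    candidates_straight: dict name -> (n_vertices, n_dims) straight descriptors.
    Assumption: a smaller Euclidean distance means a better complementary match.
    Returns a list of (name, best_distance, best_vertex), best first.
    """
    query = target_flipped[center_idx]
    results = []
    for name, desc in candidates_straight.items():
        # Distance from the query descriptor to every vertex of this candidate.
        dists = np.linalg.norm(desc - query, axis=1)
        best = int(np.argmin(dists))
        results.append((name, float(dists[best]), best))
    # Closest (most complementary) candidate first.
    results.sort(key=lambda r: r[1])
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    target_flipped = rng.normal(size=(200, 80))
    # Hypothetical candidates; "binder" has a vertex whose descriptor nearly matches the query.
    binder = rng.normal(size=(300, 80))
    binder[42] = target_flipped[7] + 0.01 * rng.normal(size=80)
    candidates = {"decoy": rng.normal(size=(300, 80)), "binder": binder}
    for name, dist, vertex in rank_candidates(target_flipped, 7, candidates):
        print(name, round(dist, 3), vertex)
```

To look for similar rather than complementary surfaces, the same function could be called with the target's straight descriptors in place of its flipped ones.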

In general, these descriptors discriminate quite well (ROC AUC > 0.95) if you pick a point in the center of the interface on your target.
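To make the ROC AUC figure concrete: one way to compute it from descriptor distances is the pairwise (Mann-Whitney) formulation below, where a lower distance counts as a higher score. This is a generic sketch, not necessarily how the quoted > 0.95 was measured; `roc_auc_from_distances` is a hypothetical name, and `sklearn.metrics.roc_auc_score` applied to negated distances should give the same value.

```python
import numpy as np


def roc_auc_from_distances(pos_dists, neg_dists):
    """ROC AUC for separating true partners from decoys by descriptor distance.

    pos_dists: distances for true interacting patch pairs (flipped vs straight).
    neg_dists: distances for random, non-interacting patch pairs.
    AUC = probability that a random true pair is closer than a random decoy
    pair, with ties counted as 0.5.
    """
    pos = np.asarray(pos_dists, dtype=float)[:, None]  # column: one row per true pair
    neg = np.asarray(neg_dists, dtype=float)[None, :]  # row: one column per decoy pair
    closer = (pos < neg).sum()  # true pair strictly closer than decoy
    ties = (pos == neg).sum()
    return float((closer + 0.5 * ties) / (pos.size * neg.size))


if __name__ == "__main__":
    # 3 of the 4 (true, decoy) pairs are ordered correctly.
    print(roc_auc_from_distances([0.5, 1.0], [2.0, 0.8]))  # 0.75
```

The pairwise matrix uses O(n_pos × n_neg) memory, which is fine for a sketch but not for very large benchmarks, where a sort-based implementation is preferable.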

However, for practical applications this isn't enough, because a search can still involve hundreds of millions of patches from thousands of proteins, so even a small false-positive rate leaves many spurious hits. This is why we have a second-stage alignment method to refine the top results.
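A rough sketch of the first stage only: keep the k patches whose straight descriptors lie closest to the target's flipped descriptor, and hand just those to the second-stage alignment (not shown). The name `top_k_patches` and the value of k are hypothetical; at the scale of hundreds of millions of patches you would likely need to process the database in chunks or use an approximate nearest-neighbour index rather than a single dense distance array.

```python
import numpy as np


def top_k_patches(query, database, k):
    """First-stage filter: indices of the k database descriptors closest to query.

    query: (n_dims,) flipped descriptor of the target's interface center.
    database: (n_patches, n_dims) straight descriptors pooled from many proteins.
    Only these k hits would be passed on to a (not shown) alignment stage.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    dists = np.linalg.norm(database - query, axis=1)
    k = min(k, len(dists))
    # argpartition gathers the k smallest distances in O(n) without a full sort ...
    idx = np.argpartition(dists, k - 1)[:k]
    # ... then only those k are sorted, closest first.
    return idx[np.argsort(dists[idx])]


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    db = rng.normal(size=(10_000, 80))
    q = db[123] + 0.01 * rng.normal(size=80)
    print(top_k_patches(q, db, 5)[0])  # 123
```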

I think what you want to do is follow the tutorial at https://github.com/LPDI-EPFL/masif/blob/master/docker_tutorial.md, in the section called "MaSIF PDL1 benchmark". You can run it for PD-L1 as in the example and then repeat it for another protein.