LPDI-EPFL / masif

MaSIF- Molecular surface interaction fingerprints. Geometric deep learning to decipher patterns in molecular surfaces.
Apache License 2.0
572 stars 151 forks

Pdl1 benchmark looking for masif_opts["coord_dir_npy"] #6

Closed: av1659 closed this issue 4 years ago

av1659 commented 4 years ago

The PDL1 benchmark looks for masif_opts["coord_dir_npy"], which looks like an artifact from the MATLAB code. What's the fix for this?

pablogainza commented 4 years ago

Indeed, sorry about that. I'll fix the PDL1 benchmark today.

av1659 commented 4 years ago

Thanks for taking a look! I've been trying to debug on my end as well. Did you replace the 'coords_mds.m' or '03-compute_coords.py' files with something else in this version?

pablogainza commented 4 years ago

Hi!

Sorry about this bug; it's fixed now.

I don't know whether you have already precomputed all the data, but I could share our precomputed data so that you can run the benchmark from Docker without recomputing everything yourself. It is about 100 GB.

av1659 commented 4 years ago

Hi, I would really appreciate it if you could share access to the precomputed data. As a test case, I have been running the target 4ZQK_A against 4ZQK_B to see whether these two partners receive a high score.

However, the model does not find any intersection in selected = np.intersect1d(true_iface, near_points).

Why is this? Since they are actual binding partners, I believe there should be some intersection.

pablogainza commented 4 years ago

Indeed, it should find them! I will repeat this test in Docker.

pablogainza commented 4 years ago

Hi! I'm seeing some very strange behavior that only happens in Docker. It may be caused by one of the libraries. I'll get back to you soon.

av1659 commented 4 years ago

Hi! To get the benchmark to run in Docker, I had to uninstall open3d and run pip install open3d-python. I also had to move import pymesh above all the other import statements.

This is what I did for 4ZQK (from within data/masif_ppi_search):

./data_prepare_one.sh 4ZQK_A
./data_prepare_one.sh 4ZQK_B
./compute_descriptors.sh 4ZQK_A
./compute_descriptors.sh 4ZQK_B

Then, from within data/masif_site:

./data_prepare_one.sh 4ZQK_A
./data_prepare_one.sh 4ZQK_B
./predict_site.sh 4ZQK_A
./predict_site.sh 4ZQK_B
./color_site.sh 4ZQK_A
./color_site.sh 4ZQK_B

Then I ran run_benchmark.sh from within data/masif_ppi_search/pdl1_benchmark.
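The sequence above can be scripted so that it is easy to rerun for other chains. This is a minimal sketch and is not part of MaSIF. The helper names plan_pdl1_run and run_plan are hypothetical. The sketch assumes you start from the repository root, so that data/masif_ppi_search and data/masif_site are relative paths. It also assumes the benchmark can be invoked as ./run_benchmark.sh inside pdl1_benchmark. By default it only prints the commands (a dry run).

```python
import subprocess
from pathlib import Path

# Steps taken from the comment above, run step by step for each chain.
PPI_SEARCH_STEPS = ["./data_prepare_one.sh", "./compute_descriptors.sh"]
SITE_STEPS = ["./data_prepare_one.sh", "./predict_site.sh", "./color_site.sh"]


def plan_pdl1_run(chains, data_root="data"):
    """Return (working_dir, command) pairs in the order they should run.

    Hypothetical helper: assumes it is called from the MaSIF repository root.
    """
    root = Path(data_root)
    plan = []
    for step in PPI_SEARCH_STEPS:
        for chain in chains:
            plan.append((root / "masif_ppi_search", [step, chain]))
    for step in SITE_STEPS:
        for chain in chains:
            plan.append((root / "masif_site", [step, chain]))
    # Assumed invocation of the benchmark script itself.
    plan.append((root / "masif_ppi_search" / "pdl1_benchmark", ["./run_benchmark.sh"]))
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, cmd in plan:
        print(f"(cd {cwd} && {' '.join(cmd)})")
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
```

For example, run_plan(plan_pdl1_run(["4ZQK_A", "4ZQK_B"])) prints the eleven commands in order. Passing dry_run=False actually executes them.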

av1659 commented 4 years ago

Hi! I tried your new script, run_benchmark_nn.sh, still with only 4ZQK_A and 4ZQK_B. If I set the cutoff to 2.0 or higher, it finds an overlap in 4ZQK_B. With the default cutoff of 1.7, still no luck.

For cutoff 1.7, I get this when I put a breakpoint before selected = np.intersect1d(true_iface, near_points):

(Pdb) true_iface
array([ 100,  152,  159,  165,  174,  187,  203,  222,  291,  354,  377,
        516,  536,  577,  594,  606,  625,  636,  649,  666,  669,  686,
        687,  699,  717,  746,  777,  830,  886,  903,  934,  973,  990,
        993, 1011, 1024, 1040, 1096, 1131, 1139, 1149, 1157, 1232, 1281,
       1289, 1292, 1317, 1343, 1408, 1467, 1484, 1493, 1593, 1599, 1606,
       1664, 1695, 1711, 1752, 1785, 1787, 1797, 1801, 1811, 1812, 1817,
       1890, 1916, 1922, 1961, 1980, 2004, 2034, 2060, 2124, 2134, 2170,
       2183, 2203, 2216, 2255])
(Pdb) near_points
array([668])
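The failing check can be reproduced in isolation with the exact arrays from the session above. This only reproduces the single np.intersect1d line, not the benchmark script. The variable names suggest that true_iface holds the ground-truth interface vertex indices and near_points holds the candidate vertices selected by the search; that reading is an assumption.

```python
import numpy as np

# Arrays copied from the (Pdb) session above (cutoff 1.7).
true_iface = np.array([
    100, 152, 159, 165, 174, 187, 203, 222, 291, 354, 377,
    516, 536, 577, 594, 606, 625, 636, 649, 666, 669, 686,
    687, 699, 717, 746, 777, 830, 886, 903, 934, 973, 990,
    993, 1011, 1024, 1040, 1096, 1131, 1139, 1149, 1157, 1232, 1281,
    1289, 1292, 1317, 1343, 1408, 1467, 1484, 1493, 1593, 1599, 1606,
    1664, 1695, 1711, 1752, 1785, 1787, 1797, 1801, 1811, 1812, 1817,
    1890, 1916, 1922, 1961, 1980, 2004, 2034, 2060, 2124, 2134, 2170,
    2183, 2203, 2216, 2255,
])
near_points = np.array([668])

# np.intersect1d returns the sorted values present in both arrays.
selected = np.intersect1d(true_iface, near_points)
print(selected)  # [] : vertex 668 is not in true_iface
```

The only candidate, 668, is absent from true_iface (666 and 669 are present, 668 is not). The intersection is therefore empty and the benchmark reports no overlap.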

Could my descriptors differ from your precomputed descriptors for some reason? I ran your simplified data_prepare_compute_descriptors.sh script (thanks for that!). Please advise.
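One way to check is to load both descriptor arrays (for example with np.load on the locally computed and downloaded .npy files) and compare them. This is a minimal sketch: compare_descriptors is a hypothetical helper, and file paths are left to you.

The helper assumes descriptors are stored as an (n_vertices, n_features) array. If the shapes differ, the two meshes themselves differ, so a vertex-by-vertex comparison is meaningless and the function returns None. Even with equal shapes, the comparison only makes sense if the vertex ordering is the same, which is also an assumption.

```python
import numpy as np


def compare_descriptors(mine, reference, atol=1e-5):
    """Compare two per-vertex descriptor arrays (hypothetical helper).

    Returns None when the shapes differ (different surfaces). Otherwise it
    returns the max/mean per-vertex Euclidean difference and whether the
    arrays match within atol.
    """
    mine = np.asarray(mine, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if mine.shape != reference.shape:
        return None
    # Euclidean distance between corresponding rows (vertices).
    diff = np.linalg.norm(mine - reference, axis=1)
    return {
        "max": float(diff.max()),
        "mean": float(diff.mean()),
        "identical": bool(np.allclose(mine, reference, atol=atol)),
    }
```

A None result points to differences in surface generation rather than in the network itself.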

pablogainza commented 4 years ago

Hi!

Yes, indeed, I've been experimenting with this over the past few days just to be sure. Here is what I've found so far.

1) First of all, you can reproduce the paper's results by downloading the following tar file from Dropbox (I have tried repeatedly to upload it to Zenodo, but it fails every time; I will keep trying):

https://www.dropbox.com/s/aaf5nt6smbrx8p7/masif_pdl1_benchmark_precomputed_data.tar?dl=0

2) I improved the code slightly to bring it in line with the docking benchmark. The old code did not use a learned scoring function or the ICP (iterative closest point) algorithm to refine alignments; now it does, which slightly improves the results.
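For readers unfamiliar with ICP, here is a minimal point-to-point version in NumPy. It alternates two steps: match each point to its nearest neighbour, then compute the best rigid fit with the Kabsch (SVD) method. This is a generic textbook sketch, not the benchmark's refinement code, which may rely on a library implementation and different settings.

The helper names best_fit_transform and icp are hypothetical. The nearest-neighbour search is brute force, so the sketch is suitable only for small point clouds.

```python
import numpy as np


def best_fit_transform(src, dst):
    """Kabsch: rotation R and translation t minimising sum ||R @ s + t - d||^2."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # correct an improper rotation (reflection)
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source, target, max_iter=50, tol=1e-10):
    """Rigidly align `source` onto `target`; returns (R, t, aligned_source)."""
    current = source.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # Brute-force nearest neighbour in target for every current point.
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        R, t = best_fit_transform(current, target[nn])
        current = current @ R.T + t
        # Mean correspondence distance measured before this update.
        err = np.sqrt(d2[np.arange(len(current)), nn]).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    # Overall transform from the original source to its final position.
    R_total, t_total = best_fit_transform(source, current)
    return R_total, t_total, current
```

ICP only refines an alignment that is already roughly correct, because it can settle into a local minimum. That is why it is used after an initial descriptor-based match rather than on its own.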

3) I have written a tutorial for reproducing these results here:

https://github.com/LPDI-EPFL/masif/blob/master/docker_tutorial.md

4) As you found, the surfaces appear to be generated differently in Docker. This introduces slight changes that make the process choose a different patch center point in the PDL1 target. The new center point seems to have less complementarity around it, which makes the fingerprints somewhat more dissimilar.

However, the method still works this way; it just takes a little longer. To run it, change the value of DESC_DISC_CUTOFF in /masif/source/masif_ppi_search/pdl1_benchmark_nn.py.

Suggested values are 2.0 or 2.2.

In my experiments, this takes about twice as long (about 1 hour).
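Why a larger cutoff finds the partner but costs more time can be illustrated with a simple distance filter. This is a sketch under an assumption: that the benchmark keeps the binder vertices whose descriptor lies within DESC_DISC_CUTOFF of the target patch's descriptor. The real script may use a different distance, data structure (e.g. a KD-tree), or selection rule. The function name candidate_vertices is hypothetical, and the toy descriptors are made up purely for illustration.

```python
import numpy as np


def candidate_vertices(target_desc, binder_descs, cutoff):
    """Indices of binder vertices whose descriptor is within `cutoff`
    (Euclidean distance) of the target descriptor (hypothetical helper)."""
    dists = np.linalg.norm(binder_descs - target_desc, axis=1)
    return np.flatnonzero(dists < cutoff)


# Toy 3-D descriptors at distances 1.0, 1.8, 2.1 and 3.0 from the target.
target_desc = np.zeros(3)
binder_descs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.8, 0.0],
    [0.0, 0.0, 2.1],
    [3.0, 0.0, 0.0],
])

for cutoff in (1.7, 2.0, 2.2):
    print(cutoff, candidate_vertices(target_desc, binder_descs, cutoff))
```

Each additional candidate has to be aligned and scored afterwards. Raising the cutoff therefore raises the chance of including a true interface vertex, but runtime grows with the number of candidates.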

Indeed, this is one of the challenges of our approach, which we discuss extensively in the paper: the method relies on complementarity to find the complementary binding patch very quickly.

Please let me know if this helps. Thanks! Pablo

av1659 commented 4 years ago

Hi! I'm happy to say that the 1.7 cutoff now works in Docker. I had to reinstall PyMesh (not with pip), and that fixed the problem.

av1659 commented 4 years ago

Hi there, I downloaded your precomputed data, but the predicted surfaces directory (masif_site/output/all_feat_3l_pred_surfaces) is not there. Could you provide a link to it for the PDL1 benchmark? Thanks so much!

pablogainza commented 4 years ago

Hi!

You don't actually need the precomputed surfaces, except the one for 4ZQK_A, which is included in the tar file.

For the others, the data are read from the pred_data directory (these are just NumPy arrays containing the predictions).
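Since the predictions are plain NumPy arrays, they can be inspected directly with np.load. This is a minimal sketch with several assumptions. The file name pattern pred_<name>.npy and the helpers load_site_predictions and predicted_interface are hypothetical; check the actual names inside pred_data. The sketch also assumes each array holds one interface score per surface vertex, and the 0.5 threshold is only illustrative.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_site_predictions(pred_dir, name):
    """Load a prediction array; the file name pattern is an assumption."""
    return np.load(Path(pred_dir) / f"pred_{name}.npy")


def predicted_interface(scores, threshold=0.5):
    """Indices of vertices whose score is >= threshold (illustrative cutoff)."""
    # Flatten in case the array carries a leading batch dimension (assumption).
    scores = np.asarray(scores).ravel()
    return np.flatnonzero(scores >= threshold)


if __name__ == "__main__":
    # Round trip with a made-up array in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        np.save(Path(tmp) / "pred_4ZQK_B.npy", np.array([[0.1, 0.9, 0.6, 0.2]]))
        scores = load_site_predictions(tmp, "4ZQK_B")
        print(predicted_interface(scores))  # [1 2]
```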

av1659 commented 4 years ago

Got it to work! Thanks