Closed: VincentH23 closed this issue 1 year ago.
Thanks for your interest! A value of 1.0 indicates that autodock wasn't working for some reason, so I suggest removing stderr=subprocess.DEVNULL in utils.py. A few people have had this problem, so I changed the default to show errors from autodock (see https://github.com/Rose-STL-Lab/LIMO/commit/59ce1ed0f7942d9efc1590530d4ce9ae8d04aabd). This might produce a bit of extra text output, but hopefully it will allow people to better catch errors.
After you see what the error is, if you paste it here I can try to help find the problem.
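To see why removing the suppression matters, here is a minimal sketch (a hypothetical stand-in for the AutoDock call in utils.py, not LIMO's actual function) showing how stderr=subprocess.DEVNULL hides a failing command's error text:

```python
import subprocess
import sys

def run_cmd(cmd, suppress_stderr=False):
    """Run an external command; with suppress_stderr=True the error text
    is discarded (stderr=subprocess.DEVNULL), with it removed the errors
    are captured and visible. Hypothetical helper for illustration."""
    stderr = subprocess.DEVNULL if suppress_stderr else subprocess.PIPE
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=stderr)

# A command that fails loudly, standing in for a broken autodock run:
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('autodock error'); sys.exit(1)"]
print(run_cmd(failing, suppress_stderr=True).stderr)  # None: the error is hidden
print(run_cmd(failing).stderr.decode())               # the error text is visible
```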
I have this error when I try to train the predictor for binding affinity: bin/sh: 1: AutoDock-GPU/bin/autodock_gpu_128wi: not found. I installed AutoDock-GPU via conda, but I don't know why it can't find the autodock executable.
It looks like it's trying to access the autodock executable from a folder that doesn't exist. If you installed it via conda, it might be possible to just supply --autodock_executable autodock_gpu_128wi (or whatever command you use when running autodock) to any scripts that require autodock. Alternatively, if you can find the actual autodock executable file somewhere on your system, you can point to it with the --autodock_executable flag. Finally, if that doesn't work, you could try compiling autodock (https://github.com/ccsb-scripps/AutoDock-GPU#compilation) and using that executable.
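One quick way to check whether a conda-installed binary is actually visible on the PATH is Python's shutil.which (a minimal sketch, separate from LIMO's code):

```python
import shutil

# shutil.which searches PATH the way the shell does, so it reports
# whether autodock_gpu_128wi would be found when the code shells out to it.
path = shutil.which("autodock_gpu_128wi")
if path is None:
    print("not on PATH; pass the executable's full path via --autodock_executable")
else:
    print(f"found at {path}; --autodock_executable {path} should work")
```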
It looks like it's trying to access the autodock executable from a folder that doesn't exist. If you installed it via conda, it might be possible just to supply
--autodock_executable autodock_gpu_128wi
(or whatever the command you use when running autodock) when running any scripts that require autodock. Alternatively, if you can find the actual autodock executable file somewhere on your system, you can point to it with the--autodock_executable
flag. Finally, if that doesn't work, you could try compiling autodock (https://github.com/ccsb-scripps/AutoDock-GPU#compilation) and using that executable.
Thank you for your answer. I recompiled AutoDock to obtain the autodock_gpu_128wi file, and the previous error disappeared. However, I still get a Kd value equal to 1.0 because the free energy is equal to 0.
Ok, thanks for recompiling. The free energy should never equal 0 normally, so something must still be wrong with autodock. You might want to remove stdout=subprocess.DEVNULL from line 214 of utils.py (https://github.com/Rose-STL-Lab/LIMO/blob/main/utils.py#L214) as well, because errors can sometimes show up in stdout. If nothing shows up there, you can try running autodock directly from the command line and see if it works. A good debugging step would be to put an exit() directly before line 214 in utils.py, print the autodock command that would run, and see if you can run it yourself on the command line.
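That debugging step might look like the following sketch (the path and the --ffile/--lfile flags are illustrative, not LIMO's exact command):

```python
# Print the AutoDock command that utils.py would run (around line 214),
# then stop, so the same command can be tried by hand in a terminal.
autodock = "AutoDock-GPU/bin/autodock_gpu_128wi"  # whatever path you use
cmd = f"{autodock} --ffile protein.maps.fld --lfile ligand.pdbqt"
print(cmd)
# exit()  # uncomment when pasting this into utils.py so the script stops here
```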
When I run autodock from the command line, the free-energy computation works, but when I run the code the free energy is always 0. Also, I don't understand why you take the minimum of 0 and the value from autodock as the free energy. If we do that, we should always have Kd < 1, but in your paper the Kd value can be higher than 1 (for ACAA1).
Free energy (which is what autodock outputs) is different from Kd. The more negative the free energy is, the lower the Kd will be. There is a function in utils.py, delta_g_to_kd, that converts from free energy (delta G) to Kd. The reason I take the minimum of 0 and the autodock energy is that an autodock energy >0 usually indicates an extremely unfavorable ligand or an error in the computation, so I cap it at 0 to avoid getting extremely high values. For ACAA1, I believe the free energy (which is what autodock outputs) should be around -10 for the most favorable ligands.
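The relation behind that conversion, delta_G = R*T*ln(Kd), can be sketched as follows (the constants are my assumption: the gas constant in kcal/(mol*K) and T = 298 K; the delta_g_to_kd in utils.py is the authoritative version):

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K) (assumed units)
T = 298.0      # temperature in kelvin (assumed)

def delta_g_to_kd(delta_g):
    """Convert a binding free energy (kcal/mol) to a dissociation
    constant via delta_G = R*T*ln(Kd), i.e. Kd = exp(delta_G / (R*T))."""
    return math.exp(delta_g / (R * T))

print(delta_g_to_kd(0.0))    # -> 1.0: a clamped free energy of 0 gives Kd = 1.0
print(delta_g_to_kd(-10.0))  # much smaller: more negative energy, tighter binding
```

This is also why a free energy of 0 always shows up as Kd = 1.0.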
What's the command you use to run autodock on the command line? And what sort of result is it giving you?
Thanks for your help.
I solved the problem by creating a file with the list of ligands and using this call in your code: subprocess.run(f'{autodock} -seed 0 -filelist file.txt -resnam ./test2/results -D {device + 1}', shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).
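A cleaned-up sketch of that invocation as a small helper (the function name is hypothetical; the -seed, -filelist, -resnam, and -D flags come from the command quoted above):

```python
import subprocess

def autodock_filelist_cmd(autodock, filelist, resnam, device):
    """Build the AutoDock-GPU command for a batch of ligands listed in
    `filelist`, mirroring the call quoted above (hypothetical helper)."""
    return f"{autodock} -seed 0 -filelist {filelist} -resnam {resnam} -D {device + 1}"

cmd = autodock_filelist_cmd("autodock_gpu_128wi", "file.txt", "./test2/results", 0)
print(cmd)
# The real code then passes this string on, e.g.:
# subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
```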
I have a couple of other questions about your paper:
1) Are the results in the tables of your paper the best results or the average over 10 trials? I ask because I can't reach your table results for penalized logP.
2) For the Kd values in your table, you report 0.72 (ESR1) and 37 (ACAA1) in the best cases, but when I run your code I get 2.7e-9. Are the code's output values in mol/liter rather than nanomol/liter as in your paper?
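Assuming the code's Kd comes from Kd = exp(delta_G/(R*T)) with a 1 M standard state (so its value is in mol/L) while the paper's tables are in nanomol/L, the two differ only by a factor of 1e9 (a sketch of that assumption):

```python
# Converting a Kd in mol/L (1 M standard state) to the paper's
# nanomolar scale is just a multiplication by 1e9:
kd_molar = 2.7e-9              # example value printed by the code (mol/L)
kd_nanomolar = kd_molar * 1e9  # same quantity on the nanomolar scale
print(kd_nanomolar)
```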
Glad I could help. For your questions:
Thanks for your answer.
Could you send me the data you generated for training the property predictor? I think the difference between my results and those in your paper is due to a difference in the data used to train the property predictor.
I don't have the data available anymore, since it's relatively easy to regenerate, but you could try using the pretrained predictor in property_models/penalized_logp.pt, which was trained on the original data. You could also try increasing --num_mols when training the property predictor, which will improve prediction performance. What sort of r values are you getting when training the predictor? For anything less than about 0.5 I would suggest increasing --num_mols.
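The r value in question is the Pearson correlation between predicted and true property values; a minimal check might look like this (the numbers are purely illustrative, not LIMO's training output):

```python
import numpy as np

# Pearson r between true and predicted property values on a few
# held-out molecules; higher r means a more reliable predictor.
true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
r = np.corrcoef(true, pred)[0, 1]
print(r)  # values below ~0.5 would suggest training with a larger --num_mols
```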
Overall, though, I'm surprised the logp results you're getting are so different, because logp optimization is a very simple problem (it's more of a sanity test of a model than an actually relevant metric). What's the best logp result you've gotten so far?
I cloned your code. First, I preprocessed the zinc250k data. When I try to load your file (vae.pt), I get this error: "_pickle.UnpicklingError: invalid load key, 'v'.", so I trained your VAE model to get a new vae.pt file.
Here are all the results I got:
VAE (trained by me) + property predictor (your file): best plogp 6.3
VAE (trained by me) + property predictor (trained by me): best plogp 8.3
Maybe the problem comes from the VAE, because I did not use your file.
Could you take a look at this issue: https://github.com/Rose-STL-Lab/LIMO/issues/3? Some others had the same unpickling error for the VAE file, but maybe you could try the suggestions there and let me know if it works for you. If my suggestions work, I'll put it in the README, since it seems like a lot of people have this problem.
The 6.3 number you're getting was actually from an ablation study we performed to test attaching a property predictor to either the VAE decoder (which is regular LIMO) or the latent space directly (which we call "LIMO on z"). It looks like you're getting the 6.3 value from LIMO on z. One row below it we have the value from LIMO, which is 10.5. This is a little different from what you're getting, but it's possible running it on my vae file would give better results. It's also possible this is just random variation, since a difference of +/- 2 logp is not very significant.
How do I know whether I am using the property predictor on z or with the VAE decoder? I am only running your code, and the input size of the property predictor and the size of z are different. When I use your vae.pt from Google Drive, the code works, but I get 0.3 for the penalized logP value with your files (vae.pt + penalized_logp.pt). I will try to retrain the property prediction model.
The code in this repository only uses the VAE decoder; the property predictor on z is not published here. And yes, the property predictor works best when trained using the same VAE used for generation, so retraining is a good idea.
Thanks for your help, I finally succeeded in getting results similar to yours, but I have a question about dataset size. When you generate molecules with free energy, some of the molecules sometimes have 0 as their free-energy value. To get your results, did you remove these data points and add other molecules so the dataset contains no 0 free-energy values?
No problem! The molecules with 0 free energy were included in the dataset, although they didn't have much of an impact because we only reported the minimum of the dataset (which is always <0). Also, they were relatively uncommon, so I don't think replacing them would have had much of an impact. If you're taking averages or something from the dataset, though, I would suggest replacing the 0 free energy molecules.
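The distinction between the reported minimum and an average can be sketched as follows (illustrative numbers, not LIMO's actual data):

```python
# If you take averages over a generated dataset, drop the failed runs
# (free energy exactly 0) first; the reported minimum is unaffected by them.
energies = [-8.2, 0.0, -9.5, 0.0, -7.1]  # illustrative autodock outputs
valid = [e for e in energies if e < 0]
print(min(energies))            # -> -9.5: the minimum ignores the zeros anyway
print(sum(valid) / len(valid))  # mean over successful dockings only
```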
I am very interested in your work on LIMO and am currently trying to reproduce your experiments with your code. However, I have a problem: when the property to optimize is binding_affinity for ESR1, the three best values are always equal to 1.0.