gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License
1.07k stars 258 forks

Failed to run diffdock #77

Closed (dinghezier closed 1 year ago)

dinghezier commented 1 year ago

Hello, I got an error while running the following command:

python -m inference --protein_ligand_csv data/testset_csv.csv --out_dir results/user_predictions_testset --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise

139it [1:44:06, 32.48s/it]
rdkit coords could not be generated without using random coords. using random coords now.
rdkit coords could not be generated without using random coords. using random coords now.
Failed on ['complex_139'] linalg.svd: The algorithm failed to converge because the input matrix contained non-finite values.
139it [1:44:37, 45.16s/it]
Traceback (most recent call last):
  File "/home/dj/anaconda3/envs/diffdock/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dj/anaconda3/envs/diffdock/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/dj/diffdock/DiffDock/inference.py", line 188, in <module>
    raise e
  File "/home/dj/diffdock/DiffDock/inference.py", line 151, in <module>
    data_list, confidence = sampling(data_list=data_list, model=model,
  File "/home/dj/diffdock/DiffDock/utils/sampling.py", line 86, in sampling
    new_data_list.extend([modify_conformer(complex_graph, tr_perturb[i:i + 1], rot_perturb[i:i + 1].squeeze(0),
  File "/home/dj/diffdock/DiffDock/utils/sampling.py", line 86, in <listcomp>
    new_data_list.extend([modify_conformer(complex_graph, tr_perturb[i:i + 1], rot_perturb[i:i + 1].squeeze(0),
  File "/home/dj/diffdock/DiffDock/utils/diffusion_utils.py", line 29, in modify_conformer
    R, t = rigid_transform_Kabsch_3D_torch(flexible_new_pos.T, rigid_new_pos.T)
  File "/home/dj/diffdock/DiffDock/utils/geometry.py", line 112, in rigid_transform_Kabsch_3D_torch
    U, S, Vt = torch.linalg.svd(H)
torch._C._LinAlgError: linalg.svd: The algorithm failed to converge because the input matrix contained non-finite values.

Could someone tell me how to solve the problem? Thank you.

gcorso commented 1 year ago

Hi @dinghezier. Unfortunately, the torch routine we use for SVD sometimes fails to converge on certain inputs. Does the problem persist if you run it multiple times? What complexes are you running the model on?

dinghezier commented 1 year ago

Hello. Yes, I tried running it several more times and sometimes it worked, but the frequent interruptions make it inconvenient to run large data sets. The failures occurred on, e.g., 1jla, 1t46, 1t9b, qlbk, and so on. I also got two other errors:

1. skipping 1a30 because of the error: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
2. LM embeddings for complex 1hwi did not have the right length for the protein. Skipping 1hwi. The test dataset did not contain 1hwi for (CICIn1cIC=CCO)CCIO)CCI=D)0)c-c2cccF)cc2c2ccccc21 and /home/dj/diffdock/DiffDock/data/140protein/1hwi_protein.pdb. We are skipping this complex.

Can you tell me how to solve these problems? Thank you very much.
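A note on the first of these errors: byte 0xef at position 0 is the first byte of the UTF-8 byte-order mark (EF BB BF), so one possible cause is an input file saved as "UTF-8 with BOM" and then read as ASCII. Whether that applies to the 1a30 files here is an assumption; the thread does not confirm it. The sketch below reproduces the message on a hypothetical file (`example_ligand.txt`, not a DiffDock file) and shows that Python's `utf-8-sig` codec reads it and strips the mark.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for an input file that was saved with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "example_ligand.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"CCO\n")  # BOM bytes are EF BB BF


def read_text(path, encoding):
    """Read the whole file as text using the given encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Reading as ASCII fails on the first BOM byte (0xef).
try:
    read_text(path, "ascii")
    ascii_failed = False
    ascii_error = ""
except UnicodeDecodeError as err:
    ascii_failed = True
    ascii_error = str(err)

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
clean = read_text(path, "utf-8-sig")
print(ascii_error)
print(repr(clean))
```

If the files do carry a BOM, re-saving them without it, or opening them with `encoding="utf-8-sig"`, would avoid this particular error.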

kucukben commented 1 year ago

Hello - we see the same SVD error with the test set as well.

It doesn't look like a problem with the stability of the SVD algorithm, but rather an issue with the input to the SVD: the cross-covariance matrix A@B_T in the Kabsch algorithm. It looks like the sampled ligand pose may be returned as NaN, so the optimal rotation from the input ligand pose cannot be calculated. Can you please confirm that the pose sent to the Kabsch algorithm is always a legitimate one?

The issue does not always occur at the same ligand or protein, and resampling enough times sometimes gives legitimate poses to continue with. It is not rare either: we cannot run the test set without encountering this error at least once. Thanks.
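To illustrate the point about the SVD input, here is a minimal pure-Python sketch (not the DiffDock code) of how a single NaN coordinate in a pose propagates into the cross-covariance matrix, together with a finiteness check that could run before the SVD. The helper names (`cross_covariance`, `is_finite_matrix`) and the toy coordinates are hypothetical; in the torch code the equivalent check would presumably be something like `torch.isfinite(H).all()`.

```python
import math


def centroid(points):
    """Mean of a list of 3D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


def cross_covariance(a, b):
    """3x3 cross-covariance of two equally sized lists of 3D points,
    i.e. the matrix handed to the SVD in the Kabsch algorithm."""
    ca, cb = centroid(a), centroid(b)
    h = [[0.0] * 3 for _ in range(3)]
    for p, q in zip(a, b):
        for i in range(3):
            for j in range(3):
                h[i][j] += (p[i] - ca[i]) * (q[j] - cb[j])
    return h


def is_finite_matrix(m):
    """True only if every entry is a finite number (no NaN, no inf)."""
    return all(math.isfinite(x) for row in m for x in row)


# Toy poses: 'bad' has one NaN coordinate, as a diverged sample might.
good = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bad = [[0.0, 0.0, 0.0], [float("nan"), 0.0, 0.0], [0.0, 1.0, 0.0]]

h_good = cross_covariance(good, good)
h_bad = cross_covariance(bad, good)

print(is_finite_matrix(h_good))
print(is_finite_matrix(h_bad))
```

One NaN coordinate contaminates the centroid and therefore a whole row of the matrix, which is why a check on the pose (or on the matrix) before the SVD would point to the sampler rather than to the SVD routine.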

son-h-pham commented 1 year ago

Hello, I do not have the logs on hand, but I used version fff8f0b for a few docking runs, and I recall similar errors. However, the previous behavior was to skip the protein-ligand pair and move on to the next one, while the latest branch aborts the run entirely. The error occurs in <1% of ligands in my experience, so for the moment restoring the previous behavior would be of great help. Thanks.

Glinttsd commented 1 year ago

Hi, I also encountered these errors. Which exact previous version should I restore?

son-h-pham commented 1 year ago

Hi @Glinttsd, I used branch fff8f0b. I also removed the diffdock conda environment and reinstalled it along with the dependencies.

gcorso commented 1 year ago

I've removed the raise in the catch of the exception. The inference should now run across all the files, skipping the ones it fails for (and printing the error). To avoid the SVD issue causing complexes to be skipped, one can easily add a for loop that retries the same complex a few times when it fails before moving on to the next one. I hope this helps!
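A minimal sketch of such a retry loop, in plain Python. This is not the code in inference.py: `run_with_retries` and `flaky_sample` are hypothetical names, and `flaky_sample` only stands in for sampling one complex so that the example is self-contained.

```python
def run_with_retries(fn, max_attempts=3):
    """Call fn() up to max_attempts times.

    Returns (result, attempts_used) on success, or (None, max_attempts)
    if every attempt raised, so the caller can skip that complex.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(), attempt
        except Exception as err:  # e.g. the linalg.svd failure
            print(f"attempt {attempt} failed: {err}")
    return None, max_attempts


# Stand-in for sampling one complex: fails twice, then succeeds.
calls = {"n": 0}


def flaky_sample():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("linalg.svd: input matrix contained non-finite values")
    return "pose"


result, attempts = run_with_retries(flaky_sample, max_attempts=5)
print(result, attempts)
```

Because the sampling is stochastic, a retry draws a new sample and can succeed where the previous one failed. Capping the number of attempts keeps a complex that always fails from stalling the run.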