linalg.svd: The algorithm failed to converge because the input matrix contained non-finite values.

marissadolorfino commented 1 year ago

I am trying to run DiffDock on a batch of pdbs with the same ligand. For some of the pdbs I get this error, which results in no output for the given pdb.

However, this error seems to be random and is not reproduced each time I run the same batch (e.g. pdb1 will fail while pdb2 runs in one submission, and then pdb1 will run while pdb2 fails in a different submission).

Is there a workaround for this? Thanks!

decortja commented 1 year ago

I have the same error screening a library of molecules against the same protein. Did you ever figure out a reason? Or can anyone help?

My inference parameters are: --inference_steps 20 --samples_per_complex 20 --batch_size 10 --actual_steps 18 --no_final_step_noise

marissadolorfino commented 1 year ago

I found that when I use smaller batch sizes the error occurs less often across all of the batches, but I am not sure why, or what is causing the error.

rcrehuet commented 8 months ago

I've tracked down the problem. It arises when modify_conformer receives a very large value in tr_perturb. The resulting geometry is invalid, so the run fails at whichever of these comes first: the division in the rot_vec calculation in modify_conformer_torsion_angles, or the SVD in rigid_transform_Kabsch_3D_torch.
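For anyone who wants to confirm where the bad values first appear, here is a minimal sketch of a finiteness check that could be called on the coordinates before the SVD step. The helper names first_non_finite and check_finite are hypothetical and not part of DiffDock, and the sketch uses plain Python lists so it runs without PyTorch. On tensors the equivalent test would be torch.isfinite(x).all().

```python
import math


def first_non_finite(coords):
    """Return the index of the first row containing NaN or inf, else None.

    `coords` is assumed to be a sequence of rows of floats (e.g. N x 3).
    """
    for i, row in enumerate(coords):
        if not all(math.isfinite(x) for x in row):
            return i
    return None


def check_finite(name, coords):
    """Hypothetical guard: raise a descriptive error before a later SVD can fail."""
    bad = first_non_finite(coords)
    if bad is not None:
        raise ValueError(
            f"{name}: non-finite coordinate in row {bad}: {list(coords[bad])}"
        )
    return coords
```

Calling check_finite right after the translation update and again right before the Kabsch alignment would show which of the two steps first produces NaN or inf, instead of only seeing the linalg.svd message at the end.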

tr_perturb becomes this large when tr_score is also very large. Because tr_score is one of the values returned by the model, I am not sure how to evaluate whether it is reasonable. I have patched the sampling function to scale down tr_perturb and opened a pull request. I would feel more confident, though, if @gcorso could first tell us whether scaling down very large tr_perturb values makes sense within the model.
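To illustrate what "scaling down" could mean here, this is a minimal sketch of norm clamping: the direction of the translation is kept, and its length is capped. It is not the code from the pull request. The function name clamp_norm and the max_norm threshold are assumptions made for illustration, and whether any such cap is sound for the model is exactly the open question above. It uses plain Python so it runs without PyTorch.

```python
import math


def clamp_norm(vec, max_norm):
    """Rescale `vec` so its Euclidean norm is at most `max_norm`.

    Hypothetical helper: `max_norm` is an illustrative threshold,
    not a value taken from DiffDock.
    """
    norm = math.hypot(*vec)  # multi-argument hypot needs Python 3.8+
    if not math.isfinite(norm):
        # NaN or inf cannot be rescaled meaningfully; fail loudly instead.
        raise ValueError(f"non-finite translation: {list(vec)}")
    if norm <= max_norm:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


# A translation of length 50 capped at 5 keeps its direction
# and comes back with length 5 (up to floating-point rounding).
capped = clamp_norm([30.0, 40.0, 0.0], 5.0)
```

Values that are already below the cap are returned unchanged, so a step like this would only affect the rare runs where tr_score blows up.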

gcorso / DiffDock

linalg.svd: The algorithm failed to converge because the input matrix contained non-finite values. #110