Open smg3d opened 23 hours ago
Thanks for the report. This happens if RDKit fails to generate a conformer for some random seeds and the CCD CIF defining the ligand input provides no fallback idealised coordinates. You can work around this by adding idealised coordinates.
When there are no conformer coordinates, we cannot generate frames for PAE, and without a frame we give up on generating a confidence. However, that is behavior we could change: we had single-atom ions in mind for that case (where there were no frames in training either). Full ligands should be fine, as the frames aren't actually used at inference time. But perhaps, given there are no reference coordinates, it's better to have NaNs here, so that users can see from the output that something is different in these cases (likely not as good a prediction).
Input is one protein + N copies of the same ligand. Depending on the value of N (40, 50, 60, 80, 100, ..., 200), I get between 1 and 6 rdkit warnings during "constructing SMILES reference structure". The warning message is:

Also, if I get one rdkit warning, I also get the following (the number of lines = number of atoms in the ligand).
The structure inference proceeds without warning / error, and the ligands with rdkit warnings have coordinates.
However, all metrics related to that ligand are null in summary_confidences.json:

The number of problematic ligands varies between runs with different ligands, and sometimes between different seeds within the same run, e.g.:
For 30+ runs with N >= 40: they all get at least one warning (with associated null metrics). For all runs with N <= 30: no rdkit warning.

The structure of the problematic ligand appears normal.
If it wasn't for the null metrics associated with that ligand, I would not worry. Maybe all is fine, and it might just be a problem with the metrics computation routine if there is somehow something "wrong" with that ligand at the start (i.e. Found identical coordinates: Assigning as colinear.).
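
For anyone who wants to check which entries came out as null across many runs, here is a minimal sketch in plain Python. It assumes nothing about the real layout of summary_confidences.json: it simply walks whatever JSON it is given and reports the path of every null value. The function name `find_null_paths` and the key names in the example are made up for illustration and are not the actual field names.

```python
import json


def find_null_paths(node, path=""):
    """Return the path of every null (None) leaf in a decoded JSON value."""
    if node is None:
        return [path or "<root>"]
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            found.extend(find_null_paths(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_null_paths(value, f"{path}[{index}]"))
    return found


# Hypothetical excerpt: the key names are placeholders, not the real schema.
example = json.loads(
    '{"per_chain_metric": [0.8, null],'
    ' "pairwise_metric": [[0.8, null], [null, null]],'
    ' "global_metric": 0.8}'
)
null_paths = find_null_paths(example)
print(null_paths)
```

To use it on a real run, replace the example with `json.load(open(path))` for each summary_confidences.json file; the list indices in the reported paths should then point at the affected chains.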