Open CLG68 opened 1 month ago
This may be related to https://github.com/rdkit/rdkit/issues/6365, but I'm currently using the latest RDKit, so it should already have been fixed.
I also get:
UFFTYPER: Unrecognized atom type: S_5+6
I screened 100,000 structures from a focused library (from a Panther/ShaEP VS) with Unimol Docking V2. I had a hard time rescoring the results because 650 poses either had "nan" as coordinates or were outside the binding pocket, so I wrote a script to clean the results before rescoring. Could this be caused by the problem I reported (UFFTYPER: Unrecognized atom type: S_5+6)? Do you know how to fix it?
Thanks, Christian
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
It looks like there is an issue with RDKit when loading the file. Could you provide a file that produces this error? We can test it further.
Thank you very much for helping with this. I have attached the target, the JSON file, the reference ligand used to generate the JSON file, and examples of structures that give me errors or problematic results. The source structures are extracted from my library. The generated poses are from Unimol Docking V2. The structures that trigger the valence problem do not produce a binding pose. I had to write a script to clean the docking results, because poses with no coordinates or poses outside the binding pocket were causing problems with scoring during training with Brutenib... ShaEP would just hang indefinitely.
The library is the top 1% by score from a Panther/ShaEP VS. My cleaning script flagged 670 poses out of around 100k, not counting the poses that were never generated because of the valence problem.
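The cleaning step described above (dropping poses whose coordinates are "nan" or that sit outside the binding pocket) could look roughly like the sketch below. It is not the script used in this thread. It assumes that the docking grid JSON stores the box as `center_x`/`center_y`/`center_z` and `size_x`/`size_y`/`size_z`; the function names and the `margin` tolerance are hypothetical. Coordinates are passed as plain (x, y, z) tuples, which with RDKit could be taken from `mol.GetConformer().GetPositions()`.

```python
import math


def load_box(grid):
    """Return (center, half_sizes) from a docking-grid dict.

    Assumption: the dict uses the keys center_x/y/z and size_x/y/z.
    """
    center = tuple(grid[f"center_{axis}"] for axis in "xyz")
    half = tuple(grid[f"size_{axis}"] / 2.0 for axis in "xyz")
    return center, half


def classify_pose(coords, center, half, margin=2.0):
    """Classify one pose as 'empty', 'nan', 'outside', or 'ok'.

    coords: list of (x, y, z) tuples for the atoms of the pose.
    margin: hypothetical tolerance (in angstroms) added around the box.
    """
    if not coords:
        return "empty"
    # math.isfinite rejects both NaN and infinite values.
    if any(not math.isfinite(v) for xyz in coords for v in xyz):
        return "nan"
    # Test the centroid, so one atom poking out of the box does not reject the pose.
    n = len(coords)
    centroid = tuple(sum(xyz[i] for xyz in coords) / n for i in range(3))
    inside = all(abs(centroid[i] - center[i]) <= half[i] + margin for i in range(3))
    return "ok" if inside else "outside"
```

Only poses classified as "ok" would be passed on to rescoring. Checking the centroid rather than every atom is a design choice; a stricter filter could require all atoms to be inside the box.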
For RDKit, I tried both the version suggested in your README file and the latest version. Updating to the latest version did not solve the problem.
Best, Christian
Unimol-Docking-V2_clg68.zip
Hi, is the zip archive OK, or would individual files be better? Thank you, Christian
Sorry for the delayed response.
Regarding the bug in RDKit, it seems that the bug mentioned in the original issue still exists. I am using an almost up-to-date version (2024.3.1, installed via pip), but when I run the example code from the issue:
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("S(F)(F)(F)(F)F")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)
conf = mol.GetConformer()
print(conf)
The output is:
<rdkit.Chem.rdchem.Conformer object at 0x7fc17b931b60>
[09:45:04] UFFTYPER: Unrecognized atom type: S_6+6 (0)
I also ran the example file you provided. The command I used is as follows:
python demo.py --mode single --conf-size 10 --cluster \
--input-protein Unimol-Docking-V2_clg68/MC4R_protein.pdb \
--input-ligand Unimol-Docking-V2_clg68/MC4R_ref-ligand.sdf \
--input-docking-grid Unimol-Docking-V2_clg68/docking_grid.json \
--output-ligand-name ligand_predict \
--output-ligand-dir predict_sdf \
--steric-clash-fix \
--model-dir unimol_docking_v2_240517.pt
There was no "Unrecognized atom type: S_6+6 (0)" error, and the script ran as expected. Part of the output message is:
[09:55:28] Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D.
predict_sdf/ligand_predict.sdf-Cn1nnc(CC2(C3CCCCC3)CCN(C(=O)C(Cc3ccc(Cl)cc3)NC(=O)C3Cc4ccccc4CN3)CC2)n1-RMSD:4.5583
Thank you very much for running some tests with my files. Many docking poses are missing or rejected from the screen, because of the "Unrecognized atom type" error, poses without coordinates, or molecules docked outside the binding pocket, so I'm really interested in resolving this problem. I'll try RDKit 2024.3.1 and investigate the "is tagged as 2D" message. Hopefully that will solve the "Unrecognized atom type: S_6+6 (0)" problem.
Best, Christian
Hi,
With some molecules I get (Unimol Docking V2):
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/CHEMBL-3740791-1.sdf-Cc1ccnc(N(CCC(=O)[O-])C(=O)c2ccc3c(c2)nc(CNc2ccc(C(N)=[NH2+])cc2F)n3C)c1-RMSD:173.775
[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/Enamine-Z3019139935-2.sdf-Cc1cc(N2CCC(O)(C[NH+]3CCOCC3)CC2)nc(N(C)c2ccccc2)[nH+]1-RMSD:171.117
3%|█▎ | 63/1959 [01:50<50:00, 1.58s/it]
[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:57] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/ChemDiv-V014-0652-1.sdf-CC(C)CCN(CC(=O)Nc1cc(C(C)(C)C)nn1-c1ccc(Cl)cc1)C(=O)C(C)(C)CCl-RMSD:173.7905
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
3%|█▎ | 64/1959 [01:54<1:02:15, 1.97s/it]
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
This happens even with the latest version of RDKit.
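Because the pose lines and the UFFTYPER warnings are interleaved in this output, a small parser can help triage a long run. The sketch below assumes that every pose line has the form `<path>.sdf-<SMILES>-RMSD:<value>`, as in the log above; the function name and the `rmsd_cutoff` threshold are hypothetical. Treating a very large RMSD (such as the values around 170 above) as a sign that the pose landed far from the pocket is an assumption about what that number means, not something confirmed in this thread.

```python
import re
from collections import Counter

# Pose lines look like "<path>.sdf-<SMILES>-RMSD:<value>" in the log above.
POSE_RE = re.compile(r"(?P<path>\S+?\.sdf)-(?P<smiles>\S+)-RMSD:(?P<rmsd>\d+(?:\.\d+)?)")
WARN_RE = re.compile(r"UFFTYPER: Unrecognized atom type: (?P<atype>\S+)")


def summarize_log(text, rmsd_cutoff=10.0):
    """Return (poses, suspicious, warning_counts) parsed from docking output.

    poses: list of (path, smiles, rmsd) tuples in order of appearance.
    suspicious: the subset of poses with rmsd above rmsd_cutoff.
    warning_counts: Counter mapping each unrecognized atom type to its count.
    """
    poses = [
        (m["path"], m["smiles"], float(m["rmsd"]))
        for m in POSE_RE.finditer(text)
    ]
    suspicious = [pose for pose in poses if pose[2] > rmsd_cutoff]
    warning_counts = Counter(m["atype"] for m in WARN_RE.finditer(text))
    return poses, suspicious, warning_counts
```

Since the warnings are not tied to a specific pose line in this output, the parser only counts them per atom type and does not try to attribute them to individual molecules.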