Open CLG68 opened 3 months ago
Maybe it is related to this: https://github.com/rdkit/rdkit/issues/6365 but I'm currently using the latest RDKit, so it should have been fixed.
I also get:
UFFTYPER: Unrecognized atom type: S_5+6
I screened 100000 structures from a focussed library from a Panther/ShaEP VS, on Unimol docking V2. I had a hard time rescoring the results, as 650 poses either had "nan" as coordinates or were outside the binding pocket. So I made a script to clean the results before rescoring. Maybe this is coming from the problem I reported (UFFTYPER: Unrecognized atom type: S_5+6)? Do you know how to correct this problem?
Thanks Christian
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
It looks like there is an issue with RDKit when loading the file. Could you provide a file that produces this error? We can test it further.
Thank you very much for helping with this. I attached the target, the json file, the ref ligand used for generating the json file, as well as examples of structures giving me errors or problematic results. The source-structures are extracted from my library. The generated-poses are from Unimol docking V2. The structures that give me a problem with valence do not generate a binding pose. I had to create a script to clean the docking results, as the poses with no coordinates or outside of the binding pocket were creating problems with scoring in the training with Brutenib... ShaEP was just thinking forever.
The library is from the top 1% scores from a Panther/ShaEP VS. My cleaning script flagged 670 poses out of around 100k (minus all the poses not generated because of the valence problem).
For RDKit, I tried the version suggested in your README file and also the latest version. Updating to the latest version did not solve the problem.
Best, Christian Unimol-Docking-V2_clg68.zip
Hi, Was it OK to send the files as a zip archive, or would individual files be better? Thank you, Christian
Sorry for the delayed response.
Regarding the bug in RDKit, it seems that the bug mentioned in the original issue still exists. I am using an almost up-to-date version (2024.3.1, installed via pip), but when I run the example code from the issue:
from rdkit import Chem
from rdkit.Chem import AllChem
mol = Chem.MolFromSmiles("S(F)(F)(F)(F)F")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)
conf = mol.GetConformer()
print(conf)
The output is:
<rdkit.Chem.rdchem.Conformer object at 0x7fc17b931b60>
[09:45:04] UFFTYPER: Unrecognized atom type: S_6+6 (0)
I also ran the example file you provided. The command I used is as follows:
python demo.py --mode single --conf-size 10 --cluster \
--input-protein Unimol-Docking-V2_clg68/MC4R_protein.pdb \
--input-ligand Unimol-Docking-V2_clg68/MC4R_ref-ligand.sdf \
--input-docking-grid Unimol-Docking-V2_clg68/docking_grid.json \
--output-ligand-name ligand_predict \
--output-ligand-dir predict_sdf \
--steric-clash-fix \
--model-dir unimol_docking_v2_240517.pt
There was no "Unrecognized atom type: S_6+6 (0)" error, and the script ran as expected. Part of the output message is:
[09:55:28] Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D.
predict_sdf/ligand_predict.sdf-Cn1nnc(CC2(C3CCCCC3)CCN(C(=O)C(Cc3ccc(Cl)cc3)NC(=O)C3Cc4ccccc4CN3)CC2)n1-RMSD:4.5583
Thank you very much for running some tests with my files. Many docking poses are missing or rejected from the screen because of the "Unrecognized atom type" error, poses without coordinates, and molecules docked outside the binding pocket, so I'm really interested in resolving this problem. I'll try RDKit 2024.3.1 and investigate the "is tagged as 2D" message. Hopefully it will solve the "Unrecognized atom type: S_6+6 (0)" problem.
Best, Christian
I encountered the same problem.
I still have to solve that one... I'll try the problematic files with different versions of RDKit and I'll let you know if one works better. If not, I could always try to sanitize the problematic files.
I created a bash script to identify and remove the problematic poses/files, post-screening. Just to give you an idea, for one of my screens:
Ligands_Focused-library: 61235 (input files)
...
missing: 0
nan: 35
no-coordinates: 17
outside_binding-site: 394
...
Poses available: 60789
Rejected files: 446
So if I compare the number of files generated during screening to the number of files screened, the number is the same (missing=0). However, my script ends up removing 446 files. To select the files, I extract the 10th line of the sdf, which should contain the details of one atom. If this line contains "nan" instead of coordinates, the file is removed from the Poses folder. The same happens if this atom is outside the binding pocket (plus a little buffer) as defined in the json file, or if the coordinates make no sense, i.e. the molecule is nowhere near the receptor (no-coordinates).
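For anyone who wants to try the same kind of post-screening filter without bash, here is a rough Python sketch of the idea described above. It is not Christian's script. It assumes V2000 SDF files, it checks every atom for non-finite coordinates and tests the centroid against the box instead of looking only at the 10th line, and it assumes the grid json uses the keys center_x/center_y/center_z and size_x/size_y/size_z. The function names and the 2.0 Å buffer are made up for illustration.

```python
import json
import math


def load_grid(path):
    """Load the docking grid json (key names are an assumption, see above)."""
    with open(path) as handle:
        return json.load(handle)


def read_first_mol_coords(sdf_text):
    """Return [(x, y, z), ...] for the first V2000 record in an SDF string."""
    lines = sdf_text.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: atom count in columns 1-3
    coords = []
    for line in lines[4:4 + n_atoms]:  # atom block starts on the 5th line
        # x, y, z are fixed-width fields of 10 characters each
        coords.append(tuple(float(line[i:i + 10]) for i in (0, 10, 20)))
    return coords


def classify_pose(coords, grid, buffer=2.0):
    """Return "nan", "outside_binding-site" or "ok" for one pose."""
    flat = [value for xyz in coords for value in xyz]
    if not coords or any(not math.isfinite(value) for value in flat):
        return "nan"
    centroid = [sum(xyz[k] for xyz in coords) / len(coords) for k in range(3)]
    for k, axis in enumerate("xyz"):
        half_width = grid[f"size_{axis}"] / 2.0 + buffer  # hypothetical buffer
        if abs(centroid[k] - grid[f"center_{axis}"]) > half_width:
            return "outside_binding-site"
    return "ok"
```

A pose file would then be kept only if classify_pose(read_first_mol_coords(text), load_grid("docking_grid.json")) returns "ok". Testing the centroid is a design choice: it is less sensitive to which atom happens to sit on a given line, but it will not catch a ligand that only partly leaves the box.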
Hi,
With some molecules I get (Unimol Docking V2):
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/CHEMBL-3740791-1.sdf-Cc1ccnc(N(CCC(=O)[O-])C(=O)c2ccc3c(c2)nc(CNc2ccc(C(N)=[NH2+])cc2F)n3C)c1-RMSD:173.775
[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/Enamine-Z3019139935-2.sdf-Cc1cc(N2CCC(O)(C[NH+]3CCOCC3)CC2)nc(N(C)c2ccccc2)[nH+]1-RMSD:171.117
3%|█▎ | 63/1959 [01:50<50:00, 1.58s/it]
[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:57] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/ChemDiv-V014-0652-1.sdf-CC(C)CCN(CC(=O)Nc1cc(C(C)(C)C)nn1-c1ccc(Cl)cc1)C(=O)C(C)(C)CCl-RMSD:173.7905
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
3%|█▎ | 64/1959 [01:54<1:02:15, 1.97s/it]
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
This happens even if I use the latest version of RDKit.