epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Bingo Exact Search for PostgreSQL matches unwanted molecules #742

Open AATDev21 opened 2 years ago

AATDev21 commented 2 years ago

Bingo Exact Search with no flags (equivalent to the ALL flag, the most restrictive one) matches molecules that are not in fact identical to the query.


Example I


Query molecules were taken from out.sdf and target molecules from mols.sdf.

Bingo Exact matches both mol_121 and mol_122 for query q_21. Indigo Exact matches only mol_121.

>>> from indigo import Indigo
>>> i = Indigo()
>>> q_21_smi = i.loadMolecule("C1(NC2C(=CC(=C(C)C=2)Cl)N=1)C1=NNC=C1NC(=O)C1CC1")
>>> mol_121_smi = i.loadMolecule("C1(NC2C(=CC(=C(C)C=2)Cl)N=1)C1=NNC=C1NC(=O)C1CC1")
>>> mol_122_smi = i.loadMolecule("C1(NC2C(=CC(=C(Cl)C=2)C)N=1)C1=NNC=C1NC(=O)C1CC1")
>>> i.exactMatch(q_21_smi, mol_121_smi)
<indigo.IndigoObject object at 0x104b17250>
>>> i.exactMatch(q_21_smi, mol_122_smi)
>>>

Expected behavior: Bingo Exact should reproduce Indigo Exact behavior and match only mol_121.


Example II


Bingo Exact matches both mol_123 and mol_124 for query q_23. Indigo Exact matches only mol_123.

>>> q_23_smi = i.loadMolecule("C1(NC2C(=CC(=CC=2)C(OC)=O)N=1)C1C2C=CC=CC=2NN=1")
>>> mol_123_smi = i.loadMolecule("C1(NC2C(=CC(=CC=2)C(OC)=O)N=1)C1C2C=CC=CC=2NN=1")
>>> mol_124_smi = i.loadMolecule("C1(NC2C(=CC=C(C=2)C(OC)=O)N=1)C1C2C=CC=CC=2NN=1")
>>> i.exactMatch(q_23_smi, mol_123_smi)
<indigo.IndigoObject object at 0x10449af80>
>>> i.exactMatch(q_23_smi, mol_124_smi)
>>>

Expected behavior: as above, Bingo Exact should reproduce Indigo Exact behavior and match only mol_123.

Note: std.json for Bingo test_exact should be checked for incorrect results as well.
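
The expected behavior in both examples amounts to requiring that, for every query, Bingo Exact returns no target that Indigo Exact rejects. A minimal sketch of such a check follows. The helper name `find_extra_hits` and the dictionary layout are hypothetical, and the hit lists are transcribed by hand from the two examples above rather than read from Bingo or from std.json, whose format is not shown here.

```python
def find_extra_hits(bingo_hits, indigo_hits):
    """Return, per query, the targets matched by Bingo but not by Indigo.

    Both arguments map a query name to an iterable of matched target names.
    (Hypothetical helper and data layout, for illustration only.)
    """
    extras = {}
    for query, targets in bingo_hits.items():
        extra = set(targets) - set(indigo_hits.get(query, ()))
        if extra:
            extras[query] = sorted(extra)
    return extras


# Hit lists copied by hand from Example I and Example II of this report.
bingo_hits = {
    "q_21": ["mol_121", "mol_122"],
    "q_23": ["mol_123", "mol_124"],
}
indigo_hits = {
    "q_21": ["mol_121"],
    "q_23": ["mol_123"],
}

discrepancies = find_extra_hits(bingo_hits, indigo_hits)
print(discrepancies)
```

An empty result would mean Bingo Exact agrees with Indigo Exact on the queries checked. With the data above, the result lists mol_122 and mol_124 as the unexpected matches.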

boglet commented 1 year ago

These are tautomeric representations of the same molecule. Personally, I would expect the system to treat mol_123 and mol_124 as the same, and likewise mol_121 and mol_122. I know that Indigo offers finer control over aromaticity perception, which could explain why Indigo does not match both molecules.