Open twall opened 3 years ago
There may also be additional discrepancies depending on whether the starting molecule was created via SMILES or InChI:
source SMILES CC[C@@]1(C[C@H](C1)[C@](O)(c1ccc(Cl)[n]c1Cl)c1cc([n]c(N[C@@H](C)C(F)(F)F)[n]1)C(F)(F)F)NS
source InChI InChI=1S/C20H21Cl2F6N5OS/c1-3-17(33-35)7-10(8-17)18(34,11-4-5-14(21)32-15(11)22)12-6-13(20(26,27)28)31-16(30-12)29-9(2)19(23,24)25/h4-6,9-10,33-35H,3,7-8H2,1-2H3,(H,29,30,31)/t9-,10-,17+,18-/m0/s1
target SMILES C[C@@H](Nc1nc(N[C@H](C)C(F)(F)F)nc(-c2cccc(Cl)n2)n1)C(F)(F)F
target InChI InChI=1S/C14H13ClF6N6/c1-6(13(16,17)18)22-11-25-10(8-4-3-5-9(15)24-8)26-12(27-11)23-7(2)14(19,20)21/h3-7H,1-2H3,(H2,22,23,25,26,27)/t6-,7-/m1/s1
score input encoding/bingodb encoding
0.30 inchi/inchi
0.27 smiles/smiles
0.14 smiles/inchi
0.12 inchi/smiles
And when the molecule C[C@@H](Nc1nc(N[C@H](C)C(F)(F)F)nc(-c2cccc(Cl)n2)n1)C(F)(F)F
is compared against itself:
1.00 smiles/smiles
1.00 inchi/inchi
0.33 inchi/smiles
0.33 smiles/inchi
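To make the spread visible for any molecule pair, the four encoding combinations can be tabulated with a small harness. This is only a minimal sketch: `score_matrix`, `spread`, and the stub `reported_score` are names made up for illustration, and the stub just replays the self-comparison numbers above rather than calling Bingo. In real use the stub would be replaced by a function that loads the query and the indexed molecule in the given encodings and returns the similarity.

```python
from itertools import product
from typing import Callable, Dict, Tuple

ENCODINGS = ("smiles", "inchi")

# Hypothetical helper: score(query_encoding, indexed_encoding) -> similarity
Scorer = Callable[[str, str], float]


def score_matrix(score: Scorer) -> Dict[Tuple[str, str], float]:
    """Evaluate the scorer for all four input/indexed encoding pairs."""
    return {(q, d): score(q, d) for q, d in product(ENCODINGS, ENCODINGS)}


def spread(matrix: Dict[Tuple[str, str], float]) -> float:
    """Largest minus smallest score; 0.0 means the encodings agree."""
    values = list(matrix.values())
    return max(values) - min(values)


# Stub scorer that replays the self-comparison scores reported above.
_REPORTED = {
    ("smiles", "smiles"): 1.00,
    ("inchi", "inchi"): 1.00,
    ("inchi", "smiles"): 0.33,
    ("smiles", "inchi"): 0.33,
}


def reported_score(query_encoding: str, indexed_encoding: str) -> float:
    return _REPORTED[(query_encoding, indexed_encoding)]


matrix = score_matrix(reported_score)
for (query_encoding, indexed_encoding), value in sorted(matrix.items()):
    print(f"{value:.2f} {query_encoding}/{indexed_encoding}")
print(f"spread: {spread(matrix):.2f}")
```

A non-zero spread for a molecule compared against itself is the inconsistency described here; a consistent implementation would give a spread of zero.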
I have an input SMILES string for which Bingo produces different similarity results depending on whether the indexed molecule is provided as InChI or as SMILES. I need the similarity results to be consistent regardless of the initial encoding of the indexed molecule.
Input:
Indexed molecule (as SMILES)
Indexed molecule (as InChI)
Note that Indigo reports the two molecules as equal if the objects are created from SMILES b and c.
The similarity for the SMILES-based molecule is about 0.56. The similarity for the InChI-based molecule is about 0.38.
I have several other molecules for which the result differs significantly depending on whether the molecule is created from InChI or from SMILES.
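For anyone trying to reproduce this outside of a Bingo database, a direct comparison of the two loaders might look like the sketch below. It assumes the `epam.indigo` Python package with its InChI plugin; `compare_encodings` is a hypothetical helper name, and aromatizing both molecules before comparing is an assumption made for the comparison, not a confirmed fix. The imports live inside the function so the snippet can be pasted into a script even where Indigo is not installed.

```python
def compare_encodings(smiles: str, inchi: str) -> dict:
    """Load one structure from SMILES and from InChI, then compare the two.

    Hypothetical helper for reproduction purposes; requires `epam.indigo`.
    """
    from indigo import Indigo
    from indigo.inchi import IndigoInchi

    indigo = Indigo()
    indigo_inchi = IndigoInchi(indigo)

    from_smiles = indigo.loadMolecule(smiles)
    from_inchi = indigo_inchi.loadMolecule(inchi)

    # Assumption: put both objects into the same aromaticity model first,
    # so that any remaining difference is not due to Kekule vs aromatic form.
    from_smiles.aromatize()
    from_inchi.aromatize()

    return {
        # exactMatch returns a mapping object on success and None otherwise
        "exact_match": indigo.exactMatch(from_smiles, from_inchi) is not None,
        "tanimoto": indigo.similarity(from_smiles, from_inchi, "tanimoto"),
        "canonical_smiles_equal": (
            from_smiles.canonicalSmiles() == from_inchi.canonicalSmiles()
        ),
    }
```

Calling `compare_encodings` with the SMILES and InChI of the same molecule shows whether the exact match and the similarity score disagree for that structure, independent of how the Bingo index was built.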
Indigo options: