epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

pka - Lee Crippen - issue smarts ~[i]~[i]~A #869

Open achher opened 1 year ago

achher commented 1 year ago

The current Lee Crippen pKa calculator implementation in Indigo 1.8.x doesn't return the pKa values given in the original publication. In some cases the published SMARTS queries don't give the same results in Indigo as in the article. For example, for 4-(benzyloxy)benzoic acid, the original example from the article, the published pKa value is 4.47, but Indigo calculates 3.19. Following the decision tree that leads to this result, one finds that Indigo handles SMARTS queries containing patterns like ~[i]~[i]~A incorrectly. For example, the query [O][i]~[i]~[i]~[i]~[i]~[i]~[i]~A matches 4-(benzyloxy)benzoic acid, but it should not. Likewise, the query [OH][i](=O)[i]~[i]~[i]~[i]~[i]-A matches 4-(benzyloxy)benzoic acid, but it should not. In both cases Indigo matches the aromatic system and either ignores the A (any aliphatic atom) or does not match the A at the right position:

[Screenshot: 2022-10-10_15-14-32]
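A minimal sketch of how the two matches above might be checked from Python, assuming the Indigo Python bindings are installed. The SMILES string for 4-(benzyloxy)benzoic acid is written out for this sketch and does not come from the report, and the helper names `indigo_has_match` and `unexpected_matches` are hypothetical. The two queries are the ones quoted above.

```python
from typing import Callable, Iterable, List

# Assumption: SMILES written out for this sketch, not taken from the report.
TARGET_SMILES = "OC(=O)c1ccc(OCc2ccccc2)cc1"  # 4-(benzyloxy)benzoic acid

# The two queries quoted in the report; neither is expected to match.
QUERIES_EXPECTED_NOT_TO_MATCH = [
    "[O][i]~[i]~[i]~[i]~[i]~[i]~[i]~A",
    "[OH][i](=O)[i]~[i]~[i]~[i]~[i]-A",
]


def indigo_has_match(smiles: str, smarts: str) -> bool:
    """Return True if Indigo finds the SMARTS query in the molecule."""
    # Imported lazily so the rest of the sketch runs without Indigo installed.
    from indigo import Indigo

    indigo = Indigo()
    target = indigo.loadMolecule(smiles)
    query = indigo.loadSmarts(smarts)
    # match() returns None when the query is not found in the target.
    return indigo.substructureMatcher(target).match(query) is not None


def unexpected_matches(
    has_match: Callable[[str, str], bool],
    smiles: str,
    queries: Iterable[str],
) -> List[str]:
    """Collect the queries that match although they are expected not to."""
    return [q for q in queries if has_match(smiles, q)]
```

Calling `unexpected_matches(indigo_has_match, TARGET_SMILES, QUERIES_EXPECTED_NOT_TO_MATCH)` should return an empty list once the pattern is handled as described; according to the report, Indigo 1.8.x matches both queries.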

Because many of the pKa calculator's queries depend on SMARTS patterns containing ~[i]~[i]~A, most of the predicted values are incorrect.

Can you add correct support for SMARTS queries containing the pattern ~[i]~[i]~A? I think this would increase the prediction quality of the pKa calculator a lot.

AATDev21 commented 1 year ago

Hi @achher, thank you for reporting this issue. We'll fix it in a future Indigo version.