epam / Indigo

Universal cheminformatics toolkit, utilities and database search tools
http://lifescience.opensource.epam.com
Apache License 2.0

Bingo tanimoto similarity can be over 1? #112

Open dan2097 opened 6 years ago

dan2097 commented 6 years ago

I'm using a version built from source on the 16th November 2017 against PostgreSQL 9.6.

select count(*) from bingo.chemdata where smiles @ ('0.95', '1', 'S(C1C=CC(=C(C2=NC(C3=C(C(CCC)=NN3C)N2)=O)C=1)OCC)(N1CCN(C)CC1)(=O)=O', 'Tanimoto')::bingo.sim
Result: 9

select count(*) from bingo.chemdata where smiles @ ('0.95', '1.01', 'S(C1C=CC(=C(C2=NC(C3=C(C(CCC)=NN3C)N2)=O)C=1)OCC)(N1CCN(C)CC1)(=O)=O', 'Tanimoto')::bingo.sim
Result: 14

select count(*) from bingo.chemdata where smiles @ ('0.95', '1.02', 'S(C1C=CC(=C(C2=NC(C3=C(C(CCC)=NN3C)N2)=O)C=1)OCC)(N1CCN(C)CC1)(=O)=O', 'Tanimoto')::bingo.sim
Result: 17

[The identical compound had a similarity somewhere between 1.01 and 1.02.]
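For context on why these counts look wrong: the Tanimoto coefficient of two binary fingerprints is |A ∩ B| / |A ∪ B|, so by definition it cannot exceed 1, and an identical compound should score exactly 1. The sketch below illustrates that bound in Python. The bit sets are hypothetical toy fingerprints and the `tanimoto` helper is illustrative only; it is not Bingo's fingerprinting or its internal implementation.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two sets of 'on' fingerprint bits."""
    union = len(a | b)
    if union == 0:
        # Assumed convention for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / union


# Hypothetical toy fingerprints, not real Bingo/Indigo output.
query = {1, 4, 7, 9}
identical = {1, 4, 7, 9}
similar = {1, 4, 7, 12}

print(tanimoto(query, identical))  # 1.0 (3 shared bits would give less; here all 4 match)
print(tanimoto(query, similar))    # 0.6 (3 shared bits out of 5 in the union)
```

Because the intersection can never be larger than the union, any hit that only appears when the upper bound is raised above 1 points to a problem in how the score is computed or compared, not to a genuinely higher similarity.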

This behaviour did not exist in a version from ~2014.

Another minor bug: if the upper bound on similarity is given as null, it is interpreted as 0 rather than 1 (or perhaps a number greater than 1!), and hence the query is rejected.

MysterionRise commented 2 years ago

I know it's a bit outdated, but @dan2097, could you double-check it on the latest release?

dan2097 commented 2 years ago

@MysterionRise Unfortunately not, as a few months after this we switched to using Bingo NoSQL. Since I haven't provided sufficient information to reproduce the issue from scratch, I think this issue should be closed.