cdk / cdk-build-util

Collection of auxiliary developer utilities used by the Chemistry Development Kit.
GNU Lesser General Public License v2.1

Fingerprint definition #4

Open hkmztrk opened 6 years ago

hkmztrk commented 6 years ago

Hello,

I'd like to ask whether there is a paper reference for the Extended Fingerprint that explains how the fingerprints are defined. (https://cdk.github.io/cdk/2.0/docs/api/org/openscience/cdk/fingerprint/ExtendedFingerprinter.html)

Thanks!

johnmay commented 6 years ago

Not exactly, but it's your bog-standard path-based fingerprint (aka Daylight fingerprint): http://www.daylight.com/dayhtml/doc/theory/theory.finger.html. It does set some extra bits for ring sizes, but I don't know if these were tested when it was originally written, or how much they help. For example, you can actually capture rings while you traverse the paths: if you can reach back to the start point, you know it's a ring and you can hash it differently.

BTW You should only be using it for substructure searching :-).
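To make the path-based idea concrete, here is a minimal sketch in plain Java. It is not the CDK implementation: the class name `PathFingerprintSketch`, the 1024-bit length, the 7-atom path limit, the string path encoding, and the `"ring:N"` key are all assumptions chosen for illustration. It enumerates linear paths from every atom, hashes each path into a bit, and sets an extra bit when a walk can step back to its starting atom.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy path-based fingerprint over a labelled graph (illustrative only). */
final class PathFingerprintSketch {

    static final int SIZE = 1024;   // fingerprint length in bits (assumed)
    static final int MAX_ATOMS = 7; // longest path, counted in atoms (assumed)

    private final String[] labels;  // atom labels, e.g. "C", "O"
    private final List<List<Integer>> adj = new ArrayList<>();

    PathFingerprintSketch(String[] labels, int[][] bonds) {
        this.labels = labels;
        for (int i = 0; i < labels.length; i++) adj.add(new ArrayList<>());
        for (int[] b : bonds) {     // undirected bonds between atom indices
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
    }

    /** Walks paths from every atom, so each path is hashed in both directions. */
    BitSet fingerprint() {
        BitSet bits = new BitSet(SIZE);
        boolean[] visited = new boolean[labels.length];
        for (int start = 0; start < labels.length; start++) {
            walk(start, start, 1, labels[start], visited, bits);
        }
        return bits;
    }

    private void walk(int start, int atom, int depth, String path,
                      boolean[] visited, BitSet bits) {
        visited[atom] = true;
        setBit(bits, path);         // one bit per enumerated path
        for (int next : adj.get(atom)) {
            if (next == start && depth >= 3) {
                // We can step back to the start atom: a ring of `depth` atoms.
                setBit(bits, "ring:" + depth);
            } else if (!visited[next] && depth < MAX_ATOMS) {
                walk(start, next, depth + 1, path + "-" + labels[next], visited, bits);
            }
        }
        visited[atom] = false;      // backtrack so other paths may reuse this atom
    }

    static void setBit(BitSet bits, String key) {
        bits.set(Math.floorMod(key.hashCode(), SIZE));
    }

    /** Substructure screen: every bit of the query must be set in the target. */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet ethanol = new PathFingerprintSketch(
                new String[] {"C", "C", "O"}, new int[][] {{0, 1}, {1, 2}}).fingerprint();
        BitSet ethane = new PathFingerprintSketch(
                new String[] {"C", "C"}, new int[][] {{0, 1}}).fingerprint();
        System.out.println(mayContain(ethanol, ethane));
    }
}
```

Because every path in a substructure also occurs in the larger molecule, the query's bits are always a subset of the target's bits. That subset test is what makes this kind of fingerprint a screen for substructure searching; a passing screen still has to be confirmed by a real substructure match, since different paths can hash to the same bit.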

johnmay commented 6 years ago

Will leave it open in case there is actually something extra needed...

hkmztrk commented 6 years ago

Thank you.

> BTW You should only be using it for substructure searching :-).

I was using it for ligand representation and to compute similarities.

johnmay commented 6 years ago

Use CircularFingerprint (ECFP4); it performs much better for similarity. It is also known as the Morgan fingerprint in RDKit. This is an old poster, but it shows performance on the well-known Briem-Lessel benchmark: https://chemaxon.com/app/uploads/2011/05/NextMovePoster3.pdf
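For the similarity step itself, a common choice is the Tanimoto coefficient over the two bit sets. The thread does not say which measure was used, so Tanimoto is an assumption here, as are the class name `TanimotoSketch` and the hand-written example bits. The sketch uses only `java.util.BitSet` and works the same way whichever fingerprint produced the bits.

```java
import java.util.BitSet;

/** Tanimoto similarity between two bit fingerprints (illustrative only). */
final class TanimotoSketch {

    /** Shared on-bits divided by all on-bits: |A and B| / |A or B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Two empty fingerprints: return 0.0 by convention (a choice made here).
        return union == 0 ? 0.0 : (double) both.cardinality() / union;
    }

    /** Helper to build a BitSet from a list of on-bit positions. */
    static BitSet bits(int... on) {
        BitSet set = new BitSet();
        for (int i : on) set.set(i);
        return set;
    }

    public static void main(String[] args) {
        BitSet a = bits(1, 5, 9, 12);
        BitSet b = bits(1, 5, 7, 12, 20);
        // 3 shared bits {1, 5, 12} out of 6 distinct bits in total.
        System.out.println(tanimoto(a, b)); // prints 0.5
    }
}
```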