eloyfelix / RDKitMinimalLib.jl

RDKitMinimalLib wrapper for the Julia programming language
MIT License

MACCS fingerprint #23

Closed: SimonEnsemble closed this issue 2 years ago

SimonEnsemble commented 2 years ago

thanks for this package!

is it possible to implement MACCS fingerprints using this minimal RDKit binary? if not, is a PR welcome to compute MACCS fingerprints, maccs_fp(mol::Mol), using:

  1. this list of SMARTS patterns
  2. get_qmol, get_mol, and get_substruct_match/get_substruct_matches?
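The approach proposed above can be sketched roughly as follows: one bit per SMARTS pattern, with each bit set by substructure matching. This is a language-agnostic illustration written in Python. It is not RDKit's or this package's implementation. `key_fingerprint`, `count_matches`, and the toy key set are hypothetical. The per-key match-count threshold is included on the assumption that some key definitions depend on how many times a pattern matches, not just whether it matches. In the Julia package the counting step would presumably be built from get_qmol, get_mol, and get_substruct_matches, but that wiring is not shown here.

```python
from typing import Callable, Sequence, Set, Tuple

# A key definition: (SMARTS pattern, threshold). The key is "on" when the
# pattern matches MORE than `threshold` times, so threshold 0 means
# "matches at least once".
KeyDef = Tuple[str, int]


def key_fingerprint(
    mol: object,
    keys: Sequence[KeyDef],
    count_matches: Callable[[object, str], int],
) -> Set[int]:
    """Return the on-bit indices of a key-based (MACCS-style) fingerprint.

    `count_matches(mol, smarts)` is a hypothetical stand-in for a real
    substructure search that counts how often the SMARTS query matches `mol`.
    """
    on_bits: Set[int] = set()
    for index, (smarts, threshold) in enumerate(keys):
        if count_matches(mol, smarts) > threshold:
            on_bits.add(index)
    return on_bits


if __name__ == "__main__":
    # Toy matcher for demonstration only: counts substring occurrences in a
    # SMILES string. This is NOT real SMARTS matching.
    def toy_count(mol: object, pattern: str) -> int:
        return str(mol).count(pattern)

    toy_keys = [("O", 0), ("C", 1), ("N", 0)]  # hypothetical 3-key set
    print(key_fingerprint("CCO", toy_keys, toy_count))  # {0, 1}
```

Swapping the toy matcher for a real substructure counter is the only change needed to turn this into a working key-based fingerprint, provided the real key list and thresholds are used.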

eloyfelix commented 2 years ago

thank you for using it and reporting!

MACCS fingerprints are not implemented in the minimal lib so I can't wrap the functionality.

https://github.com/rdkit/rdkit/blob/master/Code/MinimalLib/cffiwrapper.h
https://github.com/rdkit/rdkit/blob/master/Code/MinimalLib/cffiwrapper.cpp

However, it should be possible to add it to the MinimalLib, and that is preferable to reimplementing it here. I'll take a look when I'm back from holidays.

Just out of curiosity, how are you using MACCS fingerprints? They were originally designed for quick fingerprint-based substructure screening. RDKit implements another fingerprint type, the pattern fingerprint, that can be used as a MACCS replacement, and it is already wrapped in the Julia package:

https://www.rdkit.org/docs/RDKit_Book.html#pattern-fingerprints
https://eloyfelix.github.io/RDKitMinimalLib.jl/dev/calculators/#RDKitMinimalLib.get_pattern_fp
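Fingerprint-based substructure screening can be sketched as a two-stage filter. The sketch assumes the fingerprint is built so that every bit set for a substructure is also set for any molecule containing it. That is what screening fingerprints such as the pattern fingerprint aim for. Under that assumption, a failed subset test rules a molecule out. A passed test only makes it a candidate, so an exact substructure match is still needed. This is a minimal Python sketch. `screen_then_match` and `exact_match` are hypothetical names, not part of RDKit or this package.

```python
from typing import Callable, Dict, List, Set


def screen_then_match(
    query_bits: Set[int],
    library: Dict[str, Set[int]],
    exact_match: Callable[[str], bool],
) -> List[str]:
    """Return the ids of library molecules that really contain the query.

    Stage 1 (cheap screen): keep a molecule only if every on-bit of the query
    fingerprint is also on in the molecule's fingerprint (a subset test).
    Stage 2 (expensive check): run the exact substructure match, here the
    hypothetical `exact_match(mol_id)`, on the survivors only.
    """
    candidates = [mol_id for mol_id, bits in library.items() if query_bits <= bits]
    return [mol_id for mol_id in candidates if exact_match(mol_id)]


if __name__ == "__main__":
    library = {"a": {1, 2, 3}, "b": {1}, "c": {1, 2}}
    checked = []

    def fake_exact_match(mol_id: str) -> bool:
        checked.append(mol_id)  # record which molecules reached stage 2
        return mol_id != "c"    # pretend "c" passes the screen but fails the match

    print(screen_then_match({1, 2}, library, fake_exact_match))  # ['a']
    print(checked)                                               # ['a', 'c']
```

The saving comes from stage 2 never running on molecules like "b" that fail the cheap subset test.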

SimonEnsemble commented 2 years ago

makes sense not to re-implement MACCS fingerprints here if they can be called more directly from RDKit. 😄

I wonder if the folks developing the MinimalLib are responsive to requests. should I ask there, or are you saying you know how to modify the MinimalLib and might be willing to make a PR there for MACCS?

we used MACCS FPs to represent molecules as feature vectors for machine learning here. I will use your package in my data science class to do some molecular machine learning this quarter!
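Using fingerprints as ML feature vectors usually means stacking them into a 0/1 matrix with one row per molecule and one column per bit. The sketch below is a minimal Python version. It assumes the fingerprints are available either as strings of '0'/'1' characters or as sets of on-bit indices; check the package docs for the format a given function actually returns. `on_bits_from_string` and `to_feature_matrix` are hypothetical helpers.

```python
from typing import List, Sequence, Set


def on_bits_from_string(bitstring: str) -> Set[int]:
    """Convert a '0'/'1' string (one character per bit) into a set of on-bit indices."""
    return {i for i, ch in enumerate(bitstring) if ch == "1"}


def to_feature_matrix(fingerprints: Sequence[Set[int]], n_bits: int) -> List[List[int]]:
    """Build a dense 0/1 feature matrix: one row per molecule, one column per bit."""
    matrix: List[List[int]] = []
    for on_bits in fingerprints:
        row = [0] * n_bits
        for bit in on_bits:
            if not 0 <= bit < n_bits:
                raise ValueError(f"bit {bit} is outside [0, {n_bits})")
            row[bit] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    fps = [on_bits_from_string("101"), on_bits_from_string("010")]
    print(to_feature_matrix(fps, n_bits=3))  # [[1, 0, 1], [0, 1, 0]]
```

The resulting rows can be passed to any ML library that accepts dense numeric arrays.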

interesting, it's not too clear to me what these "Pattern Fingerprints" are, or whether they are more information-rich than MACCS fingerprints for machine learning.

thanks!

eloyfelix commented 2 years ago

The RDKit community is very nice, so it is indeed worth mentioning it there. I may also have some time to give it a go in a week, and I should be able to send a PR.

The most common FPs I see being used in ML are the ECFP-like ones (such as the Morgan FP, implemented in this package). People are also using graph representations a lot now.

This paper from the RDKit creator benchmarks fingerprints used in similarity searches. That is not the same as ML, but I think it is still informative about the amount of information that different FP types carry:

(Figure 4 of the paper: https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-5-26/figures/4)

there are probably papers benchmarking FP types for ML, but I don't know of any off the top of my head
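A fingerprint similarity search ranks molecules by a similarity coefficient between each molecule's fingerprint and the query's. For binary fingerprints, the Tanimoto coefficient, |A ∩ B| / |A ∪ B|, is the most common choice. This is a minimal Python sketch with hypothetical helper names. It treats each fingerprint as a set of on-bit indices and is not necessarily the exact protocol used in the benchmark paper.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as on-bit sets:
    |A & B| / |A | B|. Returning 0.0 for two empty fingerprints is a
    convention chosen for this sketch."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def rank_by_similarity(query: Set[int], library: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Return (id, similarity) pairs sorted from most to least similar to `query`."""
    scores = [(mol_id, tanimoto(query, bits)) for mol_id, bits in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    library = {"x": {1, 2, 3}, "y": {2, 3, 4}, "z": {7}}
    print(rank_by_similarity({1, 2, 3}, library))
    # [('x', 1.0), ('y', 0.5), ('z', 0.0)]
```

How well such a ranking recovers actives depends on how much relevant information the fingerprint type encodes, which is what benchmarks of this kind compare.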

eloyfelix commented 2 years ago

It took a while, but here is my attempt to bring MACCS to the MinimalLib: https://github.com/rdkit/rdkit/pull/5707 Hope it can be merged :)

eloyfelix commented 2 years ago

now merged, but it won't be included until RDKit's 2023.03.01 release. Yesterday I released a new version that implements some new FP types, in case it helps: get_atom_pair_fp and get_topological_torsion_fp.

SimonEnsemble commented 2 years ago

awesome, thank you! CC @eahenle