Closed SimonEnsemble closed 2 years ago
thank you for using it and reporting!
MACCS fingerprints are not implemented in the minimal lib so I can't wrap the functionality.
https://github.com/rdkit/rdkit/blob/master/Code/MinimalLib/cffiwrapper.h https://github.com/rdkit/rdkit/blob/master/Code/MinimalLib/cffiwrapper.cpp
However, it seems it should be possible to add it there, and that is preferable to reimplementing it here. I'll take a look at it when I'm back from holidays.
Just out of curiosity, how are you using MACCS fingerprints? They were originally designed for quick fingerprint-based substructure screening. RDKit implements another fingerprint type, the pattern fingerprint, that can be used as a MACCS replacement, and it is wrapped in the Julia package:
https://www.rdkit.org/docs/RDKit_Book.html#pattern-fingerprints https://eloyfelix.github.io/RDKitMinimalLib.jl/dev/calculators/#RDKitMinimalLib.get_pattern_fp
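To make the screening idea concrete, here is a rough Python sketch of how a fingerprint pre-filter works in general. It runs on made-up bit strings and calls nothing from RDKit; the function name `may_contain` and the 8-bit fingerprints are assumptions for illustration only. The point is that if a bit is set in the query but not in the molecule, the molecule cannot contain the query, so the expensive substructure match can be skipped.

```python
from typing import Dict, List


def may_contain(mol_fp: str, query_fp: str) -> bool:
    """Return False only when the molecule certainly lacks the query.

    Both arguments are strings of '0'/'1' characters of equal length.
    A True result still needs a real substructure match to confirm.
    """
    if len(mol_fp) != len(query_fp):
        raise ValueError("fingerprints must have the same length")
    # Every bit set in the query must also be set in the molecule.
    return all(m == "1" for m, q in zip(mol_fp, query_fp) if q == "1")


# Made-up 8-bit fingerprints, not produced by any real fingerprinter.
library: Dict[str, str] = {
    "mol_a": "10110010",
    "mol_b": "00110000",
    "mol_c": "11111111",
}
query_fp = "10010000"

# Only these candidates would go on to the full substructure match.
candidates: List[str] = [
    name for name, fp in library.items() if may_contain(fp, query_fp)
]
print(candidates)
```

The filter can produce false positives (a candidate that fails the full match) but, for a fingerprint built for screening, should not drop a true hit.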
makes sense to not re-implement MACCS fingerprint here if they can be more directly called from RDKit. 😄
wonder if the folks developing the MinimalLib are responsive to requests. should I try there, or are you saying you know how to modify MinimalLib and are willing to possibly make a PR on that for MACCS?
we used MACCS FP's to represent molecules as feature vectors for machine learning here. I will use your package for my data science class to do some molecular machine learning this quarter!
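For anyone following along, a minimal sketch of the "fingerprint as feature vector" step, in plain Python. It assumes the fingerprint arrives as a string of '0'/'1' characters (an assumption about the output format, not something confirmed in this thread); the helper name `fp_to_features` and the 4-bit examples are made up.

```python
from typing import List


def fp_to_features(bits: str) -> List[int]:
    """Convert a '0'/'1' bit string into a list of 0/1 integers."""
    if set(bits) - {"0", "1"}:
        raise ValueError("fingerprint must contain only '0' and '1'")
    return [int(b) for b in bits]


# Made-up 4-bit fingerprints standing in for real ones.
fps = ["1010", "0110", "1111"]

# One row per molecule, one column per fingerprint bit.
X = [fp_to_features(fp) for fp in fps]
print(X)
```

The resulting rows can be handed to whatever ML library is in use as a binary feature matrix.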
interesting, not too clear to me what these "Pattern Fingerprints" are/ if they are more information-rich than MACCS fingerprints for machine learning.
thanks!
The RDKit community is very nice, so it is indeed worth mentioning it there. I may also have some time to give it a go in a week. I should be able to send a PR.
The most common FPs I see being used in ML are the ECFP-like ones (like Morgan FP, implemented in this package). Now people are also using graph representations a lot.
This paper from the RDKit creator benchmarks fingerprints used in similarity searches, which is not the same as ML, but I think it is still informative to see the amount of information that different FP types carry:
image: https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-5-26/figures/4
there are possibly papers benchmarking FP types for ML, but I don't know any off the top of my head
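For context on what a fingerprint similarity search computes, here is a small Python sketch of the Tanimoto coefficient (shared on-bits divided by on-bits in either fingerprint) on plain bit strings. The function name, the made-up fingerprints, and the choice to return 0.0 when both fingerprints are empty are assumptions of this sketch, not taken from the paper or from RDKit.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto similarity between two equal-length '0'/'1' strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    # Convention chosen for this sketch: two all-zero fingerprints score 0.0.
    return both / either if either else 0.0


# Made-up fingerprints; rank a tiny library against a query.
query = "1100"
library = {"mol_a": "1100", "mol_b": "1010", "mol_c": "0011"}
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
print(ranked)
```

A fingerprint that carries more structural information should, on average, rank truly similar molecules higher in a list like `ranked`, which is roughly what such benchmarks measure.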
Took a while but this is my attempt to bring MACCS to the MinimalLib: https://github.com/rdkit/rdkit/pull/5707 Hope it can be merged :)
now merged but it won't be included until RDKit's 2023.03.01 release
Yesterday I released a new version that implements some new FP types in case it helps!
get_atom_pair_fp
get_topological_torsion_fp
awesome, thank you! CC @eahenle
thanks for this package!
is it possible to implement MACCS fingerprints using this minimal RDKit binary? if not, is a PR welcome to compute MACCS fingerprints, maccs_fp(mol::Mol), using get_qmol, get_mol, and get_substruct_match/get_substruct_matches?
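A rough Python sketch of the shape such a key-based fingerprint could take: one bit per pattern, set when the number of matches exceeds that pattern's threshold. Everything here is hypothetical. `keyed_fp`, `toy_count` and `toy_keys` are made-up names, the three keys are not the real MACCS definitions, and the matcher is a plain substring count standing in for a real substructure search.

```python
from typing import Callable, List, Tuple

# (pattern, threshold): the bit is set when match count > threshold.
Key = Tuple[str, int]


def keyed_fp(mol: str, keys: List[Key],
             count_matches: Callable[[str, str], int]) -> str:
    """Build a '0'/'1' string with one bit per key."""
    return "".join(
        "1" if count_matches(mol, pattern) > threshold else "0"
        for pattern, threshold in keys
    )


def toy_count(mol: str, pattern: str) -> int:
    # NOT chemistry: a substring count used only so the sketch runs.
    # A real version would count substructure matches instead.
    return mol.count(pattern)


# Hypothetical keys: "has an O", "has an N", "has more than one C".
toy_keys: List[Key] = [("O", 0), ("N", 0), ("C", 1)]

fp = keyed_fp("CCO", toy_keys, toy_count)
print(fp)
```

Passing the matcher in as a function keeps the bit-building logic separate from the matching backend, so the toy counter could be swapped for a real substructure search without touching `keyed_fp`.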