For performance-critical code we generally avoid wrappers, including the native one, and go directly to the C++ original or reimplement everything in Assembly 🤔
Hi, thanks for an excellent project!
Computing molecular fingerprints with RDKit at scale is challenging, so we made scikit-fingerprints, a scikit-learn compatible library: https://github.com/scikit-fingerprints/scikit-fingerprints. It basically wraps RDKit with Joblib, enabling easy parallel computation of fingerprints. It also works in a distributed environment: just use Dask as the Joblib backend. This may be useful if you want to add other fingerprints to USearch Molecules.

A paper with benchmarks, speedup plots, etc. has been published in SoftwareX: https://doi.org/10.1016/j.softx.2024.101944.