ashvardanian / usearch-molecules

Searching for structural similarities across billions of molecules in milliseconds
https://ashvardanian.com/posts/usearch-molecules
Apache License 2.0

A useful library - `scikit-fingerprints` #8

Closed: j-adamczyk closed this 2 weeks ago

j-adamczyk commented 2 weeks ago

Hi, thanks for an excellent project!

Computing molecular fingerprints with RDKit at scale is challenging, so we made scikit-fingerprints, a scikit-learn compatible library: https://github.com/scikit-fingerprints/scikit-fingerprints. It wraps RDKit with Joblib, which makes it easy to compute fingerprints in parallel. It also works in a distributed environment if you use Dask as the Joblib backend. This may be useful if you want to add any other fingerprints to USearch Molecules.

A paper with benchmarks, speedup plots, etc. has been published in SoftwareX: https://doi.org/10.1016/j.softx.2024.101944.

ashvardanian commented 2 weeks ago

Hi, @j-adamczyk! Thanks for the link!

For performance-critical code, we generally avoid wrappers, including the native one, and either go directly to the original C++ or reimplement everything in Assembly 🤔