ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0
988 stars 59 forks

Element-wise BLAS APIs & new Tensor for Python: ⬆️ 450 kernels #220

Open ashvardanian opened 3 weeks ago

ashvardanian commented 3 weeks ago

It started as a straightforward optimization request from the @albumentations-team: to improve the special case of the wsum (Weighted Sum) operation for the "non-weighted" scenario and to add APIs for scalar multiplication and addition. This update introduces new public APIs in both C and Python:

  1. scale: Implements $\alpha * A_i + \beta$
  2. sum: Computes $A_i + B_i$
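As a reading aid, the two formulas above can be written out as a plain-Python reference. This is only a sketch of the arithmetic: the helper names `scale_reference` and `sum_reference` are hypothetical and do not reflect the actual SimSIMD function signatures.

```python
from typing import List, Sequence


def scale_reference(a: Sequence[float], alpha: float, beta: float) -> List[float]:
    """Element-wise alpha * a[i] + beta (hypothetical helper, not the SimSIMD API)."""
    return [alpha * x + beta for x in a]


def sum_reference(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise a[i] + b[i] (hypothetical helper, not the SimSIMD API)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [x + y for x, y in zip(a, b)]


scaled = scale_reference([1.0, 2.0, 3.0], alpha=2.0, beta=0.5)
summed = sum_reference([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(scaled)  # [2.5, 4.5, 6.5]
print(summed)  # [11.0, 22.0, 33.0]
```

A reference like this is mostly useful for checking an optimized kernel against expected values on small inputs.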

Recognizing the value of consistency with widely-used libraries, we’ve also added "aliases" aligned with names familiar to developers using NumPy and OpenCV for element-wise addition and multiplication across vectors and scalars:

| NumPy | OpenCV | SimSIMD |
| --- | --- | --- |
| np.add | cv.add | simd.add |
| np.multiply | cv.multiply | simd.multiply |

Note: SimSIMD and NumPy differ in handling certain corner cases. SimSIMD offers broader support, with up to 64 tensor dimensions (compared to NumPy’s 32), wider compatibility with Python versions, operating systems, hardware, and numeric types, and, of course, greater speed! However, SimSIMD requires input vectors to be of identical types. For integers, it also supports saturation (clamping results to the type's minimum or maximum) to prevent overflow/underflow, which can simplify debugging but may be unexpected for some developers.
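To make the saturation remark concrete, here is a minimal pure-Python sketch contrasting two's-complement wraparound (the usual behavior of fixed-width integer addition) with saturating addition for 8-bit signed integers. Both helper names are hypothetical, and this illustrates the concept only, not SimSIMD's implementation.

```python
import ctypes

INT8_MIN, INT8_MAX = -128, 127


def add_wrapping_i8(a: int, b: int) -> int:
    """Wraparound: keep only the low 8 bits of the sum (hypothetical helper)."""
    return ctypes.c_int8(a + b).value


def add_saturating_i8(a: int, b: int) -> int:
    """Saturation: clamp the sum to the int8 range (hypothetical helper)."""
    return max(INT8_MIN, min(INT8_MAX, a + b))


print(add_wrapping_i8(100, 100))      # -56
print(add_saturating_i8(100, 100))    # 127
print(add_saturating_i8(-100, -100))  # -128
```

With wraparound, an overflowing sum silently changes sign. With saturation, it sticks to the nearest representable bound, which is easier to spot but differs from what code written against wrapping semantics expects.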

The real excitement came when we realized that larger projects would take time to adopt emerging numeric types like bfloat16 and float8, which are well-known in AI circles. To bridge this gap, SimSIMD now introduces an AnyTensor type designed for maximum interoperability via CPython's Buffer Protocol and beyond, setting it apart from similar types in NumPy, PyTorch, TensorFlow, and JAX.
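For readers unfamiliar with the Buffer Protocol mentioned above, the following sketch uses only the Python standard library to show the metadata a buffer exporter exposes: element format, item size, shape, and strides. It does not show how AnyTensor consumes buffers; it illustrates only the standard mechanism that such a type can build on.

```python
import array

# A standard-library buffer exporter holding six 32-bit floats.
values = array.array("f", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# memoryview reads the exporter's metadata without copying the data.
flat = memoryview(values)
print(flat.format, flat.itemsize, flat.shape, flat.strides)  # f 4 (6,) (4,)

# Reinterpret the same memory as a 2x3 matrix; casting must go through bytes.
matrix = flat.cast("B").cast("f", shape=[2, 3])
print(matrix.shape, matrix.strides)  # (2, 3) (12, 4)
print(matrix.tolist())               # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Any consumer that understands this metadata can read data from any exporter without a copy, which is what makes the protocol a natural interoperability layer between array libraries.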

Tensor Class for C, Python, and Rust 🦀

Element-wise Operations 🧮

Geospatial Operations 🛰️


If you have any feedback regarding the limitations of current array-processing software in single- or multi-node AI training settings, I am all ears 👂