Improve: Use the neon dot product intrinsic for `dot_bf16_neon`

ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐

https://ashvardanian.com/posts/simsimd-faster-scipy/

Apache License 2.0

988 stars, 59 forks

Improve: Use the neon dot product intrinsic for `dot_bf16_neon` #169

Closed MarkReedZ closed 2 months ago

MarkReedZ commented 2 months ago

PR for https://github.com/ashvardanian/SimSIMD/issues/167

`simsimd_dot_bf16_neon` now uses the NEON BF16 dot-product intrinsic `vbfdotq_f32`, which gives a 25% speedup. I have only tested this on AWS t4g and c7g instances.
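For readers unfamiliar with the intrinsic, here is a minimal sketch of how a bf16 dot product can be built on `vbfdotq_f32`. It is not the code from this PR: every `sketch_*` name is hypothetical, the inputs are assumed to be raw 16-bit bf16 patterns, and a scalar fallback is included so the snippet also compiles on targets without the BF16 extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Use the vector path only on AArch64 builds that advertise BF16 vector arithmetic. */
#if defined(__aarch64__) && defined(__ARM_NEON) && defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
#define SKETCH_HAS_BFDOT 1
#else
#define SKETCH_HAS_BFDOT 0
#endif

/* bf16 is the upper 16 bits of an IEEE-754 binary32 value. */
static float sketch_bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Truncating conversion (no rounding); enough for illustration. */
static uint16_t sketch_f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Hypothetical helper: dot product of two bf16 arrays of length n. */
static float sketch_dot_bf16(const uint16_t *a, const uint16_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    size_t i = 0;
#if SKETCH_HAS_BFDOT
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 8 <= n; i += 8) {
        /* Load 8 raw 16-bit patterns and reinterpret them as bf16 lanes. */
        bfloat16x8_t va = vreinterpretq_bf16_u16(vld1q_u16(a + i));
        bfloat16x8_t vb = vreinterpretq_bf16_u16(vld1q_u16(b + i));
        /* Each f32 lane accumulates the products of one adjacent bf16 pair. */
        acc = vbfdotq_f32(acc, va, vb);
    }
    sum = vaddvq_f32(acc); /* horizontal add of the 4 partial sums */
#endif
    /* Scalar tail; also the whole computation when BFDOT is unavailable. */
    for (; i < n; ++i)
        sum += sketch_bf16_to_f32(a[i]) * sketch_bf16_to_f32(b[i]);
    return sum;
}
```

With GCC or Clang, the vector path should be enabled by something like `-march=armv8.2-a+bf16`; treat that flag as an assumption to check against your toolchain. The hardware instruction rounds differently from plain `float` arithmetic, so the two paths can disagree in the last bits for general inputs.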

Also fixed a few small issues: `simsimd_dot_i8_serial` was used where the `_neon` variant was intended, the `bf16_neon` declarations were missing, and there was a stray trailing `//` on line 526.

ashvardanian commented 2 months ago

Very nice, thank you @MarkReedZ! Merging soon!