ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐
https://ashvardanian.com/posts/simsimd-faster-scipy/
Apache License 2.0

Need clarity #97

Closed — heflinstephenraj-sa-14411 closed this issue 8 months ago

heflinstephenraj-sa-14411 commented 8 months ago

In response to your blog post: which AVX variant did you use to achieve that 118 ns for floats? I ran the same experiment and the Google Benchmark report for avx2_f32_cos_1536d shows 7.30 ns for Time and 350 ns for CPU time. Could you please clarify which value I should compare against, and whether the 118 ns refers to Time or CPU time? @ashvardanian

ashvardanian commented 8 months ago

If you mean this section, then I used AVX-512, as described in the post.

AVX2 registers are 256 bits wide, so each instruction can process only 8 single-precision floats at a time; for 1536 dimensions that means at least 1536 / 8 = 192 iterations. Even if each iteration took just one CPU cycle (~0.33 ns on a 3 GHz core), the loop would need at least ~64 ns. In reality, each iteration takes several cycles.
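The back-of-the-envelope estimate above can be reproduced with a few lines of arithmetic — a sketch assuming the figures from the comment (8 floats per AVX2 register, 3 GHz clock, one iteration per cycle as an optimistic lower bound):

```python
# Lower-bound latency estimate for an AVX2 loop over a 1536-dim f32 vector.
dims = 1536
floats_per_register = 8          # 256-bit AVX2 register / 32-bit float
iterations = dims // floats_per_register   # 192 iterations minimum

clock_ghz = 3.0                  # assumed core frequency
ns_per_cycle = 1.0 / clock_ghz   # ~0.33 ns per cycle

# Optimistic bound: one iteration retired per cycle.
lower_bound_ns = iterations * ns_per_cycle

print(iterations)        # 192
print(lower_bound_ns)    # 64.0
```

Since the real loop spends several cycles per iteration (loads, fused multiply-adds, and a final horizontal reduction), the measured latency sits well above this 64 ns floor, which is why a sub-10 ns "Time" reading for an AVX2 kernel at 1536 dimensions warrants a second look at what the benchmark is actually measuring.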

Does that answer your question?