Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
Most numerical packages don't have well-defined policies for reporting numerical errors, especially in Python. SciPy, NumPy, and the standard `math` library report different error types in identical cases. Even the hardware vendors don't document the error bounds of their most commonly used instructions, so those have to be reverse-engineered.
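To make the inconsistency concrete, here is a minimal sketch that uses only the Python standard library (NumPy and SciPy are deliberately left out so the snippet stays self-contained). The `classify` helper is a hypothetical name introduced for illustration; it records whether a failing computation raises an exception or silently returns a special value.

```python
import math

def classify(fn):
    """Describe how a computation reports a numerical problem."""
    try:
        result = fn()
    except Exception as exc:          # the error surfaced as an exception
        return type(exc).__name__
    if math.isnan(result):            # the error surfaced as a NaN value
        return "nan"
    if math.isinf(result):            # the error surfaced as an infinity
        return "inf"
    return "finite"

reports = {
    "math.exp(1000.0)": classify(lambda: math.exp(1000.0)),
    "1e308 * 10.0": classify(lambda: 1e308 * 10.0),
    "10.0 ** 400": classify(lambda: 10.0 ** 400),
    "math.sqrt(-1.0)": classify(lambda: math.sqrt(-1.0)),
    "inf - inf": classify(lambda: float("inf") - float("inf")),
}

for expression, report in reports.items():
    print(f"{expression:>18} -> {report}")
```

Three of these are overflows, yet two raise `OverflowError` and one quietly returns `inf`, so a caller cannot rely on a single detection mechanism even within one language runtime.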
It would be great if SimSIMD were more consistent in that regard and used signaling `NaN` floating-point values to report overflow. The caller could then invoke the `*_accurate` functions on the same inputs, which avoids the overflow in most cases.
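The proposed flow could look roughly like the sketch below. Everything here is an assumption made for illustration: `dot_fast`, `dot_accurate`, and `dot` are hypothetical names, not SimSIMD functions; single-precision accumulation is emulated with `struct`; and a quiet NaN stands in for a signaling one, since pure Python cannot reliably distinguish the two.

```python
import math
import struct

def to_f32(x):
    """Round a Python float to single precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:             # finite value too large for float32
        return math.copysign(math.inf, x)

def dot_fast(a, b):
    """Emulated f32 dot product that returns NaN when the result overflows."""
    total = 0.0
    for x, y in zip(a, b):
        total = to_f32(total + to_f32(x * y))
    # Report overflow (or an invalid intermediate) as NaN instead of inf.
    return total if math.isfinite(total) else math.nan

def dot_accurate(a, b):
    """Slower reference path: double precision with compensated summation."""
    return math.fsum(x * y for x, y in zip(a, b))

def dot(a, b):
    """Try the fast path first and fall back only when it signals overflow."""
    result = dot_fast(a, b)
    return dot_accurate(a, b) if math.isnan(result) else result
```

For inputs such as `[1e20, 1e20]` against themselves, the emulated single-precision path overflows and returns NaN, while the fallback produces a finite result near `2e40`. Note that this simple sketch also falls back when the inputs themselves contain NaN or infinity; a real implementation would have to decide how to tell those cases apart.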
The issue is further complicated by the lack of saturated arithmetic intrinsics for many of the common types, such as 32-bit and 64-bit saturated addition in AVX-512. This is a pretty big refactoring effort. I can definitely initiate it with a better testing suite, but I would love to see more participation from the community.
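For reference, the semantics that would have to be emulated where no saturating instruction exists can be written in scalar form. This is only a sketch of the arithmetic, not of any SIMD implementation, and the function names are hypothetical.

```python
I32_MIN = -2**31
I32_MAX = 2**31 - 1

def wrapping_add_i32(a, b):
    """Two's-complement 32-bit addition: overflow silently wraps around."""
    return (a + b + 2**31) % 2**32 - 2**31

def saturating_add_i32(a, b):
    """32-bit addition that clamps to the representable range on overflow."""
    return max(I32_MIN, min(I32_MAX, a + b))
```

With wrapping arithmetic, `I32_MAX + 1` becomes `I32_MIN`, which is indistinguishable from a legitimate result. With saturation the sum stays pinned at `I32_MAX`, which a caller can at least treat as a possible overflow marker.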
Can you contribute to the implementation?
[X] I can contribute
Is your feature request specific to a certain interface?
It applies to everything
Contact Details
No response
Is there an existing issue for this?
[X] I have searched the existing issues
Code of Conduct
[X] I agree to follow this project's Code of Conduct