Closed MarkReedZ closed 2 months ago
I'm not sure this code is correct - how do we typically test the results?
@MarkReedZ, the C++ benchmarks will log the accuracy delta compared to the serial baseline. The Python tests will fail if this instruction does something weird. Let's run those two.
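For anyone following along, here is a rough sketch of what an "accuracy delta against a serial baseline" check can look like. It is not SimSIMD's actual harness. The helper names (`bf16_to_f32`, `f32_to_bf16`, `dot_bf16_serial`, `relative_error`) are hypothetical, and bf16 values are assumed to be stored as raw `uint16_t` bit patterns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a raw bf16 bit pattern to f32.
 * bf16 is the upper 16 bits of an IEEE-754 binary32 value. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: truncate f32 to bf16 (drops the low 16 bits).
 * Good enough for generating test inputs; not a rounding conversion. */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Serial baseline: accumulate in f64 so the reference is more precise
 * than any f32-accumulating SIMD kernel it is compared against. */
static double dot_bf16_serial(const uint16_t *a, const uint16_t *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (double)bf16_to_f32(a[i]) * (double)bf16_to_f32(b[i]);
    return sum;
}

/* Relative error of a candidate result against the baseline.
 * Falls back to absolute error when the baseline is exactly zero. */
static double relative_error(double candidate, double baseline) {
    double diff = candidate - baseline;
    if (diff < 0.0) diff = -diff;
    double denom = baseline < 0.0 ? -baseline : baseline;
    return denom > 0.0 ? diff / denom : diff;
}
```

A new kernel's output would then go through `relative_error` against `dot_bf16_serial` on the same inputs, with a tolerance chosen for bf16 precision.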
Does it make sense to use `vbfdotq_f32`?
@MarkReedZ, I'm not sure if I've used that instruction before. If it doesn't affect compilation settings and CPU-capability requirements, sure! Otherwise, we can add a note everywhere `vbfmlaltq_f32` is used. But for #163 it probably still makes sense.
PR here: https://github.com/ashvardanian/SimSIMD/pull/169
Run on AWS t4g and c7g instances, with GCC 12/13 only.
The following code runs in 11 ns vs 15 ns for the current version using `vbfmlaltq_f32`. Does it make sense to use `vbfdotq_f32`? I'm not sure this code is correct - how do we typically test the results? Should we add an unaligned vector size to the benchmarks?
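The snippet from the original post is not reproduced in this thread. As a stand-in, here is a minimal sketch of a bf16 dot product built on `vbfdotq_f32`, with a scalar tail so that lengths that are not a multiple of 8 (the "unaligned" case) are still handled. This is an illustration, not the code from the PR. The function name `dot_bf16_sketch`, the `SKETCH_HAS_BFDOT` macro, and the raw-`uint16_t` storage are assumptions, and on targets without the bf16 extension the whole loop falls back to scalar code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: use the ACLE feature macro to detect bf16 vector support. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
#define SKETCH_HAS_BFDOT 1
#else
#define SKETCH_HAS_BFDOT 0
#endif

/* Hypothetical helper: widen a raw bf16 bit pattern to f32. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: truncate f32 to bf16, for building test inputs. */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Dot product of two bf16 vectors stored as raw uint16_t bit patterns. */
static float dot_bf16_sketch(const uint16_t *a, const uint16_t *b, size_t n) {
    float sum = 0.0f;
    size_t i = 0;
#if SKETCH_HAS_BFDOT
    /* Each vbfdotq_f32 consumes 8 bf16 pairs and adds the four
     * 2-element partial dot products into the four f32 lanes. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 8 <= n; i += 8) {
        bfloat16x8_t va = vreinterpretq_bf16_u16(vld1q_u16(a + i));
        bfloat16x8_t vb = vreinterpretq_bf16_u16(vld1q_u16(b + i));
        acc = vbfdotq_f32(acc, va, vb);
    }
    sum = vaddvq_f32(acc); /* horizontal add of the four lanes */
#endif
    /* Scalar tail: covers the last n % 8 elements, or every element
     * when the bf16 extension is not available. */
    for (; i < n; ++i)
        sum += bf16_to_f32(a[i]) * bf16_to_f32(b[i]);
    return sum;
}
```

Calling it with a length such as 11 exercises both the vector loop and the tail. The accumulation order differs from a serial loop, so the low bits of the result may differ on general inputs.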