Closed calebzulawski closed 3 months ago
Hi @calebzulawski,
Thank you for the kind words and your contribution! I'm going to try to figure out why the coverage check is failing, and then I'll re-run the benchmarks on the same machine.
Best,
Saveliy
Do I understand correctly that ARM does not need multiversioning because Aarch64 always has NEON, and portable SIMD does not map well to SVE?
Correct--standard aarch64 always has NEON, so there is no need to detect it. You can have nonstandard aarch64 targets without it, but in that case there is nothing to detect either. SVE actually can map to portable SIMD, but it will need more support from the compiler. For example, you could have separate SVE-128 and SVE-256 target features that each refer to a specific SVE register size, detected at runtime. I digress...
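To make the detection point concrete, here is a minimal sketch using only the standard library (it is not code from this PR, and `simd_level` is a hypothetical helper name). It assumes a dispatcher that only needs to name the level it would pick: on x86 the CPU is probed at runtime, while on aarch64 NEON is read from the compile-time baseline.

```rust
/// Hypothetical helper: names the SIMD level a dispatcher might select.
/// On x86/x86-64 the answer depends on the CPU the binary runs on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn simd_level() -> &'static str {
    if std::arch::is_x86_feature_detected!("avx2") {
        "avx2"
    } else if std::arch::is_x86_feature_detected!("sse4.2") {
        "sse4.2"
    } else {
        "baseline"
    }
}

/// On aarch64, NEON is part of the standard baseline, so a compile-time
/// check is enough and no runtime probe is needed.
#[cfg(target_arch = "aarch64")]
fn simd_level() -> &'static str {
    if cfg!(target_feature = "neon") {
        "neon"
    } else {
        "baseline"
    }
}

/// Any other architecture falls back to the portable code path.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
fn simd_level() -> &'static str {
    "baseline"
}
```

Calling `simd_level()` on a standard aarch64 target should report `neon` without touching any runtime detection machinery, which is why multiversioning mostly matters on x86 here.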
I rebased--not sure if that's expected to fix the code coverage issue.
@calebzulawski
Here are the benchmark results. Since your PR was already merged, I ran this from the main branch.
Thank you! I think that looks comparable to the old benchmark results. What do you think?
@calebzulawski
This looks great! Thank you for your contribution. This was the last thing I wanted merged prior to releasing the next version.
@calebzulawski
Just wanted to add that I re-ran the benchmarks for `pyphastft` (the Python bindings via PyO3) as well. All of the new plots are in the readme. Thank you!
Best, Saveliy
Awesome! Excited to see a new release!
I love this project! I joined the `std::simd` team originally after getting frustrated writing my `fourier` crate. Full disclosure--the `multiversion` crate is mine, but I think it works perfectly here, especially considering the desire to forbid unsafe code.
I removed all references to `-Ctarget-cpu=native` and the like--I believe with this change, it should make no difference, at least on x86(-64) and aarch64. Plus, in my opinion, it's only useful for research and not commercial/enterprise software, since it's not really possible to redistribute `-Ctarget-cpu=native` code. I think this benchmark is more faithful to real use.
This PR is going to need a follow-up commit updating the benchmark results.
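For readers less familiar with the distinction, the following is a rough sketch (standard library only, with hypothetical function names, and not taken from this PR) of the difference between features fixed at compile time, which is what `-Ctarget-cpu=native` changes, and features probed when the program runs, which is what runtime dispatch relies on.

```rust
/// True only if AVX2 was enabled when this binary was compiled, for example
/// via `-Ctarget-cpu=native` on an AVX2 machine. The answer is baked into the
/// binary and does not change with the CPU it later runs on.
fn avx2_at_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU executing the binary reports AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_at_runtime() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

/// AVX2 is an x86 extension, so other architectures never report it.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_at_runtime() -> bool {
    false
}
```

A binary where `avx2_at_compile_time()` is true may crash with an illegal instruction on an older CPU, which is the redistribution problem mentioned above. A default build leaves it false and can still take a faster path whenever `avx2_at_runtime()` is true.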