Add automatic CPU feature detection

QuState / PhastFT

A high-performance, "quantum-inspired" Fast Fourier Transform (FFT) library written in pure and safe Rust.

Apache License 2.0

193 stars 8 forks source link

Add automatic CPU feature detection #21

Closed calebzulawski closed 3 months ago

calebzulawski commented 3 months ago

I love this project! I joined the std::simd team originally after getting frustrated writing my fourier crate. Full disclosure--the multiversion crate is mine, but I think it works perfectly here, especially considering the desire to forbid unsafe code.

I removed all references to -Ctarget-cpu=native and the like--I believe with this change, it should make no difference, at least on x86(-64) and aarch64. Plus, in my opinion, it's only useful for research and not commercial/enterprise software, since it's not really possible to redistribute -Ctarget-cpu=native code. I think this benchmark is more faithful to real use.

This PR is going to need a follow-up commit updating the benchmark results.

smu160 commented 3 months ago

Hi @calebzulawski,

Thank you for the kind words and your contribution! I'm going to try to figure out why the coverage check is failing, and then I'll re-run the benchmarks on the same machine.

Best,
Saveliy

calebzulawski commented 3 months ago

Do I understand correctly that ARM does not need multiversioning because Aarch64 always has NEON, and portable SIMD does not map well to SVE?

Correct--standard aarch64 has neon, so no need to detect it. You can have nonstandard aarch64 without it, but then there is no need to detect it. SVE actually can map to portable SIMD, but it will need more support from the compiler. E.g. you can have separate SVE-128 and SVE-256 target features that each refer to a specific SVE register size, detected at runtime. I digress...

calebzulawski commented 3 months ago

I rebased--not sure if that's expected to fix the code coverage issue.

smu160 commented 3 months ago

@calebzulawski

Here are the benchmark results. Since your PR was already merged, I ran this from the main branch.

benchmarks_bar_plot_4_12 benchmarks_bar_plot_13_29

calebzulawski commented 3 months ago

Thank you! I think that looks comparable to the old benchmark results, what do you think?

smu160 commented 3 months ago

@calebzulawski

This looks great! Thank you for your contribution. This was the last thing I wanted merged prior to releasing the next version.

smu160 commented 3 months ago

@calebzulawski

Just wanted to add that I re-ran the benchmarks for pyphastft (the python bindings via PyO3) as well. All of the new plots are in the readme. Thank you!

Best, Saveliy

calebzulawski commented 3 months ago

Awesome! Excited to see a new release!