AdamNiederer / faster

SIMD for humans
Mozilla Public License 2.0

Status of AVX 512 ? #65

Open ManuelCostanzo opened 4 years ago

ManuelCostanzo commented 4 years ago

Hello, I want to check whether this crate works with AVX-512 instructions, and also whether it's still recommended to use, given its recent inactivity.

Are there other crates? This one seems very good to me.

Thanks

AdamNiederer commented 4 years ago

Hey,

No AVX-512 instructions yet (I think my underlying SIMD bindings still don't support it, and even if they did I don't have hardware to test it on), but this crate should compile and work on the latest rust.

ManuelCostanzo commented 4 years ago

Hi! Thanks for the answer.

I'd like to ask: what are the advantages of explicitly writing vectorized code versus relying on compiler auto-vectorization? I understand it can help in complex cases the compiler doesn't recognize, but for a simple program, does using this kind of crate actually improve anything?

AdamNiederer commented 4 years ago

Always benchmark your programs both before and after optimization. Explicit SIMD is often faster, sometimes the same, and sometimes quite a bit slower than what the compiler can come up with. You're going up against hundreds of Ph.D.s with access to detailed documentation on the processor you're compiling for, so you're going to whiff sometimes.

LLVM can handle a lot of situations well, but if you're doing anything more complex than simple mapping/reduction, explicit SIMD will help you hit your performance targets more consistently. It's also guaranteed to stay fast: LLVM takes performance seriously and fixes regressions that do come up, but it can be hard to notice that a program is running 5% slower because autovectorization silently stopped working in some subroutine.