arduano / simdeez

easy simd
MIT License

AVX support #24

Open greatest-ape opened 4 years ago

greatest-ape commented 4 years ago

First of all, thanks for a great library!

Support for plain avx would be great for me. I forked the repo (to https://github.com/greatest-ape/simdeez/tree/avx), copied all the code from the avx2 implementation to a new avx module and replaced all method implementations using avx2-only instructions with unimplemented!().

Enough methods work for me to successfully use it in my synthesizer https://github.com/greatest-ape/OctaSine. I'm seeing large speedups over sse41 on a computer which doesn't support avx2.

The next step would be implementing all the missing methods. Most likely, a lot of the code could be adapted from the sse41 version.

Would this be interesting to anybody else?

jackmott commented 4 years ago

I actually did not know that some CPUs have AVX but not AVX2. Ideally we can share all the AVX code in the AVX2 trait as well somehow, since all the floating point ops will be the same.

Happy to help and merge this. The best way to fill in the unimplemented! methods is probably to do two SSE4 operations, one on each 128-bit half.
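As a rough illustration of the "two SSE operations" idea, here is a minimal sketch for one integer op. Float ops such as `_mm256_add_ps` are already part of AVX, but the 256-bit integer add `_mm256_add_epi32` requires AVX2, so on an AVX-only CPU the vector can be split into two 128-bit halves, added with SSE2, and rejoined. The names `add_i32x8_avx` and `add_i32x8` are hypothetical and are not simdeez's API; the scalar fallback is only there so the sketch also runs on machines without AVX.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper (not simdeez's API): add eight i32 lanes using only
/// AVX and SSE2 instructions, so it works on CPUs without AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_i32x8_avx(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    // 256-bit unaligned loads are plain AVX.
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);

    // Split each 256-bit register into its low and high 128-bit halves.
    let a_lo = _mm256_castsi256_si128(va);
    let a_hi = _mm256_extractf128_si256::<1>(va);
    let b_lo = _mm256_castsi256_si128(vb);
    let b_hi = _mm256_extractf128_si256::<1>(vb);

    // Two SSE2 integer adds stand in for the AVX2-only `_mm256_add_epi32`.
    let lo = _mm_add_epi32(a_lo, b_lo);
    let hi = _mm_add_epi32(a_hi, b_hi);

    // Rejoin the halves: `lo` becomes lanes 0..4, `hi` becomes lanes 4..8.
    let sum = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(lo), hi);

    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    out
}

/// Safe wrapper: uses the AVX path when the CPU supports it, otherwise a
/// scalar loop with the same wrapping-add semantics.
pub fn add_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Sound because AVX support was just checked at runtime.
            return unsafe { add_i32x8_avx(&a, &b) };
        }
    }
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}
```

Calling `add_i32x8([1; 8], [2; 8])` should give `[3; 8]` on either path. Whether the extract/insert pair is the cheapest way to split and rejoin is an open question that only a benchmark can settle.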

greatest-ape commented 4 years ago

I created a pull request: #25 :)

jackmott commented 4 years ago

I've merged your commit. I think it would be worth doing some benchmarks to figure out for sure what the best way to combine two SSE operations is. It may all compile down to the same thing, but there's only one way to be sure! I'll see if I can set something up.

jackmott commented 4 years ago

@greatest-ape I added a benches folder with a start on benchmarking the different ways to combine two SSE operations into an AVX result.

Feel free to add to it.

greatest-ape commented 4 years ago

Alright, great. I'll do it if I have time.