Open greatest-ape opened 4 years ago
I actually did not know that some CPUs have AVX but not AVX2. Ideally we can share all the AVX code in the AVX2 trait as well somehow, since all the floating point ops will be the same.
Happy to help and merge this. The best way to fill in the unimplemented!() methods is probably to do two SSE4.1 operations, one on each 128-bit half.
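For anyone following along, here is a rough sketch of what "two SSE operations" could look like for one integer op that AVX2 provides as a single instruction. It uses only `std::arch` intrinsics. The function names `add_i32x8` and `add_i32x8_avx` are made up for this example and are not part of simdeez, and this is not necessarily how the PR implements it.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: wrapping add of eight i32 lanes without AVX2.
/// The 256-bit load/store and the split/recombine need AVX; the adds are SSE2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_i32x8_avx(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);

    // Split each 256-bit register into its low and high 128-bit halves.
    let a_lo = _mm256_castsi256_si128(va);
    let a_hi = _mm256_extractf128_si256::<1>(va);
    let b_lo = _mm256_castsi256_si128(vb);
    let b_hi = _mm256_extractf128_si256::<1>(vb);

    // Two 128-bit integer adds stand in for the AVX2-only _mm256_add_epi32.
    let lo = _mm_add_epi32(a_lo, b_lo);
    let hi = _mm_add_epi32(a_hi, b_hi);

    // Recombine the halves into one 256-bit result and store it.
    let sum = _mm256_set_m128i(hi, lo);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    out
}

/// Safe wrapper: uses the AVX path when the CPU supports it,
/// otherwise falls back to a scalar loop.
pub fn add_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safety: AVX support was checked at runtime just above.
            return unsafe { add_i32x8_avx(&a, &b) };
        }
    }
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

The floating point ops would not need this treatment, since AVX already has 256-bit versions of them.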
I created a pull request: #25 :)
I've merged your commit. I think it would be worth running some benchmarks to find out which way of combining two SSE operations is best. It may all compile down to the same thing, but there's only one way to be sure! I'll see if I can set something up.
@greatest-ape I added a benches folder with a start on benchmarking the different ways to do two SSE operations to get an AVX result. Feel free to add to it.
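To make "different ways" concrete, here is a minimal sketch of two candidates for recombining the halves, plus a crude timing loop. This is not the contents of the benches folder; the names `add_via_set`, `add_via_insert`, `run` and `time_it` are hypothetical, and a real benchmark would more likely use a proper harness than `Instant`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Adds the low and high halves with two separate 128-bit SSE2 adds.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn half_sums(a: &[i32; 8], b: &[i32; 8]) -> (__m128i, __m128i) {
    let lo = _mm_add_epi32(
        _mm_loadu_si128(a.as_ptr() as *const __m128i),
        _mm_loadu_si128(b.as_ptr() as *const __m128i),
    );
    let hi = _mm_add_epi32(
        _mm_loadu_si128(a.as_ptr().add(4) as *const __m128i),
        _mm_loadu_si128(b.as_ptr().add(4) as *const __m128i),
    );
    (lo, hi)
}

/// Variant A: recombine with _mm256_set_m128i.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_via_set(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let (lo, hi) = half_sums(a, b);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_set_m128i(hi, lo));
    out
}

/// Variant B: widen the low half, then insert the high half.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_via_insert(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let (lo, hi) = half_sums(a, b);
    let v = _mm256_insertf128_si256::<1>(_mm256_castsi128_si256(lo), hi);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
    out
}

/// Runs one variant; returns None when AVX is not available.
pub fn run(use_insert: bool, a: &[i32; 8], b: &[i32; 8]) -> Option<[i32; 8]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safety: AVX support was checked at runtime just above.
            return Some(unsafe {
                if use_insert { add_via_insert(a, b) } else { add_via_set(a, b) }
            });
        }
    }
    let _ = (use_insert, a, b);
    None
}

/// Crude timing loop; black_box keeps the calls from being optimized away.
pub fn time_it(use_insert: bool, iters: u32) -> Option<Duration> {
    let a = [1, 2, 3, 4, 5, 6, 7, 8];
    let b = [8, 7, 6, 5, 4, 3, 2, 1];
    let start = Instant::now();
    for _ in 0..iters {
        black_box(run(use_insert, black_box(&a), black_box(&b))?);
    }
    Some(start.elapsed())
}
```

Comparing `time_it(false, n)` against `time_it(true, n)` in a release build gives a first impression, though the compiler may well emit the same instructions for both.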
Alright, great. I'll do it if I have time.
First of all, thanks for a great library!
Support for plain avx would be great for me. I forked the repo (to https://github.com/greatest-ape/simdeez/tree/avx), copied all the code from the avx2 implementation to a new avx module and replaced all method implementations using avx2-only instructions with unimplemented!().
Enough methods work for me to successfully use it in my synthesizer https://github.com/greatest-ape/OctaSine. I'm seeing large speedups over sse41 on a computer which doesn't support avx2.
The next step would be implementing all the missing methods. A lot of the code could likely be adapted from the sse41 version.
Would this be interesting to anybody else?