arduano / simdeez

easy simd
MIT License

Add avx support #25

Closed: greatest-ape closed this 4 years ago

greatest-ape commented 4 years ago

@jackmott In the sleef functions, you load SSE-width vectors from the memory addresses of AVX vectors. Here I used intrinsics that extract the vector halves instead. Have you happened to try both, and do you know which is more performant?
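For readers following along, here is a minimal sketch of the two approaches being compared, written against Rust's `std::arch` rather than simdeez itself. `sse_fn` is a hypothetical stand-in for an SSE-width SLEEF routine (it just calls `_mm_sqrt_ps`), and the module and function names are illustrative only. The sketch shows the two shapes side by side; it says nothing about which one is faster, which would need benchmarking.

```rust
// Two ways to run an SSE-width (4 x f32) function on both halves of an
// AVX-width (8 x f32) vector. Hypothetical names; not simdeez code.
#[cfg(target_arch = "x86_64")]
mod halves {
    use std::arch::x86_64::*;

    /// Hypothetical stand-in for an SSE-width SLEEF routine.
    unsafe fn sse_fn(x: __m128) -> __m128 {
        _mm_sqrt_ps(x)
    }

    /// Approach 1: split the register with half-extraction intrinsics.
    #[target_feature(enable = "avx")]
    pub unsafe fn via_extract(v: __m256) -> __m256 {
        let lo = _mm256_castps256_ps128(v); // lanes 0..4, a reinterpretation
        let hi = _mm256_extractf128_ps::<1>(v); // lanes 4..8
        _mm256_set_m128(sse_fn(hi), sse_fn(lo)) // argument order is (hi, lo)
    }

    /// Approach 2: load each 128-bit half from the AVX vector's address.
    #[target_feature(enable = "avx")]
    pub unsafe fn via_memory(v: &__m256) -> __m256 {
        let p = v as *const __m256 as *const f32;
        let lo = _mm_loadu_ps(p); // first four f32s in memory
        let hi = _mm_loadu_ps(p.add(4)); // last four f32s in memory
        _mm256_set_m128(sse_fn(hi), sse_fn(lo))
    }

    #[target_feature(enable = "avx")]
    unsafe fn run(input: &[f32; 8], use_memory: bool) -> [f32; 8] {
        let v = _mm256_loadu_ps(input.as_ptr());
        let r = if use_memory { via_memory(&v) } else { via_extract(v) };
        let mut out = [0.0f32; 8];
        _mm256_storeu_ps(out.as_mut_ptr(), r);
        out
    }

    /// Element-wise sqrt of 8 floats; returns None if the CPU lacks AVX.
    pub fn sqrt8(input: [f32; 8], use_memory: bool) -> Option<[f32; 8]> {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was checked at runtime just above.
            Some(unsafe { run(&input, use_memory) })
        } else {
            None
        }
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod halves {
    /// Fallback for non-x86_64 targets: AVX is not available.
    pub fn sqrt8(_input: [f32; 8], _use_memory: bool) -> Option<[f32; 8]> {
        None
    }
}
```

Both functions should produce identical results; they differ only in how the halves are obtained (register shuffles versus a round-trip through memory).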

jackmott commented 4 years ago

@greatest-ape I have not tried both, we should do that!

Do you want to go ahead and merge this, or do you have more changes coming?

greatest-ape commented 4 years ago

@jackmott I added some more tests. In the last commit, some of the new tests seem to reveal bugs in the SSE2 round/ceil/floor implementations. I'm not sure it makes sense to merge them as part of this pull request (though I think it probably does). Either way, I'm not planning to add any more commits for now, so please feel free to merge this.

greatest-ape commented 4 years ago

Whoops, I needed to add one more (tiny) commit :D

greatest-ape commented 4 years ago

Regarding the most performant way to run functions on SIMD halves, packed_simd uses a union for this: https://github.com/rust-lang/packed_simd/blob/60ca4eff21cb9cecad17b28a98f87b0b9148e563/src/codegen/math/float/macros.rs#L52
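As a rough illustration of the union idea (a minimal sketch, not packed_simd's actual code), a 256-bit `__m256` can share storage with a `[__m128; 2]` so that the halves are read and written as plain fields. The `Halves` union, `sse_fn` (a hypothetical stand-in for an SSE-width routine, here just `_mm_sqrt_ps`), and the other names are assumptions for illustration. Whether the compiler turns this into better code than the extract intrinsics or the memory loads is the open question in this thread and would need measuring.

```rust
// Union-based split of an AVX vector into two SSE halves.
// Hypothetical names; a sketch of the technique, not packed_simd's macro.
#[cfg(target_arch = "x86_64")]
mod via_union {
    use std::arch::x86_64::*;

    /// One 32-byte vector viewed either whole or as two 16-byte halves.
    /// On x86, lane 0 sits at the lowest address, so parts[0] holds lanes 0..4.
    #[repr(C)]
    union Halves {
        whole: __m256,
        parts: [__m128; 2],
    }

    /// Hypothetical stand-in for an SSE-width SLEEF routine.
    unsafe fn sse_fn(x: __m128) -> __m128 {
        _mm_sqrt_ps(x)
    }

    #[target_feature(enable = "avx")]
    unsafe fn apply_halves(v: __m256) -> __m256 {
        // Both fields are plain-old-data of the same size, so reading
        // `parts` after writing `whole` is a pure reinterpretation.
        let [lo, hi] = Halves { whole: v }.parts;
        Halves { parts: [sse_fn(lo), sse_fn(hi)] }.whole
    }

    #[target_feature(enable = "avx")]
    unsafe fn run(input: &[f32; 8]) -> [f32; 8] {
        let r = apply_halves(_mm256_loadu_ps(input.as_ptr()));
        let mut out = [0.0f32; 8];
        _mm256_storeu_ps(out.as_mut_ptr(), r);
        out
    }

    /// Element-wise sqrt of 8 floats; returns None if the CPU lacks AVX.
    pub fn sqrt8(input: [f32; 8]) -> Option<[f32; 8]> {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was checked at runtime just above.
            Some(unsafe { run(&input) })
        } else {
            None
        }
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod via_union {
    /// Fallback for non-x86_64 targets: AVX is not available.
    pub fn sqrt8(_input: [f32; 8]) -> Option<[f32; 8]> {
        None
    }
}
```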