arduano / simdeez

easy simd
MIT License
331 stars, 25 forks

mullo_epi32 and impl Mul are different #12

Closed (remifontan closed this issue 4 years ago)

remifontan commented 5 years ago

I'm debugging an inconsistency between multiplying two i32x8 values with the overloaded operator (impl Mul) and calling S::mullo_epi32(...) directly.

Looking at the source code, mullo_epi32 seems to use _mm256_mullo_epi32, while the impl Mul seems to use _mm256_mul_epi32.

In my case, mullo_epi32 seems to compute the correct result.

This is with AVX2; I haven't checked the other implementations.
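
For readers following the thread, here is a minimal scalar sketch of why the two intrinsics disagree. It is not simdeez code: the function names mullo_epi32_model and mul_epi32_model are hypothetical, and the arrays stand in for the eight 32-bit lanes of a 256-bit register. It models the documented behavior of the intrinsics: _mm256_mullo_epi32 multiplies all eight lanes and keeps the low 32 bits of each product, while _mm256_mul_epi32 reads only the even-indexed lanes and writes four 64-bit products.

```rust
/// Scalar model (hypothetical helper) of `_mm256_mullo_epi32`:
/// eight independent 32-bit products, keeping the low 32 bits of each.
fn mullo_epi32_model(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_mul(b[i]);
    }
    out
}

/// Scalar model (hypothetical helper) of `_mm256_mul_epi32`:
/// only lanes 0, 2, 4 and 6 are read; each product is 64 bits wide
/// and therefore occupies two adjacent 32-bit lanes of the result.
fn mul_epi32_model(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..4 {
        let wide = (a[2 * i] as i64) * (b[2 * i] as i64);
        out[2 * i] = wide as i32; // low half of the 64-bit product
        out[2 * i + 1] = (wide >> 32) as i32; // high half of the 64-bit product
    }
    out
}

fn main() {
    let a = [1, 2, 3, 4, 5, 6, 7, 8];
    let b = [10; 8];
    // Element-wise product, which is what `a * b` on an i32x8 should give.
    println!("mullo: {:?}", mullo_epi32_model(a, b));
    // Odd lanes of the inputs are ignored, so the result differs.
    println!("mul:   {:?}", mul_epi32_model(a, b));
}
```

Under this model, the odd input lanes never contribute to the _mm256_mul_epi32 result, which would be consistent with the operator overload producing wrong values while mullo_epi32 looks correct.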

jackmott commented 5 years ago

Thank you for the heads up on this, I will look into it tonight probably.

jackmott commented 5 years ago

I've put in a fix and a new version is up on crates.io. Give it a try; if it looks good now, I'll close this. Thanks again.

remifontan commented 4 years ago

Apologies, it took me a long time to get back.

Works great, thanks :-)