Lokathor / wide

A crate to help you go wide. By which I mean use SIMD stuff.
https://docs.rs/wide
zlib License

Added tests for avx #49

Closed: ronniec95 closed this 4 years ago

ronniec95 commented 4 years ago

Hi Lokathor

I've managed to finish off the work to add AVX(2) functionality. The following new types have been added, along with their corresponding tests: f32x8, f64x4, i64x4 and u64x4.

I had to make some annoying changes to get it working, and there may be better optimisations that I couldn't figure out, though I think I've done the best that's possible with safe_arch.

Please review and let me know if there are any issues. All the tests pass, and I've used it for some pricing simulations without problems so far.

Lokathor commented 4 years ago

When fusha tried to add acos, they had trouble on the i586 builds because the mask values are stored as floats. Converting some bit patterns to float doesn't preserve the bits exactly: a signaling NaN pattern gets turned into a quiet NaN pattern.

I won't have much time to look into this today, though I'll have some tomorrow. That's just one guess, based on the fact that it's the i586 build that's failing.
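To make that failure mode concrete, here's a small sketch of what "signaling NaN turned into quiet NaN" means at the bit level for an f32. The helper names are hypothetical, and `simulate_x87_round_trip` only models the quieting done by the x87 FPU that i586 targets use for floats; it isn't part of `wide` or `std`.

```rust
/// The quiet bit of an IEEE 754 binary32 NaN: the most significant mantissa
/// bit. Set means quiet NaN, clear (with a non-zero mantissa) means signaling.
const F32_QUIET_BIT: u32 = 0x0040_0000;

/// True if `bits`, read as an f32, is a signaling NaN:
/// exponent all ones, non-zero mantissa, quiet bit clear.
fn is_signaling_nan_f32(bits: u32) -> bool {
    let exponent = (bits >> 23) & 0xFF;
    let mantissa = bits & 0x007F_FFFF;
    exponent == 0xFF && mantissa != 0 && (mantissa & F32_QUIET_BIT) == 0
}

/// Hypothetical model of an x87 load/store round trip: a signaling NaN comes
/// back with its quiet bit forced on, so the stored bit pattern changes.
/// Every other pattern is returned unchanged. Real hardware does this
/// implicitly; this function only simulates it.
fn simulate_x87_round_trip(bits: u32) -> u32 {
    if is_signaling_nan_f32(bits) {
        bits | F32_QUIET_BIT
    } else {
        bits
    }
}
```

Note that an all-ones comparison mask (`0xFFFF_FFFF`) already has the quiet bit set, so it survives the round trip; the problem is any other lane pattern that happens to decode as a signaling NaN.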

The fix here is to give masks their own types; then, on i586, masks can always be stored as unsigned integers instead of floats. This is obviously a breaking change, but it's one I want to get around to making.
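A rough sketch of what "masks as their own type" could look like, assuming plain arrays instead of safe_arch intrinsics. `F32x8`, `MaskF32x8`, `cmp_lt` and `blend` are hypothetical names for illustration, not `wide`'s actual API. The point is that comparisons produce integer lanes and `Not` works on those integers, so mask bits never pass through a float register.

```rust
use core::ops::Not;

/// Hypothetical 8-lane f32 vector (plain array, no intrinsics).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8(pub [f32; 8]);

/// Hypothetical mask type: one u32 per lane, all ones (true) or all zeros
/// (false). Stored as integers, so NaN quieting can never alter it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MaskF32x8(pub [u32; 8]);

impl F32x8 {
    /// Lane-wise `<`, returning an integer mask rather than a float vector.
    pub fn cmp_lt(self, rhs: Self) -> MaskF32x8 {
        MaskF32x8(core::array::from_fn(|i| {
            if self.0[i] < rhs.0[i] { u32::MAX } else { 0 }
        }))
    }

    /// Takes lanes from `if_true` where the mask is set, else from `if_false`.
    pub fn blend(mask: MaskF32x8, if_true: Self, if_false: Self) -> Self {
        F32x8(core::array::from_fn(|i| {
            if mask.0[i] != 0 { if_true.0[i] } else { if_false.0[i] }
        }))
    }
}

impl Not for MaskF32x8 {
    type Output = Self;

    /// Bitwise NOT on the integer lanes; no float reinterpretation involved.
    fn not(self) -> Self {
        MaskF32x8(self.0.map(|lane| !lane))
    }
}
```

For example, `a.cmp_lt(b)` followed by `F32x8::blend(mask, a, b)` gives a lane-wise minimum, and `!mask` flips the selection, all without any lane of the mask being interpreted as a float.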

For now... I guess just cfg out any failing tests, along with their associated methods, unless the appropriate x86/x64 CPU features are enabled.

We'll have to implement https://github.com/Lokathor/wide/issues/43, and then we can probably remove the cfg restrictions.
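Here's one way the "cfg out" approach might look, as a sketch: both the method and its test are gated on the same `target_feature`, so an i586 build (which doesn't enable SSE2 by default) never compiles either one. `F64x4` here is a hypothetical stand-in that uses scalar `std` math, not `wide`'s real f64x4, and gating on `sse2` is an assumption about which feature is "appropriate".

```rust
/// Hypothetical 4-lane f64 vector; a stand-in, not wide's f64x4.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F64x4(pub [f64; 4]);

impl F64x4 {
    /// Lane-wise natural log. Only compiled when SSE2 is enabled, so targets
    /// without it (such as i586) simply don't get this method.
    #[cfg(target_feature = "sse2")]
    pub fn ln(self) -> Self {
        F64x4(self.0.map(f64::ln))
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Gated on the same feature as the method it exercises; on i586 this
    // test is removed at compile time instead of failing.
    #[cfg(target_feature = "sse2")]
    #[test]
    fn ln_matches_std() {
        let x = F64x4([1.0, 2.0, 10.0, 0.5]);
        let y = x.ln();
        for i in 0..4 {
            assert_eq!(y.0[i], x.0[i].ln());
        }
    }
}
```

Once masks no longer round-trip through floats, the two `#[cfg(target_feature = "sse2")]` attributes are the only lines that need to be deleted.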

ronniec95 commented 4 years ago

Hey Lokathor

Apologies for the multiple commits. I was trying to replicate your build environment locally to make sure the tests passed in the same way. The SSE-and-above functionality worked yesterday, but getting all the tests to pass in every environment and feature combination was trickier.

Due to the from_bits/to_bits issue in #43, the Not implementation for f64x4 is wrong, which makes the ln() tests for that type fail on i586. Curiously, the Not implementation for f32x8 works perfectly fine in the existing tests. Is this just luck, or is there something else going on?
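For context on how a float-lane `Not` ends up touching #43, here's a sketch of NOT implemented the float-mask way: reinterpret the f64 as bits, flip them, and reinterpret back. `not_f64` and `is_signaling_nan_f64` are hypothetical helpers, not `wide` code. Since NOT inverts the exponent field, only inputs whose exponent is all zeros (±0.0 and subnormals) come out as NaN. ±0.0 map to quiet NaNs, but some subnormals map to signaling-NaN patterns, which are exactly the patterns an x87 round trip can rewrite.

```rust
/// Bitwise NOT of an f64 done via its bit pattern, the way a float-backed
/// mask type would have to implement `Not`.
fn not_f64(x: f64) -> f64 {
    f64::from_bits(!x.to_bits())
}

/// True if `bits`, read as an f64, is a signaling NaN: exponent all ones,
/// non-zero mantissa, and the quiet bit (mantissa MSB, bit 51) clear.
fn is_signaling_nan_f64(bits: u64) -> bool {
    let exponent = (bits >> 52) & 0x7FF;
    let mantissa = bits & 0x000F_FFFF_FFFF_FFFF;
    exponent == 0x7FF && mantissa != 0 && (mantissa & 0x0008_0000_0000_0000) == 0
}
```

On a target where floats move through SSE registers, the flipped bits come back intact; on i586 the signaling pattern can come back with bit 51 set, so a later `Not` no longer restores the original value.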

Outside of that, the tests all pass for SSE and above, and the difference from the std results is appropriately small. This seems like a good sign, but let's test on more examples to be sure. I've tried my default go-to pricing library and so far so good.
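One way to put a number on "appropriately small" when testing more examples is to measure the error in ULPs (units in the last place) against std. This is a sketch with hypothetical helper names, not the crate's test code; `candidate` stands in for a lane of something like `f64x4::ln`.

```rust
/// Maps an f64's bits to an i64 that increases monotonically with the value,
/// so adjacent representable floats differ by exactly 1. Both -0.0 and +0.0
/// map to 0.
fn ordered_bits(x: f64) -> i64 {
    let i = x.to_bits() as i64;
    if i < 0 { i64::MIN - i } else { i }
}

/// ULP distance between two finite f64 values (any signs).
fn ulp_distance(a: f64, b: f64) -> u64 {
    assert!(a.is_finite() && b.is_finite(), "ULP distance needs finite values");
    ordered_bits(a).abs_diff(ordered_bits(b))
}

/// Worst-case ULP error of `candidate` against `reference` over `inputs`.
/// Both closures are hypothetical parameters for illustration.
fn max_ulp_error(
    candidate: impl Fn(f64) -> f64,
    reference: impl Fn(f64) -> f64,
    inputs: &[f64],
) -> u64 {
    inputs
        .iter()
        .map(|&x| ulp_distance(candidate(x), reference(x)))
        .max()
        .unwrap_or(0)
}
```

A test can then assert something like `max_ulp_error(simd_ln_lane, f64::ln, &inputs) <= 2` over a larger input set. The threshold is a choice to make per function, not a figure from this PR.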

Feel free to clean things up as you see fit. I'd be interested in your next full release with this functionality in it.

Lokathor commented 4 years ago

As many commits as you need is fine. I occasionally resort to "CI-based coding" myself, so I know how it goes, particularly when you're targeting something other than your own machine.

I can merge now if you like.

ronniec95 commented 4 years ago

Yep, great. Please merge this in and close the PR.