IvantheDugtrio / veclib

Vector library for porting SSE2 instructions to other architectures
MIT License

Consider merging with SIMDe project #3

Open nemequ opened 7 years ago

nemequ commented 7 years ago

I've been working on a similar project called SIMDe which is also MIT licensed, and is also an attempt to allow code written for one set of SIMD instructions to run on machines without them.

We're both working on implementing x86/x86_64 ISA extensions right now, but SIMDe is using portable fallbacks (with hints to encourage the compiler to vectorize what it can) instead of POWER intrinsics. I have been planning to create an AltiVec/VMX/VSX backend for SIMDe eventually, but so far I've been focusing on getting the portable version in place. Eventually I also intend to go in the other direction with SIMDe: AltiVec/VMX/VSX (and others) to SSE (and everything else).
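For readers unfamiliar with the approach, a portable fallback with a vectorization hint could look roughly like the sketch below. It is an illustration only, not SIMDe's actual code: the names `sketch_m128`, `sketch_mm_add_ps`, and `SKETCH_VECTORIZE` are hypothetical, and the choice of `omp simd` as the hint is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hint macro: emits an OpenMP SIMD pragma only when the
 * compiler was invoked with OpenMP enabled; otherwise expands to nothing. */
#if defined(_OPENMP)
#  define SKETCH_VECTORIZE _Pragma("omp simd")
#else
#  define SKETCH_VECTORIZE
#endif

/* Hypothetical stand-in for a 128-bit vector of four floats. */
typedef struct {
  float f32[4];
} sketch_m128;

static_assert(sizeof(sketch_m128) == 16, "expected a 128-bit vector type");

/* Portable equivalent of a four-lane float add: a plain loop that the
 * compiler is free to vectorize for whatever the target supports. */
static sketch_m128 sketch_mm_add_ps(sketch_m128 a, sketch_m128 b) {
  sketch_m128 r;
  SKETCH_VECTORIZE
  for (int i = 0; i < 4; i++) {
    r.f32[i] = a.f32[i] + b.f32[i];
  }
  return r;
}
```

Because the loop has a fixed trip count and no dependencies between lanes, most optimizing compilers can turn it into a single vector add where the hardware allows, and it still compiles where it does not.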

I'm wondering if you would be interested in merging the two projects. I think it would be great for both projects; it would increase the number of functions supported (SIMDe already fully supports all of MMX and SSE1, as well as partial support for several others), and of course from SIMDe's perspective it would greatly improve performance on POWER machines. I think veclib would also benefit from SIMDe's infrastructure; we have a pretty decent test suite, and continuous integration.

The big caveat, as far as I'm concerned, is that I'm not comfortable using powerveclib due to the license. I intend to reach out to the author about this issue, but given that it's an IBM project I don't hold out a lot of hope for getting a more flexibly-licensed version. I'm not exactly sure where the line between AltiVec/VMX/VSX and powerveclib is (I've never used the POWER intrinsics before), but I'm guessing this may reduce the number of instructions which are accelerated in the short term.

IvantheDugtrio commented 7 years ago

I would definitely be interested in merging the projects. The POWER intrinsics are close to a 1:1 SSE2 port of AltiVec/VMX/VSX instructions, but using native AltiVec/VMX/VSX instructions is better. I think it would be ideal if we made our own functions that behave like SSE2 on POWER, using powerveclib as a guideline.

I've also wanted to add ARM NEON support so this could be expanded to include other instruction sets and architectures.

nemequ commented 7 years ago

Glad to hear it.

It seems like the first thing to do is copy the existing implementations from veclib over to SIMDe. I think this part should be pretty straightforward. SIMDe already has full implementations for MMX, SSE, SSE2, and SSE3 (plus a few other functions), so most of the necessary tests should already be in place. I've been putting this off because I'm having trouble getting access to a POWER machine… I'm supposed to be able to create VMs on OSU's OSL for Squash (which will use SIMDe indirectly) but I've been getting error messages lately and haven't pushed the issue. I'll try to look into it soon.

Once that's done, it's just a matter of adding more AltiVec/VMX/VSX implementations. I'm a bit uncomfortable with using powerveclib as a reference due to the license, but IBM has a list which looks helpful. Between that and just looking at the headers distributed with GCC and clang hopefully there will be enough information.

Implementing POWER functions on other architectures seems a bit more troublesome since, AFAICT, they require a non-standard "vector" keyword (i.e., compiler support). I'm not entirely sure how to resolve this, but I'm guessing it would require numerous, though relatively small, API breaks.
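To make the concern concrete: AltiVec code declares values as `vector signed int` or `vector float` and calls type-overloaded functions such as `vec_add`, neither of which a compiler for another architecture understands. One possible workaround, sketched below, replaces the keyword with ordinary struct types and rebuilds the overloading with C11 `_Generic`. Every name here (`sketch_vector_s32`, `sketch_vector_f32`, `sketch_vec_add`) is hypothetical and is not a proposal from either project; having to rename types in this way is one form the small API breaks could take.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-ins for `vector signed int` and
 * `vector float`; plain structs need no compiler support. */
typedef struct { int32_t v[4]; } sketch_vector_s32;
typedef struct { float   v[4]; } sketch_vector_f32;

static_assert(sizeof(sketch_vector_s32) == 16, "expected 128 bits");
static_assert(sizeof(sketch_vector_f32) == 16, "expected 128 bits");

/* Lane-wise integer add; goes through unsigned arithmetic so that
 * overflow wraps instead of being undefined behavior. */
static sketch_vector_s32 sketch_vec_add_s32(sketch_vector_s32 a,
                                            sketch_vector_s32 b) {
  sketch_vector_s32 r;
  for (int i = 0; i < 4; i++) {
    r.v[i] = (int32_t)((uint32_t)a.v[i] + (uint32_t)b.v[i]);
  }
  return r;
}

/* Lane-wise float add. */
static sketch_vector_f32 sketch_vec_add_f32(sketch_vector_f32 a,
                                            sketch_vector_f32 b) {
  sketch_vector_f32 r;
  for (int i = 0; i < 4; i++) {
    r.v[i] = a.v[i] + b.v[i];
  }
  return r;
}

/* One call spelling that dispatches on the argument type, imitating
 * the way vec_add is overloaded across vector types. */
#define sketch_vec_add(a, b)                     \
  _Generic((a),                                  \
      sketch_vector_s32: sketch_vec_add_s32,     \
      sketch_vector_f32: sketch_vec_add_f32)((a), (b))
```

The call sites can keep a single function name, but the type declarations still have to change, which is why some source-level incompatibility seems hard to avoid.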

I'm not sure if you're interested in helping with any of the coding or not. Either way I intend to start working on this, but obviously any help would speed things up. If you're interested I'd be happy to give you access to SIMDe.

IvantheDugtrio commented 7 years ago

Okay what I will do is create a branch using functions following SIMDe's guidelines for function names using AltiVec/VMX/VSX implementations. Once I get that implementation working I will merge that with the master and submit a pull request to SIMDe. Hopefully I can add more functions than what I've done so far.

My work has an IBM POWER 740 that I've been using for development. This has limited my ability to experiment with POWER8 functions.

nemequ commented 7 years ago

> Okay what I will do is create a branch using functions following SIMDe's guidelines for function names using AltiVec/VMX/VSX implementations.

You shouldn't need to create any functions, just a few entries (hidden by the preprocessor when __ALTIVEC__ isn't defined) to add AltiVec alternatives in the __m64/__m128/__m128i/__m128d unions, then elifs in the existing implementations right before the portable version (alongside the existing SSE and NEON cases).
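The layout described above might look something like the following sketch. It is a simplified illustration rather than SIMDe's real source: `sketch_m128i`, `sketch_mm_add_epi32`, and the `SKETCH_NATIVE_*` macros are hypothetical names, and only one function with SSE2, AltiVec, and portable branches is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Pick at most one native backend based on compiler-defined macros. */
#if defined(__SSE2__)
#  include <emmintrin.h>
#  define SKETCH_NATIVE_SSE2 1
#elif defined(__ALTIVEC__)
#  include <altivec.h>
#  define SKETCH_NATIVE_ALTIVEC 1
#endif

/* One union per vector type: the portable array is always present,
 * native views are hidden by the preprocessor when unavailable. */
typedef union {
  int32_t i32[4];
#if defined(SKETCH_NATIVE_SSE2)
  __m128i sse;
#elif defined(SKETCH_NATIVE_ALTIVEC)
  __vector signed int altivec_i32;
#endif
} sketch_m128i;

static_assert(sizeof(sketch_m128i) == 16, "expected a 128-bit vector type");

/* Four-lane 32-bit integer add with a native branch per backend and
 * the portable loop as the final fallback. */
static sketch_m128i sketch_mm_add_epi32(sketch_m128i a, sketch_m128i b) {
  sketch_m128i r;
#if defined(SKETCH_NATIVE_SSE2)
  r.sse = _mm_add_epi32(a.sse, b.sse);
#elif defined(SKETCH_NATIVE_ALTIVEC)
  r.altivec_i32 = vec_add(a.altivec_i32, b.altivec_i32);
#else
  for (int i = 0; i < 4; i++) {
    /* Unsigned arithmetic so overflow wraps like the native adds. */
    r.i32[i] = (int32_t)((uint32_t)a.i32[i] + (uint32_t)b.i32[i]);
  }
#endif
  return r;
}
```

Adding a backend under this scheme touches two places per type or function: one extra union member and one extra `#elif` branch, which matches the "no new functions" point in the comment above.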

> Hopefully I can add more functions than what I've done so far.

The tests should really help there. They certainly made working on NEON a lot easier.

> My work has an IBM POWER 740 that I've been using for development. This has limited my ability to experiment with POWER8 functions.

OSL has POWER8 machines; I'll try to get that working again soon.