WojciechMula / ternary-logic

Support for ternary logic in SSE, XOP, AVX2 and x86 programs

NEON implementation #5

Closed by nemequ 3 years ago

nemequ commented 3 years ago

I'm not sure if you're interested in this or not, but I already have the code so I thought I'd offer.

If you want, I can also provide AltiVec, z/Arch, WebAssembly SIMD, and SVE versions as well. Perhaps more interestingly, I'm also planning a GCC vector extension version. Just let me know which (if any) you want me to file a PR for.

WojciechMula commented 3 years ago

That is so cool! Thank you very much! :) Of course, if you want to add any other backend, you are welcome to.

You are the second user of this library known to me. ;)

nemequ commented 3 years ago

I just pushed a version with much more significant changes; the big thing is that I added a notor operation (for ORN) and created a neon.txt instead of just reusing the x86 versions.

This improved the NEON code quite a bit, as well as x86_32 and x86_64. It does create some noise in the SSE/AVX2/AVX-512 versions, though; AFAICT all the changes are neutral, but you may want to take a closer look.
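For readers unfamiliar with ORN, here is a minimal scalar sketch of what such an operation computes. It assumes notor follows the NEON ORN convention (`a | ~b`, as in `vornq_u32`); the function names and the operand order are assumptions for illustration, not taken from the generator in this repo.

```c
#include <stdint.h>

/* Hypothetical scalar model of "notor", assuming NEON ORN semantics:
 * OR the first operand with the bitwise complement of the second. */
static inline uint32_t notor_u32(uint32_t a, uint32_t b) {
  return a | ~b;
}

/* The same result spelled as two separate steps (NOT, then OR).
 * A target with a fused ORN instruction can do this in one operation. */
static inline uint32_t not_then_or_u32(uint32_t a, uint32_t b) {
  uint32_t not_b = ~b;
  return a | not_b;
}
```

A lowering rule that recognizes the `a | ~b` pattern can emit one instruction where a separate NOT and OR would otherwise be needed, which is presumably where the NEON improvement comes from.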

> You are the second user of this library known to me. ;)

I'm not really planning on using it directly; it's for SIMDe, and unfortunately I think it would just be too much code, especially considering we are going to need three versions (for 128/256/512 bit vectors). My current plan is to tweak one of the x86 generators to output C functions which we can then call in loops, so we'll basically end up with something like

// 256 of these, generated by this repo
static inline int32_t simde_x_mm_ternarylogic_impl_0xXX_(int32_t a, int32_t b, int32_t c);

simde__m128i simde_mm_ternarylogic_epi32(simde__m128i a, simde__m128i b, simde__m128i c, const int imm8) {
  simde__m128i r;

  switch (imm8) {
    // one case per imm8 value, 0x00 through 0xFF
    case 0xXX:
      #pragma omp simd
      for (int i = 0 ; i < 4 ; i++) {
        r[i] = simde_x_mm_ternarylogic_impl_0xXX_(a[i], b[i], c[i]);
      }
      break;
  }

  return r;
}
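To make the idea of one generated function per imm8 more concrete, here is a rough sketch of a single case. The ternary-logic intrinsic takes three operands, and for each bit position the result is bit `(a << 2) | (b << 1) | c` of imm8. The function names below are hypothetical and this is not actual generator output; the reference routine is only there to check the specialized one against.

```c
#include <stdint.h>

/* Reference model: look up each result bit in the 8-entry truth table
 * encoded by imm8, indexed by the corresponding bits of a, b and c. */
static inline uint32_t ternarylogic_ref_u32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8) {
  uint32_t r = 0;
  for (int bit = 0; bit < 32; bit++) {
    unsigned idx = (((a >> bit) & 1u) << 2)
                 | (((b >> bit) & 1u) << 1)
                 |  ((c >> bit) & 1u);
    r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
  }
  return r;
}

/* Hypothetical specialized function for imm8 = 0xCA, which is the
 * bitwise select "a ? b : c": take b where a is set, c elsewhere. */
static inline uint32_t ternarylogic_0xca_u32(uint32_t a, uint32_t b, uint32_t c) {
  return (a & b) | (~a & c);
}
```

A generated function is just a short expression like the second one, so a compiler can vectorize the surrounding loop easily, whereas the bit-by-bit reference would be far too slow to use directly.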

BTW, there is no license anywhere; am I free to use the output without restrictions (or, at least, under an MIT license)?

WojciechMula commented 3 years ago

I will review it soon; it's huge :)

Speaking of the license, I usually use the simplified BSD license, but for this project it can be public domain. When we merge your changes, I will update the license.

WojciechMula commented 3 years ago

Thanks a lot! This is great. Indeed, the x86 version looks better with the new lowering rules applied.