Closed by nemequ 3 years ago
That is so cool! Thank you very much! :) Of course, if you want to add any other backend, you are welcome to.
You are the second user of this library known to me. ;)
I just pushed a version which has some much more significant changes; the big thing is that I added a notor operation (for ORN) and created a neon.txt instead of just reusing the x86 versions.
This improved the NEON code quite a bit, as well as x86_32 and x86_64. It does create some noise in the SSE/AVX2/AVX-512 versions, though; AFAICT all the changes are neutral, but you may want to take a closer look.
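For readers unfamiliar with ORN: it is the ARM "OR NOT" instruction, which computes `a | ~b` in one step instead of a separate NOT followed by an OR. A minimal scalar sketch follows; the function names `orn_u32` and `orn_two_step_u32` are hypothetical, and the operand order of the generator's notor operation is an assumption (it may negate the other operand).

```c
#include <stdint.h>

/* Scalar model of ORN: OR the first operand with the complement of the second.
 * Assumption: this matches the NEON convention (first | ~second); the
 * generator's notor may order its operands differently. */
static inline uint32_t orn_u32(uint32_t a, uint32_t b) {
    return a | ~b;
}

/* The same result written as two separate operations, which is what a
 * lowering without ORN would have to emit. */
static inline uint32_t orn_two_step_u32(uint32_t a, uint32_t b) {
    uint32_t not_b = ~b;   /* explicit NOT */
    return a | not_b;      /* followed by OR */
}
```

Having the fused form available lets a lowering drop one instruction wherever a complemented operand feeds an OR.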
> You are the second user of this library known to me. ;)

I'm not really planning on using it directly; it's for SIMDe, and unfortunately I think it would just be too much code, especially considering we are going to need three versions (for 128/256/512-bit vectors). My current plan is to tweak one of the x86 generators to output C functions which we can then call in loops, so we'll basically end up with something like:
// 256 of these, generated by this repo
static inline int32_t simde_x_mm_ternarylogic_impl_0xXX_(int32_t a, int32_t b, int32_t c);

simde__m128i simde_mm_ternarylogic_epi32(simde__m128i a, simde__m128i b, simde__m128i c, const int imm8) {
  simde__m128i r;
  switch (imm8) {
    // one case per imm8 value, each calling its generated function
    case 0xXX:
      #pragma omp simd
      for (int i = 0 ; i < 4 ; i++) {
        r[i] = simde_x_mm_ternarylogic_impl_0xXX_(a[i], b[i], c[i]);
      }
      break;
  }
  return r;
}
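As a way to sanity-check generated per-immediate functions like the ones sketched above, here is a hedged reference model of the ternary-logic operation. It relies on the documented behaviour of the instruction: at each bit position the three input bits form a 3-bit index (a is the high bit, c the low bit) that selects one bit of imm8. The names `ternarylogic_ref_u32`, `ternarylogic_0x96_u32`, and `ternarylogic_0xCA_u32` are hypothetical and are not part of SIMDe or this repository.

```c
#include <stdint.h>

/* Bit-by-bit reference: slow, but valid for every imm8 value. */
static inline uint32_t ternarylogic_ref_u32(uint32_t a, uint32_t b, uint32_t c,
                                            uint8_t imm8) {
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        /* Build the truth-table index from the three input bits. */
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        /* Copy the selected truth-table bit into the result. */
        r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return r;
}

/* Two examples of what specialized functions could reduce to
 * (hand-written here, not actual generator output). */
static inline uint32_t ternarylogic_0x96_u32(uint32_t a, uint32_t b, uint32_t c) {
    return a ^ b ^ c;            /* 0x96: three-way XOR */
}

static inline uint32_t ternarylogic_0xCA_u32(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (~a & c);   /* 0xCA: bitwise select, a ? b : c */
}
```

Comparing each specialized function against the reference over a handful of inputs is a cheap way to catch a wrong lowering before it is wired into the switch.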
BTW, there is no license anywhere; am I free to use the output without restrictions (or, at least, under an MIT license)?
Will review it soon, it's huge :)
Speaking of the license, I usually use the simplified BSD license, but for this project it can be public domain. When we merge your changes, I'll update the license.
Thanks a lot! This is great. Indeed, the x86 version looks better with the new lowering rules applied.
I'm not sure if you're interested in this or not, but I already have the code so I thought I'd offer.
If you want, I can also provide AltiVec, z/Arch, WebAssembly SIMD, and SVE versions as well. Perhaps more interestingly, I'm also planning a GCC vector extension version. Just let me know which (if any) you want me to file a PR for.
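To illustrate what a GCC vector extension version might look like, here is a minimal sketch for one immediate. It assumes a compiler that supports `__attribute__((vector_size))` (GCC or Clang); the type name `u32x4` and the function name `ternarylogic_0xCA_u32x4` are hypothetical and do not reflect the actual generator output.

```c
#include <stdint.h>

/* Four 32-bit lanes in a 128-bit vector, using the GCC vector extension. */
typedef uint32_t u32x4 __attribute__((vector_size(16)));

/* 0xCA is the bitwise select (a ? b : c). The operators &, | and ~ apply
 * lane-wise to integer vector types, so no explicit loop is needed. */
static inline u32x4 ternarylogic_0xCA_u32x4(u32x4 a, u32x4 b, u32x4 c) {
    return (a & b) | (~a & c);
}
```

Because the expression uses only portable operators, the compiler is free to pick the best instructions for each target, which is the appeal of this approach as a fallback.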