cfriedt / crcx

LibCRCx
https://cfriedt.github.com/crcx
MIT License

Optimize reflect #31

Open cfriedt opened 4 years ago

cfriedt commented 4 years ago

There is a pretty simple way to optimize reflect() in the case where N (the number of bits) is a multiple of 4 (or 8). We can use a LUT of length 16 (or 256) where each entry is simply the bit-reversed value of its index. Each reflect() call would then take N/4 (or N/8) table lookups rather than N single-bit operations. On embedded systems we may want to use the 16-entry LUT, but on most systems 256 bytes is nothing significant.

At runtime, reflect() could be optimized by using machine-specific instructions. E.g. on ARM this would correspond to RBIT, and on x86_64 this would correspond to ??. There is a longstanding GCC feature request for a compiler builtin specifically for this purpose.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
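Until such a builtin lands in GCC, one option is to use Clang's __builtin_bitreverse32 where available and fall back to the classic O(log N) swap-halves technique. A sketch (reflect32_fast is an illustrative name, not part of the library):

```cpp
#include <cstdint>

// Sketch: prefer a compiler builtin when available (Clang provides
// __builtin_bitreverse32; GCC currently has no equivalent), falling
// back to a portable divide-and-conquer bit reversal.
static inline uint32_t reflect32_fast(uint32_t x) {
#if defined(__clang__)
  return __builtin_bitreverse32(x);
#else
  // swap progressively larger bit groups: pairs, nibbles, bytes, words
  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
  x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
  x = (x << 24) | ((x & 0xFF00u) << 8) | ((x >> 8) & 0xFF00u) | (x >> 24);
  return x;
#endif
}
```

Both branches produce identical results, so the builtin can be dropped in transparently wherever the compiler supports it.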

It would be nice to quantify the speedup as well. That will likely involve pre-generating a significant amount of random data of varying word lengths, calling the function repeatedly, and then dividing the total execution time by the number of calls made to reflect().

Doing this with C++ templates is pretty straightforward. I would like to look into C11 generics (issue #13) before this is done though, since that could simplify the C API.

cfriedt commented 4 years ago

Here is a potential candidate for speeding up compile-time const reflections. Of course, runtime reflections would be better suited to machine-specific instructions / inline assembly.

#include <cstddef>
#include <cstdint>

/* N.B. a local array in a constexpr function requires C++14 or later */
constexpr uint8_t bitrev4(const uint8_t& nibble) {
  const uint8_t lut[] = {
    /* [0b0000] = */ 0b0000,
    /* [0b0001] = */ 0b1000,
    /* [0b0010] = */ 0b0100,
    /* [0b0011] = */ 0b1100,
    /* [0b0100] = */ 0b0010,
    /* [0b0101] = */ 0b1010,
    /* [0b0110] = */ 0b0110,
    /* [0b0111] = */ 0b1110,
    /* [0b1000] = */ 0b0001,
    /* [0b1001] = */ 0b1001,
    /* [0b1010] = */ 0b0101,
    /* [0b1011] = */ 0b1101,
    /* [0b1100] = */ 0b0011,
    /* [0b1101] = */ 0b1011,
    /* [0b1110] = */ 0b0111,
    /* [0b1111] = */ 0b1111,
  };
  return lut[nibble & 0xf];
}

constexpr uint8_t bitrev8(const uint8_t& byte) {
  return
    0
    | (bitrev4(byte >> 0) << 4)
    | (bitrev4(byte >> 4) << 0)
    ;
}

template <typename T>
constexpr T bitrev(const T& x)
{
  constexpr size_t N = sizeof(x); /* width in bytes */
  T y(0);
  for (size_t k = 0; k < N; ++k) {
    /* reverse byte k and place it at the mirrored byte position */
    uint8_t a = uint8_t(x >> (k * 8));
    y |= T(bitrev8(a)) << ((N - 1 - k) * 8);
  }
  return y;
}

Ultimately we would like to avoid reflecting input at runtime.

There is a special case highlighted in #63 - where "reflected tables" are generated instead of specifying "reflect input" and "reflect output".

The generated table is entirely incompatible with the non-reflected one, but it eliminates any computationally expensive runtime reflection for that special case.