MikeLankamp / fpm

C++ header-only fixed-point math library
https://mikelankamp.github.io/fpm
MIT License
672 stars, 85 forks

switch to disable intrinsic functions #28

Open Klummel69 opened 2 years ago

Klummel69 commented 2 years ago

When calling sqrt(), I got this message from my compiler: Error: Unsupported intrinsic: llvm.ctlz.i64

Reason: find_highest_bit() uses compiler intrinsics for performance. I am using a modified gcc that does not support intrinsic calls.

I have adapted my code as follows; this might also be an option for the main branch:

Defining the preprocessor switch FPM_NO_INTRINSIC disables the intrinsic calls, so the code also works on many compilers that are not supported so far.

inline long find_highest_bit(unsigned long long value) noexcept
{
    assert(value != 0);
#if defined(FPM_NO_INTRINSIC) // Note: non-optimised fallback without intrinsics
    // Count how many times value can be shifted right before it becomes zero.
    long count = 0;
    while (value >>= 1) { ++count; }
    return count;
#elif defined(_MSC_VER)
    unsigned long index;
#if defined(_WIN64)
    _BitScanReverse64(&index, value);
#else
    if (_BitScanReverse(&index, static_cast<unsigned long>(value >> 32)) != 0) {
        index += 32;
    } else {
        _BitScanReverse(&index, static_cast<unsigned long>(value & 0xfffffffflu));
    }
#endif
    return index;
#elif defined(__GNUC__) || defined(__clang__)
    return sizeof(value) * 8 - 1 - __builtin_clzll(value);
#else
#   error "your platform does not support find_highest_bit()"
#endif
}

(Admittedly, this code is slow. If necessary, I can provide an optimized version that uses bitmasks.)
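For illustration, a bitmask-based variant could look roughly like the sketch below. It performs a binary search over the 64 bits with six mask-and-shift steps instead of looping up to 63 times. The name find_highest_bit_portable is hypothetical and not part of fpm, and this is not necessarily the optimized version the author had in mind.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable replacement for find_highest_bit(); not part of fpm.
// Returns the zero-based index of the most significant set bit.
inline int find_highest_bit_portable(std::uint64_t value) noexcept
{
    assert(value != 0);
    int index = 0;
    // Each step checks whether the upper half of the remaining window has a
    // set bit. If so, drop the lower half and record how far we shifted.
    if (value & 0xFFFFFFFF00000000ull) { value >>= 32; index += 32; }
    if (value & 0x00000000FFFF0000ull) { value >>= 16; index += 16; }
    if (value & 0x000000000000FF00ull) { value >>= 8;  index += 8;  }
    if (value & 0x00000000000000F0ull) { value >>= 4;  index += 4;  }
    if (value & 0x000000000000000Cull) { value >>= 2;  index += 2;  }
    if (value & 0x0000000000000002ull) { index += 1; }
    return index;
}
```

Whether this actually beats the simple loop depends on the compiler and target, so it would need to be measured.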

One question: is there a reason why find_highest_bit() returns long instead of int?

MikeLankamp commented 2 years ago

Hi @Klummel69. I can see the value in supporting those platforms. I'd prefer one of the "bit hack" solutions over the naive loop, and I'd rather avoid adding preprocessor macros. So a) I'd want to benchmark the difference (it's possible the bit hacks are faster), and b) I'd like the code to be generalized to the templated type. I'll have to think about it, properly understand the bit hacks, and experiment a bit.
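As a rough sketch of what "generalized to the templated type" might mean, the binary search can be written as a loop over halving shift widths derived from std::numeric_limits, so it needs no intrinsics and no preprocessor switch. The name find_highest_bit_generic is hypothetical, the sketch assumes an unsigned type whose bit width is a power of two, and it is not the implementation fpm uses.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical templated variant; not part of fpm.
// Returns the zero-based index of the most significant set bit of value.
template <typename T>
inline int find_highest_bit_generic(T value) noexcept
{
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    static_assert((std::numeric_limits<T>::digits & (std::numeric_limits<T>::digits - 1)) == 0,
                  "bit width of T must be a power of two");
    assert(value != 0);
    int index = 0;
    // Binary search: test the upper half of the remaining window, then halve it.
    for (int shift = std::numeric_limits<T>::digits / 2; shift > 0; shift /= 2) {
        if ((value >> shift) != 0) {
            value = static_cast<T>(value >> shift);
            index += shift;
        }
    }
    return index;
}
```

For a 64-bit type the loop runs six iterations, and for a 32-bit type five, so the cost scales with the logarithm of the width rather than with the position of the highest bit.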