asik / FixedMath.Net

Fixed point math C# library
Other
591 stars 126 forks source link

Problem: Sqrt() for Q48.15 format implementation #6

Closed jjcat closed 9 years ago

jjcat commented 9 years ago

Good Repo! I have met a problem to the sqrt for Q48.15 format fixed point. I try to modify your sqrt code, but can not get the correct answer. Is your sqrt only available on Q31.32 format? Code below is my version, I use NUM_BITS /4 - 1 to represent number 15. I would also glad if you can list the detail comments of the fixed point sqrt implementation.

public static Fix64 Sqrt(Fix64 x) {
    var xl = x.m_rawValue;
    if (xl < 0) {
        // We cannot represent infinities like Single and Double, and Sqrt is
        // mathematically undefined for x < 0. So we just throw an exception.
        throw new ArgumentException("Negative value passed to Sqrt", "x");
    }

    var num = (ulong)xl;
    var result = 0UL;

    // second-to-top bit
    var bit = 1UL << (NUM_BITS - 2);

    while (bit > num) {
        bit >>= 2;
    }

    // The main part is executed twice, in order to avoid
    // using 128 bit values in computations.
    for (var i = 0; i < 2; ++i) {
        // First we get the top 48 bits of the answer.
        while (bit != 0) {
            if (num >= result + bit) {
                num -= result + bit;
                result = (result >> 1) + bit;
            }
            else {
                result = result >> 1;
            }
            bit >>= 2;
        }

        if (i == 0) {
            // Then process it again to get the lowest 16 bits.
            if (num > (1UL << (NUM_BITS / 4 - 1)) - 1) {
                // The remainder 'num' is too large to be shifted left
                // by 32, so we have to add 1 to result manually and
                // adjust 'num' accordingly.
                // num = a - (result + 0.5)^2
                //       = num + result^2 - (result + 0.5)^2
                //       = num - result - 0.5
                num -= result;
                num = (num << (NUM_BITS / 4 - 1)) - 0x4000UL;
                result = (result << (NUM_BITS / 4 - 1)) + 0x4000UL;
            }
            else {
                num <<= (NUM_BITS / 4 - 1);
                result <<= (NUM_BITS / 4 - 1);
            }

            bit = 1UL << (NUM_BITS / 4 - 2 - 1);
        }
    }
    // Finally, if next bit would have been 1, round the result upwards.
    if (num > result) {
        ++result;
    }
    return new Fix64((long)result);
}
jjcat commented 9 years ago

I know where my fault, using Q47.16 format can get the correct answer. Fractional length must be power of two.