basepi / libgit2

The Library
http://libgit2.github.com
Other
0 stars 0 forks source link

Explore bogosqrt vs q3sqrt #21

Open hausdorf opened 13 years ago

hausdorf commented 13 years ago

We use square roots for a number of things. Our current implementation is here -- just the standard approximation method.

One of the team suggested that we look into using Quake 3's sqrt hack. Initial drawbacks seem to be that it's for floats, not longs. Perhaps we can adapt it to fit longs.

trane commented 13 years ago

I did some tests here, bogosqrt does not give a real value of sqrt (in fact it is ~ 2x as large). Here is the output from my test, math is the built in sqrt from math.h: bogo: 4096 q3: 2330 math: 2330

#include <stdlib.h>
#include <math.h>
#include <stdio.h>
long bogosqrt(long n) {
    long i;

    /*  
     * Classical integer square root approximation using shifts.
     */
    for (i = 1; n > 0; n >>= 2)
        i <<= 1;

    return i;
}

long quake(long number) {
    long i;
    float x, y;
    const float f = 1.5F;

    x = (float)number * 0.5F;
    y  = (float)number;
    i  = * ( long * ) &y; 
    i  = 0x5f3759df - ( i >> 1 );
    y  = * ( float * ) &i; 
    y  = y * ( f - ( x * y * y ) );
    y  = y * ( f - ( x * y * y ) );

    return (number * y); 
}

int main(void)
{
    long num = 5432123;
    long bogo = bogosqrt(num);
    long q3 = quake(num);
    long math = (long)sqrt(5432123.0f);

    printf("bogo: %ld\n", bogo);
    printf("q3: %ld\n", q3);
    printf("math: %ld\n", math);
}
hausdorf commented 13 years ago

Maybe it's just incredibly bad.

Do we know which is faster?

trane commented 13 years ago

math.h sqrt is significantly faster computing 5432123

bogo ~ 18s quake ~ 20s sqrt ~ 1.4s

this was the code for profiling (obviously an approximation)

    long i = 500000000;
    while (--i) {
        quake(num);
    }

Interesting thing, quake AND sqrt take the same time to compute the sqrt of 200 as 5432123, while bogo takes around 7s.

hausdorf commented 13 years ago

WTF. I'll look into it in a minute.

hausdorf commented 13 years ago

Could this be because q3 uses floats while the others to not?

trane commented 13 years ago

That's between you and your priest.

basepi commented 13 years ago

Could be the floats, but I'm betting it's some extreme optimization in math's sqrt() function.

But what do I know?

hausdorf commented 13 years ago

math.sqrt() uses raw assembly and, if arch supports it, a special instruction that does float square roots. That is with the float though. And computation is a lot more expensive in floats.

I actually do think this is probably one of the reasons for this result.

basepi commented 13 years ago

I actually do think this is probably one of the reasons for this result.

You think the extreme optimization is the reason? I guess my question is, does math.sqrt() use floats when it's given a long as the parameter? Because if so, then it can't be the floats that are causing the timing discrepancy, and it must be the optimization. In which case, can we use math.sqrt(), or is that considered an external dependency?

hausdorf commented 13 years ago

No, q3() and the other two use longs or ints. Floats are computationally expensive. I do think that this contributes to the results above.

basepi commented 13 years ago

But if you're saying that math.sqrt() uses floats and floats are more computationally expensive, then that doesn't make sense, as math.sqrt() had the best time.

hausdorf commented 13 years ago

Looks to me like math.sqrt() should be square rooting longs as shown in code above.

EDIT: I see the problem. I should have said here that math.sqrt() does NOT in this case use floats.

basepi commented 13 years ago

Cool.

So why were we having this discussion? math.sqrt() seems head and shoulders above the others in performance, seems like it's a no-brainer to pick which one.

trane commented 13 years ago

Not necessarily. xdiff uses bogosqrt and we need to find out why, especially since it is wrong. Though, I think I remember that it does a << 1 after calling bogosqrt, but I can't remember. If it does, in fact, do that then it gets a good approximation. We also need to verify that my code is sound, I wrote it during 2100 where I was half-focused. Perhaps I did something crazy.

trane commented 13 years ago

@DrSleep have you had any success with Math.sqrt() or have you just been using bogosqrt() for your tests?

hausdorf commented 13 years ago

I've not tried it. Eventually, yes, but when I have more time.