flintlib / flint

FLINT (Fast Library for Number Theory)
http://www.flintlib.org
GNU Lesser General Public License v3.0
444 stars 245 forks source link

Possible slowdown arb 2.23 -> flint 3.0 #1518

Open fredrik-johansson opened 1 year ago

fredrik-johansson commented 1 year ago

@Joel-Dahne reported a 10% slowdown computing a simple Bessel function value. I can reproduce a ~4% slowdown on my machine (flint-2.9 + arb-2.23 versus flint 3.0).

#include "acb.h"
#include "acb_hypgeom.h"
#include "profiler.h"

int main()

{
    acb_t a, b, res;
    acb_init(a);
    acb_init(b);
    acb_init(res);
    acb_set_d(a, 1.5);
    acb_set_d(b, 2);
    TIMEIT_START
    acb_hypgeom_bessel_j(res, a, b, 256);
    TIMEIT_STOP
}

Attached are the respective callgrind outputs.

old.txt new.txt

One can see that certain steps now require a few more instructions to execute. Carefully comparing the two execution profiles might reveal something fixable.

fredrik-johansson commented 1 year ago

Among other possible issues, this stood out to me:

189,984 ( 0.13%)      val_bits = flint_ctz(x[val_limbs]);
 63,328 ( 0.04%)      count_trailing_zeros(val_bits, x[val_limbs]);

So the new flint_ctz looks substantially slower than the old count_trailing_zeros.

fredrik-johansson commented 1 year ago

Marking this is a critical performance bug for 3.0.

fredrik-johansson commented 1 year ago

So the new flint_ctz looks substantially slower than the old count_trailing_zeros.

On closer inspection that seems not to be the case. So the problem might be elsewhere.

fredrik-johansson commented 1 year ago

I don't really know what is going on here. The slowdown gets very inconsistent if I try to isolate different parts of the Bessel function computation.

mezzarobba commented 1 year ago

FWIW, here is what I get with perf diff for 10⁵ iterations of the call to acb_hypgeom_bessel_j in your test. (I also see a ~4% slowdown overall.) perf_diff.txt

mezzarobba commented 1 year ago

And this is after reverting fd140243cc4b8d29e1e2e6b11c192e54fd2fcae6 to ease function-by-function comparison. perf_diff_2.txt

mezzarobba commented 1 year ago

So could the slowdown maybe come from 1d06a1fb80??

Joel-Dahne commented 1 year ago

I'll take a look!

Joel-Dahne commented 1 year ago

I see more or less the same slowdown on the tag flint-3.0.0-alpha1.

Joel-Dahne commented 1 year ago

I see a 10% slowdown for, acb_hypgeom_pfq_sum(s, t, a, p, b, q, z, n, prec) with the arguments

a = NULL
p = 0
b = [1, 2]
q = 2,
z = -0.25
n = 2
prec = 256

This is used internally by the bessel functions. The ratio of the slowdown is the same as I see for bessel, but it only accounts for about 5% of the absolute slowdown.

Joel-Dahne commented 1 year ago

This is a simple example where I encounter a 10% slowdown. I haven't compiled the code below so it is probably wrong.

acb_struct b[2];
acb_t s, t, z;

acb_init(b + 0);
acb_init(b + 1);
acb_init(s);
acb_init(t);
acb_init(z);

acb_set_si(b + 0, 1);
acb_set_si(b + 1, 2);
acb_mul_2exp_si(z, b + 0, -2);
acb_neg(z, z);

acb_hypgeom_pfq_sum_forward(s, t, NULL, 0, b, 2, 2, 256);

acb_clear(b + 0);
acb_clear(b + 1);
acb_clear(s);
acb_clear(t);
acb_clear(z);
mezzarobba commented 1 year ago

On my system even just

#include "acb.h"
#include "acb_hypgeom.h"
#include "profiler.h"

int main(void) {
    acb_t s, b, z;
    acb_init(b);
    acb_init(s);
    acb_init(z);

    acb_set_si(b, 1);
    acb_mul_2exp_si(z, b, -2);
    acb_neg(z, z);

    TIMEIT_START
    for (int i = 0; i < 1000000; i++)
        acb_mul(s, z, b, 256);
    TIMEIT_STOP

    acb_clear(b);
    acb_clear(s);
    acb_clear(z);
}

shows a ~5% slowdown, and the same example with acb replaced by arb shows a ~3% slowdown.