Open fredrik-johansson opened 1 year ago
Among other possible issues, this stood out to me:
189,984 ( 0.13%) val_bits = flint_ctz(x[val_limbs]);
63,328 ( 0.04%) count_trailing_zeros(val_bits, x[val_limbs]);
So the new flint_ctz
looks substantially slower than the old count_trailing_zeros
.
Marking this is a critical performance bug for 3.0.
So the new flint_ctz looks substantially slower than the old count_trailing_zeros.
On closer inspection that seems not to be the case. So the problem might be elsewhere.
I don't really know what is going on here. The slowdown gets very inconsistent if I try to isolate different parts of the Bessel function computation.
FWIW, here is what I get with perf diff
for 10⁵ iterations of the call to acb_hypgeom_bessel_j
in your test. (I also see a ~4% slowdown overall.)
perf_diff.txt
And this is after reverting fd140243cc4b8d29e1e2e6b11c192e54fd2fcae6
to ease function-by-function comparison.
perf_diff_2.txt
So could the slowdown maybe come from 1d06a1fb80??
I'll take a look!
I see more or less the same slowdown on the tag flint-3.0.0-alpha1.
I see a 10% slowdown for, acb_hypgeom_pfq_sum(s, t, a, p, b, q, z, n, prec)
with the arguments
a = NULL
p = 0
b = [1, 2]
q = 2,
z = -0.25
n = 2
prec = 256
This is used internally by the bessel functions. The ratio of the slowdown is the same as I see for bessel, but it only accounts for about 5% of the absolute slowdown.
This is a simple example where I encounter a 10% slowdown. I haven't compiled the code below so it is probably wrong.
acb_struct b[2];
acb_t s, t, z;
acb_init(b + 0);
acb_init(b + 1);
acb_init(s);
acb_init(t);
acb_init(z);
acb_set_si(b + 0, 1);
acb_set_si(b + 1, 2);
acb_mul_2exp_si(z, b + 0, -2);
acb_neg(z, z);
acb_hypgeom_pfq_sum_forward(s, t, NULL, 0, b, 2, 2, 256);
acb_clear(b + 0);
acb_clear(b + 1);
acb_clear(s);
acb_clear(t);
acb_clear(z);
On my system even just
#include "acb.h"
#include "acb_hypgeom.h"
#include "profiler.h"
int main(void) {
acb_t s, b, z;
acb_init(b);
acb_init(s);
acb_init(z);
acb_set_si(b, 1);
acb_mul_2exp_si(z, b, -2);
acb_neg(z, z);
TIMEIT_START
for (int i = 0; i < 1000000; i++)
acb_mul(s, z, b, 256);
TIMEIT_STOP
acb_clear(b);
acb_clear(s);
acb_clear(z);
}
shows a ~5% slowdown, and the same example with acb
replaced by arb
shows a ~3% slowdown.
@Joel-Dahne reported a 10% slowdown computing a simple Bessel function value. I can reproduce a ~4% slowdown on my machine (flint-2.9 + arb-2.23 versus flint 3.0).
Attached are the respective
callgrind
outputs.old.txt new.txt
One can see that certain steps now require a few more instructions to execute. Carefully comparing the two execution profiles might reveal something fixable.