Closed. SnarkBoojum closed this 2 years ago
Is there something I can do to help solve this? mips64el is one of the supported architectures in Debian, so the fact that it's broken is a problem.
Perhaps you can track down the exact location of the failure (running valgrind build/arb_hypgeom/test/t-erf might do it, for example).
I'm trying to do so; in the meantime, doesn't the fact that it works on mips and not on mips64 give a clue?
Also, there's a similar error on riscv64 (a secondary architecture for Debian, but still); that page gives an overview.
I tried valgrind: it only says an exception was triggered and tells me it happened in a call to flint_abort.
Then I tried gdb, and it gave me the same information.
I'll see if I can add debug output to the source and get more juicy bits of knowledge on the issue...
In the first loop, I have iter=7052 when the exception is raised; this is reproducible.
If I start the loop at 7052, the problem disappears.
If you just start the loop at one particular iteration, the RNG state will not have been advanced by the same amount, so you will get different input. You could try replacing the first loop with something like this:
    for (iter = 0; iter < 10000 * arb_test_multiplier(); iter++)
    {
        arb_t a, b, c;
        slong prec1, prec2;
        int alg1, alg2;

        prec1 = 2 + n_randint(state, 1000);
        prec2 = 2 + n_randint(state, 1000);

        arb_init(a);
        arb_init(b);
        arb_init(c);

        arb_randtest_special(a, state, 1 + n_randint(state, 1000), 1 + n_randint(state, 100));
        arb_randtest_special(b, state, 1 + n_randint(state, 1000), 1 + n_randint(state, 100));
        arb_randtest_special(c, state, 1 + n_randint(state, 1000), 1 + n_randint(state, 100));

        alg1 = n_randint(state, 2);
        alg2 = n_randint(state, 2);

        if (iter == 7052)
        {
            printf("%d, %d\n", alg1, alg2);
            flint_printf("prec1 = %wd, prec2 = %wd\n", prec1, prec2);
            flint_printf("a = "); arb_printd(a, 30); flint_printf("\n\n");

            switch (alg1)
            {
                case 0:
                    if (!arb_hypgeom_erf_bb(b, a, 0, prec1))
                        arb_hypgeom_erf(b, a, prec1);
                    break;
                default:
                    arb_hypgeom_erf(b, a, prec1);
                    break;
            }

            flint_printf("b = "); arb_printd(b, 30); flint_printf("\n\n");

            switch (alg2)
            {
                case 0:
                    if (!arb_hypgeom_erf_bb(c, a, 0, prec2))
                        arb_hypgeom_erf(c, a, prec2);
                    break;
                default:
                    arb_hypgeom_erf(c, a, prec2);
                    break;
            }

            flint_printf("c = "); arb_printd(c, 30); flint_printf("\n\n");

            if (!arb_overlaps(b, c))
            {
                flint_printf("FAIL: overlap\n\n");
                flint_printf("a = "); arb_printd(a, 30); flint_printf("\n\n");
                flint_printf("b = "); arb_printd(b, 30); flint_printf("\n\n");
                flint_printf("c = "); arb_printd(c, 30); flint_printf("\n\n");
                flint_abort();
            }
        }

        arb_clear(a);
        arb_clear(b);
        arb_clear(c);
    }
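To illustrate the point about RNG state, here is a minimal standalone sketch. It uses a hypothetical toy generator (toy_rng, toy_next and input_at are made-up names, and FLINT's real RNG works differently); it only shows that the value seen at a given iteration depends on how many numbers earlier iterations consumed.

```c
#include <stdint.h>

/* Hypothetical toy generator: a 64-bit linear congruential generator.
   This is NOT FLINT's RNG; it only serves to illustrate state advancement. */
typedef struct { uint64_t s; } toy_rng;

/* Advance the state once and return it (returning the whole state is
   acceptable for an illustration, not for real use). */
static uint64_t toy_next(toy_rng *r)
{
    r->s = r->s * 6364136223846793005ULL + 1442695040888963407ULL;
    return r->s;
}

/* Return the first value drawn in iteration `target` when the loop starts
   at `start`. Every iteration consumes `draws_per_iter` random numbers,
   whether or not its result is inspected. */
static uint64_t input_at(int start, int target, int draws_per_iter)
{
    toy_rng r = { 42 };   /* same fixed seed on every run */
    uint64_t first = 0;
    int iter, k;

    for (iter = start; iter <= target; iter++)
    {
        for (k = 0; k < draws_per_iter; k++)
        {
            uint64_t v = toy_next(&r);
            if (k == 0)
                first = v;   /* remember the first draw of this iteration */
        }
    }

    return first;
}
```

Here input_at(0, 7052, 5) and input_at(7052, 7052, 5) return different values, because the second call skips all the draws made by iterations 0 to 7051. Keeping the loop bounds unchanged and guarding only the body with if (iter == 7052), as in the suggested loop, preserves the original input.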
I have
erf....0, 0
prec1 = 453, prec2 = 34
a = -1.59079251697823043944891336571e+90980733389701861217825 +/- +inf
b = nan +/- +inf
c = nan +/- +inf
This shouldn't fail; the input is non-finite and this is handled safely. But the RNG initialization is perhaps different on the failing machine so that you get a different test input here.
Ok, here's what gets printed:
erf....0, 1
prec1 = 978, prec2 = 5
a = -1.86264514923053351477637284983e-9 +/- 0.25003
b = -2.10176998208324612839648922822e-9 +/- 0.28226
Exception (FLINT memory_manager). Unable to allocate memory (145776024000).
Pinging you back after a week - does the output help you, or should I run other tests?
Backtrace:
#0 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x000000fff65a4a50 in __GI_abort () at abort.c:79
#2 0x000000fff6866a30 in flint_abort () at exception.c:46
#3 0x000000fff6865fc8 in flint_memory_error (size=<optimized out>)
at memory_manager.c:53
#4 0x000000fff686604c in flint_malloc (size=145776024000) at memory_manager.c:94
#5 0x000000fff6fb6eec in _arb_vec_init (n=3037000500) at vec_init.c:18
#6 0x000000fff711216c in _arb_hypgeom_gamma_lower_sum_rs_1 (res=0xfffbd08278, p=3,
q=2, z=0xfffbd08248, N=9223372036854775807, prec=28) at gamma_lower_sum_rs.c:73
#7 0x000000fff7108c24 in arb_hypgeom_erf_1f1b (res=0xfffbd084f0, z=0xfffbd08348,
prec=28) at erf.c:107
#8 0x000000fff7109450 in arb_hypgeom_erf_1f1 (res=0xfffbd084f0, z=0xfffbd08490,
prec=5, wp=28) at erf.c:212
#9 0x000000fff710a450 in arb_hypgeom_erf (prec=5, z=0xfffbd08490, res=0xfffbd084f0)
at erf.c:538
#10 arb_hypgeom_erf (res=0xfffbd084f0, z=0xfffbd08490, prec=<optimized out>)
at erf.c:448
#11 0x000000aaaab0bff0 in main () at test/t-erf.c:58
Thanks. The problematic code seems to be the calculation of N in arb_hypgeom_erf_1f1b. For some input the floating-point numbers end up NaN, and this converts differently on different platforms. I will rewrite it in a more robust way.
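As background for the platform dependence: in C, converting a NaN or an out-of-range double to an integer type is undefined behaviour, so different platforms are free to produce different results (the backtrace above shows N=9223372036854775807, which is LONG_MAX on a 64-bit system). The following is a minimal sketch of a guarded conversion. The helper double_to_long_checked is hypothetical and is not part of FLINT or Arb, and it uses plain long where the library would use slong.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical helper (not a FLINT/Arb function): convert a double to long
   with a well-defined result for NaN and for values outside the range of
   long, instead of relying on the undefined behaviour of a plain cast. */
static long double_to_long_checked(double x, long fallback)
{
    if (isnan(x))
        return fallback;              /* caller decides what NaN should mean */

    /* With a 64-bit long, (double) LONG_MAX rounds up to 2^63, so ">="
       is needed to exclude values that do not fit. */
    if (x >= (double) LONG_MAX)
        return LONG_MAX;
    if (x <= (double) LONG_MIN)
        return LONG_MIN;

    return (long) x;                  /* in range: truncates toward zero */
}
```

With such a guard, a NaN estimate would map to a value chosen by the caller on every platform rather than to whatever the hardware conversion instruction happens to return.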
If you have a patch to test, I can give it a try.
Sorry to be insistent, but the flint-arb package is blocking quite a few other Debian packages because of its failure on mips64el -- I might have to disable this architecture for the package if we can't make it work.
I believe that changing
    u = -dz * dz + prec * LOG2 + log(dz);
to
    u = -dz * dz + prec * LOG2 + log(dz);
    if (dz < 1.0)
        u = FLINT_MAX(u, 1e-6);
fixes the issue. I'd like some better code here, but maybe this is good enough...
I'll give it a try; thanks!
It looks good! Let's see if building 1:2.22.1-2 gives a perfect score!
It's a win! Thanks!
You can see the full log here.