flintlib / arb

Arb has been merged into FLINT -- use https://github.com/flintlib/flint/ instead
http://arblib.org/
GNU Lesser General Public License v2.1

[mips64el] Unable to allocate memory during tests #404

Closed SnarkBoojum closed 2 years ago

SnarkBoojum commented 2 years ago

You can see the full log here.

SnarkBoojum commented 2 years ago

Is there something I can do to help solve this? mips64el is one of the supported architectures in Debian, so the fact that it's broken is a problem.

fredrik-johansson commented 2 years ago

Perhaps you can trace down the exact location of the failure (running valgrind build/arb_hypgeom/test/t-erf might do it, for example).

SnarkBoojum commented 2 years ago

I'm trying to do so; in the meantime, doesn't the fact that it works on mips but not on mips64 give a clue?

Also, there's a similar error on riscv64 (a secondary architecture for Debian, but still); that page gives an overview.

SnarkBoojum commented 2 years ago

I tried valgrind: it only says an exception was triggered and tells me it happened in a call to flint_abort.

Then I tried gdb, and it gave me the same information.

I'll see if I can add debug output to the source and get more juicy bits of knowledge on the issue...

SnarkBoojum commented 2 years ago

In the first loop, the exception is raised at iter=7052, reproducibly.

SnarkBoojum commented 2 years ago

If I start the loop at 7052, the problem disappears.

fredrik-johansson commented 2 years ago

If you just start the loop at one particular iteration, the RNG state will not be forwarded by the same amount, so you will get different input. You could try replacing the first loop with something like this:

    for (iter = 0; iter < 10000 * arb_test_multiplier(); iter++)
    {
        arb_t a, b, c;
        slong prec1, prec2;
        int alg1, alg2;

        prec1 = 2 + n_randint(state, 1000);
        prec2 = 2 + n_randint(state, 1000);

        arb_init(a);
        arb_init(b);
        arb_init(c);

        arb_randtest_special(a, state, 1 + n_randint(state, 1000), 1 + n_randint(state, 100));
        arb_randtest_special(b, state, 1 + n_randint(state, 1000), 1 + n_randint(state, 100));
        arb_randtest_special(c, state, 1 + n_randint(state, 1000), 1 + n_randint(state, 100));

        alg1 = n_randint(state, 2);
        alg2 = n_randint(state, 2);

        if (iter == 7052)
        {
            printf("%d, %d\n", alg1, alg2);
            flint_printf("prec1 = %wd, prec2 = %wd\n", prec1, prec2);
            flint_printf("a = "); arb_printd(a, 30); flint_printf("\n\n");

            switch (alg1)
            {
                case 0:
                    if (!arb_hypgeom_erf_bb(b, a, 0, prec1))
                        arb_hypgeom_erf(b, a, prec1);
                    break;
                default:
                    arb_hypgeom_erf(b, a, prec1);
                    break;
            }

            flint_printf("b = "); arb_printd(b, 30); flint_printf("\n\n");

            switch (alg2)
            {
                case 0:
                    if (!arb_hypgeom_erf_bb(c, a, 0, prec2))
                        arb_hypgeom_erf(c, a, prec2);
                    break;
                default:
                    arb_hypgeom_erf(c, a, prec2);
                    break;
            }

            flint_printf("c = "); arb_printd(c, 30); flint_printf("\n\n");

            if (!arb_overlaps(b, c))
            {
                flint_printf("FAIL: overlap\n\n");
                flint_printf("a = "); arb_printd(a, 30); flint_printf("\n\n");
                flint_printf("b = "); arb_printd(b, 30); flint_printf("\n\n");
                flint_printf("c = "); arb_printd(c, 30); flint_printf("\n\n");
                flint_abort();
            }
        }

        arb_clear(a);
        arb_clear(b);
        arb_clear(c);
    }

I have

erf....0, 0
prec1 = 453, prec2 = 34
a = -1.59079251697823043944891336571e+90980733389701861217825 +/- +inf

b = nan +/- +inf

c = nan +/- +inf

This shouldn't fail; the input is non-finite and this is handled safely. But the RNG initialization is perhaps different on the failing machine so that you get a different test input here.
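
To illustrate why starting the loop at iteration 7052 gives different input than running all earlier iterations first, here is a minimal sketch with a toy generator. The type `toy_rng` and the functions `toy_next` and `state_at` are hypothetical names for this example, and the generator is a plain 64-bit LCG, not FLINT's RNG; the sketch also assumes one draw per iteration, whereas the real test consumes several.

```c
#include <stdint.h>

/* Toy 64-bit linear congruential generator, purely illustrative;
   this is NOT the generator used by FLINT. */
typedef struct { uint64_t s; } toy_rng;

static uint64_t toy_next(toy_rng *r)
{
    /* Full-period LCG modulo 2^64 (multiplier = 1 mod 4, odd increment). */
    r->s = r->s * 6364136223846793005ULL + 1442695040888963407ULL;
    return r->s;
}

/* Value seen at iteration `target` when the loop starts at `start`
   and every iteration consumes exactly one draw. */
static uint64_t state_at(uint64_t seed, int start, int target)
{
    toy_rng r = { seed };
    uint64_t v = 0;
    int i;

    for (i = start; i <= target; i++)
        v = toy_next(&r);
    return v;
}
```

With the same seed, `state_at(seed, 0, 7052)` and `state_at(seed, 7052, 7052)` differ, because the second call has advanced the generator only once. Keeping the full loop and guarding only the body with `if (iter == 7052)` preserves the original sequence.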

SnarkBoojum commented 2 years ago

Ok, here's what gets printed:

erf....0, 1
prec1 = 978, prec2 = 5
a = -1.86264514923053351477637284983e-9 +/- 0.25003

b = -2.10176998208324612839648922822e-9 +/- 0.28226

Exception (FLINT memory_manager). Unable to allocate memory (145776024000).

SnarkBoojum commented 2 years ago

Pinging you back after a week - does the output help you, or should I run other tests?

AdrianBunk commented 2 years ago

Backtrace:

#0  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x000000fff65a4a50 in __GI_abort () at abort.c:79
#2  0x000000fff6866a30 in flint_abort () at exception.c:46
#3  0x000000fff6865fc8 in flint_memory_error (size=<optimized out>)
    at memory_manager.c:53
#4  0x000000fff686604c in flint_malloc (size=145776024000) at memory_manager.c:94
#5  0x000000fff6fb6eec in _arb_vec_init (n=3037000500) at vec_init.c:18
#6  0x000000fff711216c in _arb_hypgeom_gamma_lower_sum_rs_1 (res=0xfffbd08278, p=3, 
    q=2, z=0xfffbd08248, N=9223372036854775807, prec=28) at gamma_lower_sum_rs.c:73
#7  0x000000fff7108c24 in arb_hypgeom_erf_1f1b (res=0xfffbd084f0, z=0xfffbd08348, 
    prec=28) at erf.c:107
#8  0x000000fff7109450 in arb_hypgeom_erf_1f1 (res=0xfffbd084f0, z=0xfffbd08490, 
    prec=5, wp=28) at erf.c:212
#9  0x000000fff710a450 in arb_hypgeom_erf (prec=5, z=0xfffbd08490, res=0xfffbd084f0)
    at erf.c:538
#10 arb_hypgeom_erf (res=0xfffbd084f0, z=0xfffbd08490, prec=<optimized out>)
    at erf.c:448
#11 0x000000aaaab0bff0 in main () at test/t-erf.c:58

fredrik-johansson commented 2 years ago

Thanks. The problematic code seems to be the calculation of N in arb_hypgeom_erf_1f1b. For some input the floating-point numbers end up NaN, and this converts differently on different platforms. I will rewrite it in a more robust way.
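
For context: in C, converting a NaN or out-of-range double to an integer type is undefined behavior, so different platforms may produce different results. In the backtrace above, N arrives as 9223372036854775807 and the failed allocation of 145776024000 bytes equals 3037000500 elements times 48 bytes. A generic guard could look like the sketch below; `checked_double_to_long` is a hypothetical helper written for this illustration, not a FLINT or Arb function and not the actual fix.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical helper (not part of FLINT/Arb): convert a double to long,
   returning `fallback` for NaN and for values outside the range of long. */
static long checked_double_to_long(double x, long fallback)
{
    /* NaN fails every ordered comparison, so test for it explicitly. */
    if (isnan(x))
        return fallback;

    /* (double) LONG_MIN is an exact power of two, so its negation is the
       first value above LONG_MAX that a double can represent exactly. */
    if (x < (double) LONG_MIN || x >= -(double) LONG_MIN)
        return fallback;

    return (long) x;   /* in range: truncates toward zero */
}
```

A caller would choose the fallback to suit the situation, for example a small term count or an error code, instead of passing an unchecked value on to an allocation.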

SnarkBoojum commented 2 years ago

If you have a patch to test, I can give it a try.

SnarkBoojum commented 2 years ago

Sorry to be insistent, but the flint-arb package is blocking quite a few other Debian packages because of its failure on mips64el -- I might have to disable this architecture for the package if we can't make it work.

fredrik-johansson commented 2 years ago

I believe that changing

    u = -dz * dz + prec * LOG2 + log(dz);

to

    u = -dz * dz + prec * LOG2 + log(dz);
    if (dz < 1.0)
        u = FLINT_MAX(u, 1e-6);

fixes the issue. I'd like some better code here, but maybe this is good enough...

SnarkBoojum commented 2 years ago

I'll give it a try; thanks!

SnarkBoojum commented 2 years ago

It looks good! Let's see if building 1:2.22.1-2 gives a perfect score!

SnarkBoojum commented 2 years ago

It's a win! Thanks!