Hmm. Does the FreeBSD stdlib have any mpool of its own? Look at this funky graph!
(I included "clang" in the gnuplot title, but a later test with gcc shows the same thing.)
Huge jump in timing between malloc'ing 200 ints and 201 ints, and similar (although smaller) jumps every subsequent 100 ints.
Same pattern on FreeBSD 13.2 on arm64. I'll test on other versions of FreeBSD, and other OSes, tomorrow.
Here's the code; it's a simplified version of our mpool test. Namely, I removed all mpool stuff, and all libcperciva code.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUFMAX 3000

int
main(int argc, char * argv[])
{
	struct timespec mono_begin, mono_end;
	long long mono_delta;
	int i, j;
	int num;
	int * x[BUFMAX];

	/* Parse one argument. */
	if (argc < 2) {
		printf("usage: %s num\n", argv[0]);
		exit(1);
	}
	num = atoi(argv[1]);
	if (num >= BUFMAX) {
		printf("num is too large!\n");
		exit(1);
	}

	/* Initial time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_begin))
		goto err1;

	for (i = 0; i < 1000; i++) {
		/* Allocate a bunch. */
		for (j = 0; j < num; j++)
			x[j] = malloc(sizeof(int));

		/* Free a bunch. */
		for (j = 0; j < num; j++)
			free(x[j]);
	}

	/* Final time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_end))
		goto err1;

	/* Calculate delta and output. */
	mono_delta = (mono_end.tv_nsec - mono_begin.tv_nsec)
	    + 1000000000 * (mono_end.tv_sec - mono_begin.tv_sec);
	printf("%f\n", (double)mono_delta / num / 1000);

	return (0);

err1:
	printf("clock error\n");

	/* Failure */
	return (1);
}
For comparison, here's the plot from our mpool test; mpool has a much more sensible timing curve.
Looks like malloc is allocating a slab of pointers at a time; the jumps are when it needs to reserve a kB for the next bunch of pointers. This is probably just a small-allocation optimization, so if you changed from allocating sizeof(int) to allocating e.g. 128 bytes at a time, I'm guessing it will go away.
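For anyone who wants to reproduce that, here's the same benchmark with the allocation size pulled out into a constant; this is just a sketch of the change being described (the ALLOCSIZE name and the condensed error handling are mine, not anything in our code):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUFMAX 3000
#define ALLOCSIZE 128	/* Was sizeof(int); bump to 128 bytes. */

int
main(int argc, char * argv[])
{
	struct timespec t0, t1;
	void * x[BUFMAX];
	int i, j, num;

	/* Parse one argument. */
	if ((argc < 2) || ((num = atoi(argv[1])) >= BUFMAX) || (num < 1)) {
		printf("usage: %s num\n", argv[0]);
		exit(1);
	}

	if (clock_gettime(CLOCK_MONOTONIC, &t0))
		exit(1);
	for (i = 0; i < 1000; i++) {
		/* Allocate a bunch of ALLOCSIZE-byte buffers. */
		for (j = 0; j < num; j++)
			x[j] = malloc(ALLOCSIZE);

		/* Free a bunch. */
		for (j = 0; j < num; j++)
			free(x[j]);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &t1))
		exit(1);

	/* Output mean ns per allocate/free pair, as in the original test. */
	printf("%f\n", ((double)(t1.tv_nsec - t0.tv_nsec)
	    + 1e9 * (double)(t1.tv_sec - t0.tv_sec)) / num / 1000);

	return (0);
}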
Ok, for general interest, here are some comparisons of different malloc sizes. I split them up:
More pertinently: what does this mean for our mpool test?
One of the assumptions is that doing 100 allocations (of 4 bytes) would be the ideal case for benefiting from mpool instead of malloc. But on FreeBSD 13 and 14 (and possibly earlier versions), that's still in the range of the slab of pre-allocated pointers.
I could increase struct stuff to be 256 bytes, but allocating that 100 times in a row is still faster (per capita) than doing it 1000 times. Should I bump it to something huge like 16384 bytes? Or should we just update the comments to reflect that 100 times might not be the "full benefit", since it depends on the OS?
Another option would be to change mpool to start off with a size of 1000, then check 1000 for the "full" and 10000 for the "partial".
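If I'm remembering the MPOOL() macro correctly (MPOOL(name, type, size) from libcperciva's util/mpool.h, which generates mpool_name_malloc() / mpool_name_free()), that option would look roughly like the following declaration-level sketch; the exact names in the test may differ:

#include "mpool.h"

/* The record type currently being pooled in the test. */
struct stuff {
	int i;
};

/* Start the pool at 1000 records instead of 100... */
MPOOL(stuff, struct stuff, 1000);

/*
 * ...and then the timing loops would allocate/free 1000 records for the
 * "full benefit" measurement and 10000 for the "partial benefit" one.
 */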
But my sense is that we tend to use mpool for allocating dozens of items with many bytes, rather than thousands of items with few bytes. So increasing the size of struct stuff is probably better.
Ok, revised patch: change the allocation from int i to uint8_t buf[128], and don't change the ratio limits. (64 bytes isn't enough to avoid the ratio becoming too low.)
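In other words, the pooled record goes from holding an int to holding a 128-byte buffer; roughly (the "_old" name is just for illustration):

#include <stdint.h>

/* Old record: 4 bytes, small enough to hit malloc's slab optimization. */
struct stuff_old {
	int i;
};

/* New record: 128 bytes (64 wasn't enough to keep the ratio in range). */
struct stuff {
	uint8_t buf[128];
};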
Looks good. For reference, I think the places where I'm using mpool are generally for allocations in the range of 40-100 bytes, although sometimes it's as low as 16 bytes (two pointers).
Yeah, in libcperciva we're doing:
events.c
events_immediate.c
network_read.c
network_read.c
No non-libcperciva uses in tarsnap, scrypt, and spiped. kivaloo does a bunch of 4096 records (and even 32768 records!), but I didn't dig in to see exactly how large those were.
The mpool changes are needed for FreeBSD 14.0 with clang; evidently, they improved the malloc speed. (The previous limits were still fine for FreeBSD 14.0 with gcc.)
The humansize changes are incidental; I happened to look at the logfile, saw:

and then wondered what SI prefix B was. That led to commenting the test case definition, then modifying the printf message, and then I figured that I might as well add checks for humansize() itself.
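The checks are along these lines; this is only a sketch, assuming the char * humansize(uint64_t) and int humansize_parse(const char *, uint64_t *) prototypes from libcperciva's util/humansize.h (and that the latter returns nonzero on failure), not the actual test code:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "humansize.h"

int
main(void)
{
	char * s;
	uint64_t val;

	/* Format a size and see what string we get back. */
	if ((s = humansize((uint64_t)1234)) == NULL) {
		fprintf(stderr, "humansize failed\n");
		exit(1);
	}
	printf("humansize(1234) = \"%s\"\n", s);
	free(s);

	/* Parse a human-readable size back into a byte count. */
	if (humansize_parse("1 kB", &val)) {
		fprintf(stderr, "humansize_parse failed\n");
		exit(1);
	}
	printf("humansize_parse(\"1 kB\") = %" PRIu64 "\n", val);

	return (0);
}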