Hmm. Does the FreeBSD stdlib have any mpool of its own? Look at this funky graph!
(I included "clang" in the gnuplot title, but a later test with gcc shows the same thing.)
Huge jump in timing between malloc'ing 200 ints and 201 ints, and similar (although smaller) jumps every subsequent 100 ints.
Same pattern on FreeBSD 13.2 on arm64. I'll test on other versions of FreeBSD, and other OSes, tomorrow.
Here's the code; it's a simplified version of our mpool test. Namely, I removed all mpool stuff, and all libcperciva code.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUFMAX 3000

int
main(int argc, char * argv[])
{
	struct timespec mono_begin, mono_end;
	long long mono_delta;
	int i, j;
	int num;
	int * x[BUFMAX];

	/* Parse one argument. */
	if (argc < 2) {
		printf("usage: %s num\n", argv[0]);
		exit(1);
	}
	num = atoi(argv[1]);
	if (num >= BUFMAX) {
		printf("num is too large!\n");
		exit(1);
	}

	/* Initial time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_begin))
		goto err1;

	for (i = 0; i < 1000; i++) {
		/* Allocate a bunch. */
		for (j = 0; j < num; j++)
			x[j] = malloc(sizeof(int));

		/* Free a bunch. */
		for (j = 0; j < num; j++)
			free(x[j]);
	}

	/* Final time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_end))
		goto err1;

	/* Calculate delta and output. */
	mono_delta = (mono_end.tv_nsec - mono_begin.tv_nsec)
	    + 1000000000 * (mono_end.tv_sec - mono_begin.tv_sec);
	printf("%f\n", (double)mono_delta / num / 1000);

	return (0);

err1:
	printf("clock error\n");

	/* Failure */
	return (1);
}
For comparison, here's the plot from our mpool test; mpool has a much more sensible timing curve.
Looks like malloc is allocating a slab of pointers at a time; the jumps are when it needs to reserve a kB for the next bunch of pointers. This is probably just a small-allocation optimization, so if you changed from allocating sizeof(int) to allocating e.g. 128 bytes at a time, I'm guessing it will go away.
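For anyone who wants to reproduce that, here's the same benchmark with the allocation size pulled out into a constant; this is just a sketch of the change being described (the ALLOCSIZE name and the condensed error handling are mine, not anything in our code):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUFMAX 3000
#define ALLOCSIZE 128	/* Was sizeof(int); bump to 128 bytes. */

int
main(int argc, char * argv[])
{
	struct timespec t0, t1;
	void * x[BUFMAX];
	int i, j, num;

	/* Parse one argument. */
	if ((argc < 2) || ((num = atoi(argv[1])) >= BUFMAX) || (num < 1)) {
		printf("usage: %s num\n", argv[0]);
		exit(1);
	}

	if (clock_gettime(CLOCK_MONOTONIC, &t0))
		exit(1);
	for (i = 0; i < 1000; i++) {
		/* Allocate a bunch of ALLOCSIZE-byte buffers. */
		for (j = 0; j < num; j++)
			x[j] = malloc(ALLOCSIZE);

		/* Free a bunch. */
		for (j = 0; j < num; j++)
			free(x[j]);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &t1))
		exit(1);

	/* Output mean ns per allocate/free pair, as in the original test. */
	printf("%f\n", ((double)(t1.tv_nsec - t0.tv_nsec)
	    + 1e9 * (double)(t1.tv_sec - t0.tv_sec)) / num / 1000);

	return (0);
}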
Ok, for general interest, here are some comparisons of different malloc sizes. I split them up:
More pertinently: what does this mean for our mpool test?
One of the assumptions is that doing 100 allocations (of 4 bytes) would be the ideal case for benefiting from mpool instead of malloc. But on FreeBSD 13 and 14 (and possibly earlier versions), that's still in the range of the slab of pre-allocated pointers.
I could increase struct stuff to be 256 bytes, but allocating that 100 times in a row is still faster (per capita) than doing it 1000 times. Should I bump it to something huge like 16384 bytes? Or should we just update the comments to reflect that 100 times might not be the "full benefit", since it depends on the OS?
Another option would be to change mpool to start off with a size of 1000, then check 1000 for the "full" and 10000 for the "partial".
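If I'm remembering the MPOOL() macro correctly (MPOOL(name, type, size) from libcperciva's util/mpool.h, which generates mpool_name_malloc() / mpool_name_free()), that option would look roughly like the following declaration-level sketch; the exact names in the test may differ:

#include "mpool.h"

/* The record type currently being pooled in the test. */
struct stuff {
	int i;
};

/* Start the pool at 1000 records instead of 100... */
MPOOL(stuff, struct stuff, 1000);

/*
 * ...and then the timing loops would allocate/free 1000 records for the
 * "full benefit" measurement and 10000 for the "partial benefit" one.
 */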
But my sense is that we tend to use mpool for allocating dozens of items with many bytes, rather than thousands of items with few bytes. So increasing the size of struct stuff is probably better.
Ok, revised patch: change the allocation from int i to uint8_t buf[128], and don't change the ratio limits. (64 bytes isn't enough to avoid the ratio becoming too low.)
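In other words, the pooled record goes from holding an int to holding a 128-byte buffer; roughly (the "_old" name is just for illustration):

#include <stdint.h>

/* Old record: 4 bytes, small enough to hit malloc's slab optimization. */
struct stuff_old {
	int i;
};

/* New record: 128 bytes (64 wasn't enough to keep the ratio in range). */
struct stuff {
	uint8_t buf[128];
};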
Looks good. For reference, I think the places where I'm using mpool are generally for allocations in the range of 40-100 bytes, although sometimes it's as low as 16 bytes (two pointers).
Yeah, in libcperciva we're doing:
events.c
events_immediate.c
network_read.c
network_read.c
No non-libcperciva uses in tarsnap, scrypt, and spiped. kivaloo does a bunch of 4096 records (and even 32768 records!), but I didn't dig in to see exactly how large those were.
The mpool changes are needed for FreeBSD 14.0 with clang; evidently, they improved the malloc speed. (The previous limits were still fine for FreeBSD 14.0 with gcc.)
The humansize changes are incidental; I happened to look at the logfile, saw:

and then wondered what SI prefix B was. That led to commenting the test case definition, then modifying the printf message, and then I figured that I might as well add checks for humansize() itself.
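The checks are along these lines; this is only a sketch, assuming the char * humansize(uint64_t) and int humansize_parse(const char *, uint64_t *) prototypes from libcperciva's util/humansize.h (and that the latter returns nonzero on failure), not the actual test code:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "humansize.h"

int
main(void)
{
	char * s;
	uint64_t val;

	/* Format a size and see what string we get back. */
	if ((s = humansize((uint64_t)1234)) == NULL) {
		fprintf(stderr, "humansize failed\n");
		exit(1);
	}
	printf("humansize(1234) = \"%s\"\n", s);
	free(s);

	/* Parse a human-readable size back into a byte count. */
	if (humansize_parse("1 kB", &val)) {
		fprintf(stderr, "humansize_parse failed\n");
		exit(1);
	}
	printf("humansize_parse(\"1 kB\") = %" PRIu64 "\n", val);

	return (0);
}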