Kivaloo perf - Githubissues

gperciva commented 3 years ago

BTW, once https://github.com/Tarsnap/spiped/pull/308 is merged, I can add the branch that makes kivaloo share object files. :)

cperciva commented 3 years ago

Yeah, I haven't forgotten -- just wanted to get this branch off my laptop first.

gperciva commented 3 years ago

I imagine that you've already benchmarked this, but just for confirmation, there's no obvious bottlenecks at the level of the c source. When I used it for 100 iterations of test_lbs, 9% of the total time was spent on CRC32C_Update_SSE42, 6.85% on elasticarray_get(), and the other functions in the c code are even less of the total.

That said, I did notice that 13.06% is spent on clock_gettime(). That includes 6.94% of the total (relative to the overall time, not relative to the 13% in clock_gettim()) calling elf_aux_info. FreeBSD 12.2, amd64.

I wonder if clock_gettime() is checking for the available clocks every time we call it? I see that FreeBSD has a CLOCK_MONOTONIC_FAST (and Linux has a CLOCK_MONOTONIC_COARSE which looks like the same idea).

I don't have a good handle on the accuracy you want from this, nor how much of a trade-off there is in the freebsd kernel and/or system libraries with CLOCK_FOO_FAST. Also, even a theoretically instanteous gettime would only save 13% of the overall speed, so perhaps it's not worth the complexity.

gperciva commented 3 years ago

For comparison, out of libc, perf is using

time %	function
13.06	`clock_gettime`
4.60	`free`
4.36	`malloc`
2.17	`poll`
1.51	`memcpy`
1.32	`time`
1.04	`recv`
1.03	`realloc
< 1	other functions

All of those make sense to me, other than clock_gettime.

cperciva commented 3 years ago

The call to elf_aux_info is libc looking up the VDSO timehands -- this allows clock_gettime to return without making a syscall (it gets the time from the CPU cycle counter + an offset). Using this I don't think the "fast"/"coarse" timers provide any speed benefit.

In any case, performance seems to be good enough for now -- I was able to push a bit over 10^6 kvlds operations per second through perf. We can always reassess it later if needed but I think it will be a while before we need to deal with anything beyond that point.

cperciva commented 3 years ago

@gperciva Can you glance over this and make sure I didn't do anything stupid while cleaning up the git history?

gperciva commented 3 years ago

Looks good to me.

Tarsnap / kivaloo

Kivaloo perf #240