Closed cperciva closed 3 years ago
Yeah, I haven't forgotten -- just wanted to get this branch off my laptop first.
I imagine that you've already benchmarked this, but just for confirmation, there's no obvious bottlenecks at the level of the c source. When I used it for 100 iterations of test_lbs
, 9% of the total time was spent on CRC32C_Update_SSE42
, 6.85% on elasticarray_get()
, and the other functions in the c code are even less of the total.
That said, I did notice that 13.06% is spent on clock_gettime()
. That includes 6.94% of the total (relative to the overall time, not relative to the 13% in clock_gettim()
) calling elf_aux_info
. FreeBSD 12.2, amd64.
I wonder if clock_gettime()
is checking for the available clocks every time we call it? I see that FreeBSD has a CLOCK_MONOTONIC_FAST (and Linux has a CLOCK_MONOTONIC_COARSE which looks like the same idea).
I don't have a good handle on the accuracy you want from this, nor how much of a trade-off there is in the freebsd kernel and/or system libraries with CLOCK_FOO_FAST. Also, even a theoretically instanteous gettime would only save 13% of the overall speed, so perhaps it's not worth the complexity.
For comparison, out of libc, perf
is using
time % | function |
---|---|
13.06 | clock_gettime |
4.60 | free |
4.36 | malloc |
2.17 | poll |
1.51 | memcpy |
1.32 | time |
1.04 | recv |
1.03 | `realloc |
< 1 | other functions |
All of those make sense to me, other than clock_gettime
.
The call to elf_aux_info
is libc looking up the VDSO timehands -- this allows clock_gettime
to return without making a syscall (it gets the time from the CPU cycle counter + an offset). Using this I don't think the "fast"/"coarse" timers provide any speed benefit.
In any case, performance seems to be good enough for now -- I was able to push a bit over 10^6 kvlds operations per second through perf. We can always reassess it later if needed but I think it will be a while before we need to deal with anything beyond that point.
@gperciva Can you glance over this and make sure I didn't do anything stupid while cleaning up the git history?
Looks good to me.
BTW, once https://github.com/Tarsnap/spiped/pull/308 is merged, I can add the branch that makes kivaloo share object files. :)