Closed: dzaima closed this 7 months ago
You are just in time. I'm creating some automation here: https://github.com/buybackoff/1brc-bench
And will run lots of CPU core/threads combinations over the weekend on two big Intel/AMD machines.
There are already special cases for @lehuyduc and @austindonisan. It would be great if you could adjust run.sh as needed.
Here's a change adding my repo to that; feel free to merge it or pick the changes in yourself, or ask me for a PR.
Great, thank you!
Ok, my solution should now have an aarch64 NEON version; it took slightly more time than I expected (though a decent amount of that was just me being stupid). It should build the same way, but I have done only minimal testing and precisely zero tuning (I replaced some manual string comparison with a `memcmp` call and, rather than leaving the semicolon search in a much-less-efficient-than-desired state, went ahead and wrote a saner semicolon search), as my only available aarch64 device is my phone with an A53.
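For readers unfamiliar with the two pieces mentioned above, here is a minimal portable sketch in C. It is not the code from the repo: it swaps the NEON search for a plain 8-bytes-at-a-time (SWAR) zero-byte test, and the helper names `find_semicolon` and `name_equals` are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: index of the first ';' in buf[0..len), or len if none.
   Checks 8 bytes per step with the classic "has zero byte" bit trick. */
static size_t find_semicolon(const char *buf, size_t len) {
    const uint64_t ones    = 0x0101010101010101ULL;
    const uint64_t highs   = 0x8080808080808080ULL;
    const uint64_t pattern = ones * (uint64_t)';'; /* ';' in every byte */
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, 8);          /* safe unaligned load */
        uint64_t x = word ^ pattern;        /* bytes equal to ';' become 0 */
        if ((x - ones) & ~x & highs) {      /* nonzero iff some byte of x is 0 */
            /* Pin down the position bytewise, independent of endianness. */
            for (size_t j = 0; j < 8; j++) {
                if (buf[i + j] == ';') return i + j;
            }
        }
    }
    for (; i < len; i++) {                  /* scalar tail, fewer than 8 bytes */
        if (buf[i] == ';') return i;
    }
    return len;
}

/* Hypothetical helper: compare two station names with memcmp
   instead of a hand-written byte loop. */
static int name_equals(const char *a, size_t alen, const char *b, size_t blen) {
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Calling `find_semicolon` on a line of the measurements file gives the length of the station name, which can then be passed to `name_equals` when probing a hash table. A SIMD version would apply the same idea to 16 or more bytes per step.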
https://github.com/dzaima/1brc - build with `make a.out`, run with `THREADS_1BRC=[desired thread count] ./a.out path/to/measurements.txt`.
Language column is mildly non-trivial - there's C, C++, and Singeli involved (Singeli generating `gen.c`, which is committed to the repo to not require Singeli to build).
I expect it to roughly match lehuyduc's entry on both the original & 10K datasets (on my PC my solution is ~10% faster for 10K, and roughly the same for original).
(the repo also has a Java solution, but it's only optimized for the original dataset, and has extremely high variance (>2x; JIT compilation screwing up? haven't bothered looking into it), and with JIT startup time and worse codegen it struggles even at the best of times, so not worth including)