lehuyduc closed this 8 months ago.
@lehuyduc I assume you checked the output against the correct one? I'm too lazy to redo that every time.
Yes, I tested on 3 different measurements.txt files, and the results are correct. If I find any new errors, I'll fix them.
./run_cpp.sh 12 12
Could you test the result of this one too? It shows how much hyper-threading hurts performance when there are many branch misses or L3 cache misses.
The results of ./run_cpp.sh 12 12 and ./run_cpp.sh 12 6 are not very different: they agree to the second decimal place, even though the delta is larger than sigma. I didn't look into it too deeply.
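The "delta > sigma" comparison can be made concrete with a few lines of code. This is a minimal sketch under an assumption about what was meant: it takes several repeated run times for each configuration (for example, ./run_cpp.sh 12 12 and ./run_cpp.sh 12 6), and treats sigma as the larger of the two sample standard deviations. The function names are hypothetical and not part of the repository.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of a set of run times (e.g. seconds per run).
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// Sample standard deviation (n - 1 in the denominator); needs at least 2 runs.
double sample_stddev(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    double m = mean(xs);
    double ss = 0.0;
    for (double x : xs) ss += (x - m) * (x - m);
    return std::sqrt(ss / static_cast<double>(xs.size() - 1));
}

// ASSUMED reading of "delta > sigma": the absolute difference between the two
// mean run times is larger than the larger of the two sample standard deviations.
bool delta_exceeds_sigma(const std::vector<double>& a, const std::vector<double>& b) {
    double delta = std::fabs(mean(a) - mean(b));
    double sigma = std::fmax(sample_stddev(a), sample_stddev(b));
    return delta > sigma;
}
```

With only a handful of runs per configuration, exceeding one standard deviation is not by itself strong evidence of a real difference, which fits the observation that both settings still round to the same second decimal.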
Huh, so I guess this is an AMD-specific problem. If I run with all virtual threads on a 2950X, it's much slower, around 30% or more. Anyway, thanks for testing!
The blog update is now deploying.
https://github.com/lehuyduc/1brc-simd
Hi, I've updated my code to optimize for the 10K-key dataset. On my PC it's ~3x faster than the commit you tested (excluding munmap time). Performance on the default dataset is a bit slower.

To compile and run, just use ./run_cpp.sh

To test the effect of hyper-threading, you can run ./run_cpp.sh 12 12 (where 12 is the total number of threads on your CPU). You will see interesting effects on the 10K dataset :D

Thanks! Looking forward to your updated results.
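If the argument is meant to be the total number of logical threads, as the comment says, it can be looked up with std::thread::hardware_concurrency() rather than hard-coding 12. This is a minimal sketch: the helper names are hypothetical, and halving the count to estimate physical cores (matching the 12 vs 6 comparison earlier in the thread) assumes 2-way SMT is enabled.

```cpp
#include <cassert>
#include <thread>

// Number of logical CPUs (hardware threads) reported by the implementation,
// e.g. 12 on a 6-core/12-thread CPU. hardware_concurrency() may return 0 when
// the value cannot be determined, so fall back to 1 in that case.
unsigned logical_threads() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Rough physical-core estimate, ASSUMING 2-way SMT (hyper-threading).
// The standard library cannot distinguish physical cores from SMT siblings,
// so this is only a heuristic.
unsigned assumed_physical_cores() {
    unsigned n = logical_threads();
    unsigned cores = n >= 2 ? n / 2 : 1;
    assert(cores >= 1 && cores <= n);
    return cores;
}
```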