Locietta / xanmod-kernel-WSL2

XanMod kernel for WSL2, built with Clang with ThinLTO enabled. Build and release are automated with GitHub Actions.
GNU General Public License v2.0

[Performance/need info] ~7% performance loss in syscall overhead & process creation with UnixBench #8

Closed Locietta closed 2 years ago

Locietta commented 2 years ago

I ran UnixBench as a casual benchmark comparing the default kernel (5.10.102.1-microsoft-standard-WSL2) with the custom kernel (5.17.1-xanmod1-locietta-WSL2).

It turns out that while filesystem performance improves by ~10% single-core and ~40% multi-core, syscall overhead and process creation get about 7% slower.

I should probably look into the XanMod kernel config to see what's missing. Some scheduler-related options are probably missing from the current defconfig.

I should also try some other benchmarks, since UnixBench is quite old and might be unsuitable for today's VMs and CPUs...

Raw benchmark results: xanmod_kernel_bench.txt, default_kernel_bench.txt

Locietta commented 2 years ago

So... UnixBench uses getpid() to measure syscall overhead, but this "syscall" is actually cached by the C runtime, so it doesn't really reflect syscall overhead. This was fixed in a UnixBench commit last month, but the fix hasn't been ported to AUR yet.

No: glibc removed the getpid() cache in version 2.25, so caching doesn't affect the benchmark result.
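
A quick way to check whether the caching concern applies on a given machine is to look at the glibc version. The following is only a minimal sketch: the helper names are hypothetical, and it assumes (per the comment above) that any glibc older than 2.25 may cache getpid(), which might not hold for very old releases.

```python
import platform


def parse_version(version):
    """Turn a version string like '2.35' into a comparable tuple (2, 35)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def getpid_may_be_cached(glibc_version):
    """Return True if this glibc predates 2.25, where the PID cache was removed.

    Assumption: every release before 2.25 is treated as "may cache".
    """
    return parse_version(glibc_version) < (2, 25)


if __name__ == "__main__":
    # platform.libc_ver() returns ('glibc', '<version>') on glibc systems
    # and empty strings when it cannot identify the C library.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"glibc {version}: getpid() may be cached = {getpid_may_be_cached(version)}")
    else:
        print("C library is not glibc or could not be detected")
```

On any reasonably recent distribution this should report that getpid() is not cached, which is why the UnixBench number can be taken at face value here.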

Locietta commented 2 years ago

I ran sudo perf bench syscall all 10 times and averaged the results.

Result: average time per getppid() call (10 million calls per run)
5.10.102.1-microsoft-standard-WSL2: 0.3084 us
5.17.1-xanmod1-locietta-WSL2: 0.3365 us
Performance loss in syscall overhead: ~9%

This matches the UnixBench result, so the benchmark result should be reliable 😥
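
For reference, the averaging and the ~9% figure can be reproduced with a small script. This is a rough sketch, not the script actually used: the helper names are made up, and it assumes each perf run prints a line of the form "<number> usecs/op" for the getppid() benchmark and that this is the first such line in the output.

```python
import re
from statistics import mean

# Assumed output format: a line such as "0.308400 usecs/op".
USECS_PER_OP = re.compile(r"([0-9.]+)\s+usecs/op")


def parse_usecs_per_op(output):
    """Extract the first usecs/op value from one run's captured output."""
    match = USECS_PER_OP.search(output)
    if match is None:
        raise ValueError("no usecs/op line found in output")
    return float(match.group(1))


def average_usecs_per_op(outputs):
    """Average the per-call time over several captured runs."""
    return mean(parse_usecs_per_op(text) for text in outputs)


def relative_slowdown(baseline, candidate):
    """Fractional increase in per-call time relative to the baseline."""
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    # Averages reported above, in microseconds per getppid() call.
    default_kernel = 0.3084
    xanmod_kernel = 0.3365
    print(f"slowdown: {relative_slowdown(default_kernel, xanmod_kernel):.1%}")
```

With the two averages above, (0.3365 - 0.3084) / 0.3084 comes out at roughly 0.09, which is where the ~9% figure comes from.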

Locietta commented 2 years ago

I doubt this matters for any real workload, but the performance drop is still weird.