Keno closed this 2 years ago
Is throwing expensive for rr recording? There is one particular loop ("copyto!") that checks 80k error cases for copyto and 24k success cases (I didn't realize there could be so many code paths through that function 🙄). This subtest takes around 20s without rr, but 21m24.0s with rr (mostly cpu time spent in rr).
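For scale, here is a quick back-of-the-envelope calculation using only the numbers quoted above. The per-throw figure assumes that all of the extra time is attributable to the 80k throwing cases, which has not been established at this point; the variable names are just for illustration.

```python
# Timings quoted above for the "copyto!" subtest.
plain_s = 20.0               # without rr
rr_s = 21 * 60 + 24.0        # 21m24.0s with rr = 1284 s

error_cases = 80_000         # cases that throw
success_cases = 24_000       # cases that do not throw

# Overall slowdown factor under rr.
slowdown = rr_s / plain_s

# ASSUMPTION: every extra second is caused by the throwing cases.
extra_ms_per_throw = (rr_s - plain_s) / error_cases * 1000

print(f"slowdown under rr: {slowdown:.1f}x")                  # 64.2x
print(f"extra time per throw: {extra_ms_per_throw:.1f} ms")   # 15.8 ms
```

A roughly 64x slowdown is far outside the usual 1.5-2x rr overhead mentioned elsewhere in this thread, which is what makes the unwinder a plausible suspect.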
It's possible one of the syscalls in the unwinder isn't being properly syscall-buffered. I wouldn't be particularly surprised, since those syscalls are probably not particularly common.
To answer my own question, libunwind implements validate_mem by calling mincore (or msync), which is a syscall for every memory access it does. This is controlled by the --disable-conservative-checks flag to configure (default is enabled, despite what the libunwind docs might have you believe).
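To illustrate why this pattern is costly, here is a minimal sketch of "validate with mincore, then read", written in Python with ctypes for brevity. It is not libunwind's code: is_mapped and checked_read_word are hypothetical names, and the sketch assumes Linux, where mincore fails with ENOMEM when the range is not mapped.

```python
import ctypes
import ctypes.util
import mmap

# Load libc and declare mincore(void *addr, size_t length, unsigned char *vec).
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                         ctypes.POINTER(ctypes.c_ubyte)]
libc.mincore.restype = ctypes.c_int

PAGE = mmap.PAGESIZE


def is_mapped(addr):
    """Hypothetical helper: True if the page containing addr is mapped.

    Every call is one mincore syscall, which is the cost discussed above.
    """
    page_start = addr & ~(PAGE - 1)      # mincore requires a page-aligned address
    vec = (ctypes.c_ubyte * 1)()         # one residency byte per page
    return libc.mincore(page_start, PAGE, vec) == 0


def checked_read_word(addr):
    """Hypothetical helper: validate the address, then read 8 bytes from it."""
    if not is_mapped(addr):
        return None
    return ctypes.c_uint64.from_address(addr).value


word = ctypes.c_uint64(0x1122334455667788)
print(hex(checked_read_word(ctypes.addressof(word))))  # 0x1122334455667788
print(checked_read_word(0))                            # None
```

Each validated read costs a round trip into the kernel, and an unwinder performs many such reads per frame, so the cost multiplies under anything that makes syscalls more expensive.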
This would be "fixed" by switching to LLVM's libunwind, since it never does any memory validation.
Or perhaps we could modify libunwind to use pread("/proc/self/mem", 8, addr) instead to safely access memory, but AFAICT, that might also be blacklisted by is_proc_mem_file?
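A rough sketch of the /proc/self/mem idea, again in Python for brevity rather than as a libunwind patch. The names safe_read and read_word are hypothetical. The sketch assumes Linux with a readable /proc/self/mem, and it says nothing about how rr treats that file.

```python
import ctypes
import os
import sys


def safe_read(addr, size=8):
    """Hypothetical helper: read size bytes at addr from this process.

    Returns None instead of faulting when the address is not readable.
    """
    try:
        fd = os.open("/proc/self/mem", os.O_RDONLY)
    except OSError:
        return None
    try:
        # The file offset is the virtual address to read from.
        return os.pread(fd, size, addr)
    except OSError:
        # The kernel reports an error for an inaccessible address.
        return None
    finally:
        os.close(fd)


def read_word(addr):
    """Hypothetical helper: read one native-endian 64-bit word, or None."""
    data = safe_read(addr, 8)
    if data is None or len(data) != 8:
        return None
    return int.from_bytes(data, sys.byteorder)


word = ctypes.c_uint64(0x1122334455667788)
value = read_word(ctypes.addressof(word))
print(hex(value) if value is not None else "unreadable")
print(safe_read(0))  # None: address 0 is not readable
```

Unlike the validate-then-read pattern, the check and the read happen in a single call, so there is no window between them. It is still one syscall per access, though; a real implementation would at least keep the file descriptor open rather than reopening it for every read.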
On Windows, we would just wrap this in try/catch, and it would be fast.
How do we feel about disabling this option in libunwind? It causes a very severe performance impact for unwinding: https://github.com/libunwind/libunwind/commit/045c55b2a296988c16a4c1b90f3d8b7e8b78752b
Without that option, this test completes much faster under rr, faster even than it could complete without rr when that option is enabled.
Note that this will only affect x86_64, since none of the other platforms support that option: aarch64 never checks validity; arm checks validity, though its check also has a bad thread-unsafe data race; x86 only checks when required; ppc64 never checks either; and ppc sets the flag to check when needed but never implements the code to check it. Side note about the quality of this code base: the data race was fixed on x86_64 by https://github.com/libunwind/libunwind/pull/76, and they broke their ABI version in v1.6 (https://github.com/JuliaPackaging/Yggdrasil/pull/4455), so we have been unable to upgrade.
Maybe LLVM libunwind would be better.
The bitarray test appears to take about 5 minutes normally and 20 minutes under rr, contributing significantly to total CI time. This 4x overhead is significantly higher than the usual 1.5-2x rr overhead. It would be nice to understand this.