Figure out why bitarray test takes so long under rr

JuliaLang / julia

The Julia Programming Language

https://julialang.org/

MIT License

45.78k stars 5.49k forks source link

Figure out why bitarray test takes so long under rr #45983

Closed Keno closed 2 years ago

Keno commented 2 years ago

The bitarray test appears to take 5 minutes regularly and 20 minutes under rr, contributing significantly to total CI time. This overhead is significantly higher than the usual 1.5-2x rr overhead. It would be nice to understand this.

vtjnash commented 2 years ago

Is throwing expensive for rr recording? There is one particular loop ("copyto!") that checks 80k error cases for copyto and 24k success cases (I didn't realize there could be so many code paths through that function 🙄). This subtest takes around 20s without rr, but 21m24.0s with rr (mostly cpu time spent in rr).

Keno commented 2 years ago

It's possible one of the syscalls in the unwinder isn't being properly syscalls buffered. I wouldn't be particularly surprised, since they're probably not particularly common.

vtjnash commented 2 years ago

To answer my own question, libunwind implements validate_mem by calling mincore (or msync), which is a syscall for every memory access it does. This is controlled by the --disable-conservative-checks flag to configure (default is enabled, despite what the libunwind docs might have you believe).

This would be "fixed" by switching to LLVM, since it never does any memory validation.

Or perhaps we could modify libunwind to use pread("/proc/self/mem", 8, addr) instead to safely access memory, but AFAICT, that might also be blacklisted by is_proc_mem_file?

On Windows, we would just wrap this in try/catch, and it would be fast.

vtjnash commented 2 years ago

How do we feel about disabling this option in libunwind? It causes a very severe performance impact for unwinding: https://github.com/libunwind/libunwind/commit/045c55b2a296988c16a4c1b90f3d8b7e8b78752b

Without that option, this test then completes much faster with rr--much faster even than it could complete without rr with that option enabled.

vtjnash commented 2 years ago

Note that will only affect x86_64, since none of the other platforms support that option (aarch64 never checks validity, arm checks validity--though is also a bad thread-unsafe data race, x86 only checks when required, ppc64 never checks either, ppc sets the flag to check when needed--but never implements code to check it). Side note about the quality of this code base: the data race was fixed on x86_64 by https://github.com/libunwind/libunwind/pull/76, and they broke their ABI version in v1.6 (https://github.com/JuliaPackaging/Yggdrasil/pull/4455) so we have been unable to upgrade.

Maybe LLVM libunwind would be better