LouisBrunner / valgrind-macos

A valgrind mirror with latest macOS support
GNU General Public License v2.0

M1/M2 Compatibility #56

Open gilaroni opened 2 years ago

gilaroni commented 2 years ago

Will there be a version for M1?

paulfloyd commented 6 months ago

Not really, mostly guessing/trial and error. But the DCZID_EL0 register specifies the proper value, so could use that to double-check. The current code sets this to an invalid value, hoping that generic code would not then use this instruction, but macOS OS-level code is not generic... Purely by arch-spec we'd just need to make sure that the DCZID_EL0 return value and the size used by DC_ZVA match, but as far as I can tell that will not work on macOS, unless it all matches what the actual M processors do. DISCLAIMER: it's all guesswork from my side.

I am not very knowledgeable about this, so thanks for the background; it will make this easier to research and fix.

I just pushed a bunch of changes upstream for arm64 concerning several mrs and dc opcodes. That includes mrs dczid_el0 and dc zva.
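For reference, here is a minimal sketch of how a DCZID_EL0 value relates to the block size zeroed by DC ZVA, following the Arm field layout (BS in bits 3:0 is log2 of the block size in 4-byte words; DZP in bit 4 prohibits the instruction). The names `dczid_info`, `decode_dczid`, `dczid_matches_zva` and `read_dczid_el0` are hypothetical and are not taken from Valgrind's sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a DCZID_EL0 value (hypothetical helper type). */
typedef struct {
    bool     zva_prohibited; /* DZP, bit 4: DC ZVA must not be used */
    uint32_t block_bytes;    /* bytes zeroed by a single DC ZVA */
} dczid_info;

/* BS (bits 3:0) is log2 of the block size in 4-byte words. */
static inline dczid_info decode_dczid(uint64_t raw)
{
    dczid_info info;
    uint32_t bs = (uint32_t)(raw & 0xF);
    info.zva_prohibited = ((raw >> 4) & 1) != 0;
    info.block_bytes = 4u << bs;
    return info;
}

/* True when the value reported to the guest allows DC ZVA and agrees
   with the number of bytes the emulated DC ZVA actually zeroes. */
static inline bool dczid_matches_zva(uint64_t reported_raw,
                                     uint32_t emulated_block_bytes)
{
    assert(emulated_block_bytes != 0);
    dczid_info info = decode_dczid(reported_raw);
    return !info.zva_prohibited && info.block_bytes == emulated_block_bytes;
}

#if defined(__aarch64__)
/* Read the real register on an arm64 host to double-check a guess. */
static inline uint64_t read_dczid_el0(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(v));
    return v;
}
#endif
```

On an arm64 host, `decode_dczid(read_dczid_el0())` gives the value the hardware reports, which can be compared against whatever size the emulated instruction uses.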

LouisBrunner commented 3 weeks ago

I was planning to merge the arm64 changes into main last weekend but unfortunately I encountered a performance issue which needs to be addressed first.

Good news

Valgrind is basically functional on Apple Silicon on feature/m1. This branch incorporates a lot of changes:

These changes mean that the regression tests have improved significantly: macOS 13 amd64 has a 30% reduction in failures, and macOS 14 arm64 has fewer failures than the current main branch (amd64!) of this repository. arm64 support has also been tested extensively on macOS 14 and 15.

So you can run Valgrind, it will probably work well and all is great.

Bad news

There is some kind of memory issue which happens when Valgrind is running (ironic, isn't it?). This is very obvious when running a GUI application or the regression tests. Your machine will slow down to a crawl, and simply stopping Valgrind will not be enough to restore your system to a usable state; you will need to restart. Such freezes, where a forced reset was required, happened to me a handful of times when working on Valgrind.

I have been sampling my system with vm_stat and I am seeing some very weird things. However, I have yet to find a reason behind this.
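Sampling of this kind means turning each vm_stat line into a named counter that can be compared between runs. A minimal sketch in C, assuming the default one-shot vm_stat output where each counter line looks like `Pages free:      3585.`; the function name `parse_vm_stat_line` is hypothetical, and this is not the tooling actually used for the investigation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Split one "Name:   12345." line into its name and numeric value.
   Returns false for lines without a counter, such as the header line. */
static inline bool parse_vm_stat_line(const char *line,
                                      char *name, size_t name_cap,
                                      unsigned long long *value)
{
    assert(line != NULL && name != NULL && value != NULL);

    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return false;

    size_t n = (size_t)(colon - line);
    if (n == 0 || n >= name_cap)
        return false;

    /* strtoull skips the padding spaces and stops at the trailing '.' */
    char *end = NULL;
    errno = 0;
    unsigned long long v = strtoull(colon + 1, &end, 10);
    if (end == colon + 1 || errno != 0)
        return false;

    memcpy(name, line, n);
    name[n] = '\0';
    *value = v;
    return true;
}
```

Feeding every line of successive vm_stat snapshots through this and diffing the values per name is one way to spot which counter grows while Valgrind is running.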

Due to the dramatic effect of this issue (requiring a reboot or basically crashing your computer altogether), I can't release the current state of arm64 on main.

Going forward

I have been crunching pretty hard trying to get Valgrind ready for release and need to take a break from it. I would have wanted to report the release of the first macOS arm64 version but it wasn't meant to be. However, we have never been so close to a stable release and I am confident that we are at the end of this long journey.

paulfloyd commented 3 weeks ago

Bravo! When you are ready we can work again on getting this merged upstream.

pilotniq commented 5 days ago

Thank you for the amazing work. I hope you are rejuvenated by your break.

I tried to run the version in the feature/m1 branch on a program of mine and got the following error. I'm on macOS 14.7 (23H124) on an Apple M2 Max. I don't know if the following output is of interest:

```
% ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump
==92036== Memcheck, a memory error detector
==92036== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==92036== Using Valgrind-3.24.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==92036== Command: ./ref_speaker_curl_pump
==92036==
--92036-- VALGRIND INTERNAL ERROR: Valgrind received a signal 10 (SIGBUS) - exiting
--92036-- si_code=1; Faulting address: 0x7000017F2BD0; sp: 0x7000017ea720

valgrind: the 'impossible' happened: Killed by fatal signal

host stacktrace:
==92036==    at 0x15A9F2974: vgModuleLocal_check_macho_and_get_rw_loads (readmacho.c:134)
==92036==    by 0x15A9D28EB: vgPlain_di_notify_mmap (debuginfo.c:1395)
==92036==    by 0x15AA42423: vgSysWrap_darwin_mmap_after (syswrap-darwin.c:4711)
==92036==    by 0x15AA22C63: vgPlain_post_syscall (syswrap-main.c:2713)
==92036==    by 0x15AA225B3: vgPlain_client_syscall (syswrap-main.c:2634)
==92036==    by 0x15AA20643: handle_syscall (scheduler.c:1208)
==92036==    by 0x15AA1E04B: vgPlain_scheduler (scheduler.c:1582)
==92036==    by 0x15AA31AF3: run_a_thread_NORETURN (syswrap-darwin.c:126)

sched status: running_tid=1

Thread 1: status = VgTs_Runnable syscall unix:197 (lwpid 259)
==92036==    at 0x104039354: __mmap (in /usr/lib/dyld)
==92036==    by 0x304FF563F: ??? (in /dev/ttys021)
==92036==    by 0x10405D177: dyld4::SyscallDelegate::withReadOnlyMappedFile(Diagnostics&, char const, bool, void ( block_pointer)(void const, unsigned long, bool, dyld4::FileID const&, char const)) const (in /usr/lib/dyld)
==92036==    by 0x104058AD7: dyld4::JustInTimeLoader::makeJustInTimeLoaderDisk(Diagnostics&, dyld4::RuntimeState&, char const, dyld4::Loader::LoadOptions const&, bool, unsigned int, mach_o::Layout const) (in /usr/lib/dyld)
==92036==    by 0x10404D877: dyld4::Loader::makeDiskLoader(Diagnostics&, dyld4::RuntimeState&, char const, dyld4::Loader::LoadOptions const&, bool, unsigned int, mach_o::Layout const) (in /usr/lib/dyld)
==92036==    by 0x10404EFC3: invocation function for block in dyld4::Loader::getLoader(Diagnostics&, dyld4::RuntimeState&, char const, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036==    by 0x10404DF23: dyld4::Loader::forEachResolvedAtPathVar(dyld4::RuntimeState&, char const, dyld4::Loader::LoadOptions const&, dyld4::ProcessConfig::PathOverrides::Type, bool&, void ( block_pointer)(char const, dyld4::ProcessConfig::PathOverrides::Type, bool&)) (in /usr/lib/dyld)
==92036==    by 0x10403CFAB: dyld4::ProcessConfig::PathOverrides::forEachPathVariant(char const, dyld3::Platform, bool, bool, bool&, void ( block_pointer)(char const, dyld4::ProcessConfig::PathOverrides::Type, bool&)) const (in /usr/lib/dyld)
==92036==    by 0x10404DA5B: dyld4::Loader::forEachPath(Diagnostics&, dyld4::RuntimeState&, char const, dyld4::Loader::LoadOptions const&, void ( block_pointer)(char const, dyld4::ProcessConfig::PathOverrides::Type, bool&)) (in /usr/lib/dyld)
==92036==    by 0x10404E14F: dyld4::Loader::getLoader(Diagnostics&, dyld4::RuntimeState&, char const, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036==    by 0x104056B8F: invocation function for block in dyld4::JustInTimeLoader::loadDependents(Diagnostics&, dyld4::RuntimeState&, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036==    by 0x104076C9F: invocation function for block in mach_o::Header::forEachDependentDylib(void ( block_pointer)(char const, mach_o::DependentDylibAttributes, mach_o::Version32, mach_o::Version32, bool&)) const (in /usr/lib/dyld)
==92036==    by 0x1040764CB: mach_o::Header::forEachLoadCommand(void ( block_pointer)(load_command const, bool&)) const (in /usr/lib/dyld)
==92036==    by 0x104076993: mach_o::Header::forEachDependentDylib(void ( block_pointer)(char const, mach_o::DependentDylibAttributes, mach_o::Version32, mach_o::Version32, bool&)) const (in /usr/lib/dyld)
==92036==    by 0x1040568EB: dyld4::JustInTimeLoader::loadDependents(Diagnostics&, dyld4::RuntimeState&, dyld4::Loader::LoadOptions const&) (in /usr/lib/dyld)
==92036==    by 0x10403A88B: dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*) (in /usr/lib/dyld)
==92036==    by 0x104039EF3: (below main) (in /usr/lib/dyld)
client stack range: [0x3047FC000 0x304FF7FFF] client SP: 0x304FF5500
valgrind stack range: [0x7000016EC000 0x7000017EBFFF] top usage: 16160 of 1048576

Note: see also the FAQ in the source distribution.
...
```

paulfloyd commented 4 days ago

What kind of binary is ref_speaker_curl_pump?

The 4k buffer on line 1164 of debuginfo.c might not be big enough. Can you try making it bigger?

I need to clean up VG_(di_notify_mmap) and ML_(check_macho_and_get_rw_loads). ML_(check_macho_and_get_rw_loads) should be more like ML_(check_elf_and_get_rw_loads), taking the fd rather than relying on VG_(di_notify_mmap) to read the start of the binary into a fixed-size buffer.
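To illustrate why a fixed 4k buffer can fall short: a 64-bit Mach-O header records the total size of its load commands in `sizeofcmds`, so the bytes needed to walk them are only known after the header itself has been read. The sketch below is not Valgrind's code. `macho_header_64`, `MACHO_MAGIC_64` and `macho64_bytes_needed` are hypothetical names that mirror `struct mach_header_64` and `MH_MAGIC_64` from `<mach-o/loader.h>` so that the example builds off macOS, and it assumes a thin (non-fat) binary in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MACHO_MAGIC_64 0xfeedfacfu /* same value as MH_MAGIC_64 */

/* Same field layout as struct mach_header_64 (32 bytes). */
typedef struct {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds; /* total bytes of load commands after the header */
    uint32_t flags;
    uint32_t reserved;
} macho_header_64;

/* Returns how many bytes from the start of the file are needed to hold
   the header plus every load command, or 0 if `buf` is too short or is
   not a thin 64-bit Mach-O image in host byte order. */
static inline size_t macho64_bytes_needed(const void *buf, size_t len)
{
    macho_header_64 hdr;

    assert(buf != NULL);
    if (len < sizeof hdr)
        return 0;

    memcpy(&hdr, buf, sizeof hdr); /* avoids unaligned access into buf */
    if (hdr.magic != MACHO_MAGIC_64)
        return 0;

    return sizeof hdr + (size_t)hdr.sizeofcmds;
}
```

A reader that takes the fd could read the 32-byte header first, call something like this, and then read exactly that many bytes, instead of hoping everything fits in a fixed buffer.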

pilotniq commented 4 days ago

Thanks for the response @paulfloyd !

> What kind of binary is ref_speaker_curl_pump?

% file ref_speaker_curl_pump
ref_speaker_curl_pump: Mach-O 64-bit executable arm64

The main is a C program, compiled with clang: Apple clang version 15.0.0 (clang-1500.3.9.4)

It is linked with code written in Rust and C++.

> The 4k buffer on line 1164 of debuginfo.c might not be big enough. Can you try making it bigger?

Thanks! That did change the behavior. If I increase the size to 16384 or 65536, there is an error later:

% ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump
==55757== Memcheck, a memory error detector
==55757== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==55757== Using Valgrind-3.24.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==55757== Command: ./ref_speaker_curl_pump
==55757== 
==55757== Warning: set address range perms: large range [0x700001c000, 0x7e00024000) (defined)
==55757== Warning: set address range perms: large range [0x27e00024000, 0x107000020000) (defined)
[ two minute delay here ]
==55757== Warning: set address range perms: large range [0x7e00024000, 0x27e00024000) (noaccess)
objc[55757]: realized class 0x1ea441fb0 has corrupt data pointer: malloc_size(0x10bb008e0) = 0
zsh: killed     ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump

In this invocation, ref_speaker_curl_pump will just look at the command line arguments (none), then detect that some environment variables are not set, and do an error exit, all within C code. But I guess it doesn't even get that far.

The C files were compiled with -gdwarf-4 (since the valgrind on our Linux CI environment doesn't support the most recent debug info format). Replacing -gdwarf-4 with just -g and rebuilding, I get similar output, except that there is an additional line, run: /usr/bin/dsymutil "./ref_speaker_curl_pump", before the warnings:

% ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump     
==7945== Memcheck, a memory error detector
==7945== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==7945== Using Valgrind-3.24.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==7945== Command: ./ref_speaker_curl_pump
==7945== 
--7945-- run: /usr/bin/dsymutil "./ref_speaker_curl_pump"
==7945== Warning: set address range perms: large range [0x700001c000, 0x7e00024000) (defined)
==7945== Warning: set address range perms: large range [0x27e00024000, 0x107000020000) (defined)
==7945== Warning: set address range perms: large range [0x7e00024000, 0x27e00024000) (noaccess)
objc[7945]: realized class 0x1ea441fb0 has corrupt data pointer: malloc_size(0x10b4008e0) = 0
zsh: killed     ~/src/valgrind-macos/vg-in-place ./ref_speaker_curl_pump

The executable was built with these -f flags: -fno-omit-frame-pointer -fsanitize=address -fsanitize=float-cast-overflow -fsanitize=float-divide-by-zero -fsanitize=undefined -fsanitize-address-use-after-scope

paulfloyd commented 4 days ago

Don't build your exe with sanitizers. Just -g and -fno-omit-frame-pointer are enough for Valgrind. Mixing sanitizers and Valgrind usually doesn't work.

Apple has done a lot of work hardening its allocators and deallocators recently using type-aware functions. I don't know if that has spilled over into malloc_size.