Closed triplef closed 12 months ago
I'd hope that the EH code would continue to work, though there might be some small changes to the unwind structure. It looks as if Windows uses the AAPCS on Arm, so the assembly should Just Work.
Hopefully there will be GitHub hosted Actions Runners for Windows/Arm soon, now that there are Arm machines in Azure, so we can add it to CI and see.
Thanks, that sounds promising! So you wouldn’t expect any required changes in LLVM/Clang?
Shouldn't. I can't remember if clang will unconditionally generate objc_msgSend for Windows or if it's only on x86, it might need an extra check added in CGObjCGNU.cpp to prevent it falling back to the objc_msg_lookup path.
Great to hear, thanks! Since the CI for this is running on Azure (not GitHub Actions), it sounds like we could already give this a try.
I have modified the tools-windows-msvc scripts to include the aarch64 windows triplet (currently in a separate branch) and tried to build the toolchain. The libobjc2 build fails while generating objc_msgSend.S:
<instantiation>:80:2: error: relocation variant :got: unsupported on COFF targets
adrp x10, :got:SmallObjectClasses
^
<instantiation>:81:2: error: relocation variant :got_lo12: unsupported on COFF targets
ldr x10, [x10, :got_lo12:SmallObjectClasses]
Compiler: LLVM 14.0.5, Windows on ARM
Log: https://github.com/gnustep/libobjc2/files/9013632/out.txt
This is the LLVM change that turns unsupported relocation variants into an error on aarch64 COFF targets: https://www.mail-archive.com/llvm-branch-commits@lists.llvm.org/msg04763.html
After some consideration, I've come up with something that fixes the dynamic address relocation issue on Windows on ARM (WoA).
There is no Global Offset Table (GOT) in COFF that we can use to resolve the PC-relative offset/address of a symbol in position-independent code (PIC). With PIC on ELF, the linker emits a dynamic relocation, and the runtime loader (ld.so) performs a symbol lookup to determine the symbol's address at runtime.
ELF aarch64
adrp x9, :got:var
ldr x9, [x9, :got_lo12:var]
Mach-O aarch64
adrp x9, _var@GOTPAGE
ldr x9, [x9, _var@GOTPAGEOFF]
I have generated some example assembly code (clang -S), yet the generated assembly uses a relocation scheme based on a fixed offset (load base + constant). GCC generates assembly with dynamic lookups, but is not available on WoA. However, this approach should work for a PIE or shared library as long as everything is located in one object file (Correct me if I'm wrong @davidchisnall ).
COFF aarch64
adrp x9, var
ldr x9, [x9, :lo12:var]
.addrsig
.addrsig_sym var
The PE format is nicely documented, but the documentation lacks important details about loader interactions: PE-Format Specification, COFF Relocations (all supported COFF relocation types in LLVM: RelocationTypesARM64).
Related Links: https://maskray.me/blog/2021-08-29-all-about-global-offset-table
Adding macros to define the platform-dependent exception handling can be done using ifdef _WIN64 and some abstraction (see objc_msgSend.x86-64.S)
I'll test this on my WoA VM later today.
I've patched the msgSend assembly, but there is a compiler crash when building the legacy GNU ABI protocol hack:
[24/27] Building C object CMakeFiles\objc.dir\Protocol2.m.obj
FAILED: CMakeFiles/objc.dir/Protocol2.m.obj
C:\LLVM-woa64\bin\clang-cl.exe --target=aarch64-pc-windows /nologo -DCXA_ALLOCATE_EXCEPTION_SPECIFIER=noexcept -DGC_DEBUG -DGNUSTEP -DNO_LEGACY -DTYPE_DEPENDENT_DISPATCH -D__OBJC_RUNTIME_INTERNAL__=1 -Dobjc_EXPORTS /DWIN32 /D_WINDOWS /W3 -Xclang -fexceptions -Xclang -fobjc-exceptions /EHas /Z7 -O0 -Xclang -fno-inline /MDd /Zi /Ob0 /Od /RTC1 -Wno-deprecated-objc-isa-usage -Wno-objc-root-class -fobjc-runtime=gnustep-2.0 -Xclang -x -Xclang objective-c /showIncludes /FoCMakeFiles\objc.dir\Protocol2.m.obj /FdCMakeFiles\objc.dir\ -c -- C:\tools-windows-msvc\src\libobjc2\Protocol2.m
### CCC_OVERRIDE_OPTIONS: x-TC x-TP x/TC x/TP
clang-cl: warning: argument unused during compilation: '-O0' [-Wunused-command-line-argument]
Assertion failed: cast<PointerType>(getOperand(1)->getType()) ->isOpaqueOrPointeeTypeMatches(getOperand(0)->getType()) && "Ptr must be a pointer to Val type!", file C:\tcwg-surface-06\ws\tdb0\llvm_package_14.0.5\llvm-project\llvm\lib\IR\Instructions.cpp, line 1490
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: C:\\LLVM-woa64\\bin\\clang-cl.exe --target=aarch64-pc-windows /nologo -DCXA_ALLOCATE_EXCEPTION_SPECIFIER=noexcept -DGC_DEBUG -DGNUSTEP -DNO_LEGACY -DTYPE_DEPENDENT_DISPATCH -D__OBJC_RUNTIME_INTERNAL__=1 -Dobjc_EXPORTS /DWIN32 /D_WINDOWS /W3 -Xclang -fexceptions -Xclang -fobjc-exceptions /EHas /Z7 -O0 -Xclang -fno-inline /MDd /Zi /Ob0 /Od /RTC1 -Wno-deprecated-objc-isa-usage -Wno-objc-root-class -fobjc-runtime=gnustep-2.0 -Xclang -x -Xclang objective-c /showIncludes /FoCMakeFiles\\objc.dir\\Protocol2.m.obj /FdCMakeFiles\\objc.dir\\ -c -- C:\\tools-windows-msvc\\src\\libobjc2\\Protocol2.m
1. <eof> parser at end of file
2. Per-file LLVM IR generation
#0 0x00007ff7f3e2f344 (C:\LLVM-woa64\bin\clang-cl.exe+0x79f344)
#1 0x00007ffe9dc45dc8 (C:\Windows\System32\ucrtbase.dll+0x75dc8)
#2 0x00007ffe9dc46d7c (C:\Windows\System32\ucrtbase.dll+0x76d7c)
#3 0x00007ffe9dc4859c (C:\Windows\System32\ucrtbase.dll+0x7859c)
#4 0x00007ffe9dc48798 (C:\Windows\System32\ucrtbase.dll+0x78798)
#5 0x00007ff7f42114a4 (C:\LLVM-woa64\bin\clang-cl.exe+0xb814a4)
#6 0x00007ff7f3a22f94 (C:\LLVM-woa64\bin\clang-cl.exe+0x392f94)
#7 0x00007ff7f609dbc8 (C:\LLVM-woa64\bin\clang-cl.exe+0x2a0dbc8)
#8 0x00007ff7f503e024 (C:\LLVM-woa64\bin\clang-cl.exe+0x19ae024)
#9 0x00007ff7f6bbb360 (C:\LLVM-woa64\bin\clang-cl.exe+0x352b360)
#10 0x00007ff7f5446154 (C:\LLVM-woa64\bin\clang-cl.exe+0x1db6154)
#11 0x00007ff7f6b3a96c (C:\LLVM-woa64\bin\clang-cl.exe+0x34aa96c)
#12 0x00007ff7f53bfbd8 (C:\LLVM-woa64\bin\clang-cl.exe+0x1d2fbd8)
#13 0x00007ff7f3f597f8 (C:\LLVM-woa64\bin\clang-cl.exe+0x8c97f8)
#14 0x00007ff7f3fd01c4 (C:\LLVM-woa64\bin\clang-cl.exe+0x9401c4)
#15 0x00007ff7f36967e4 (C:\LLVM-woa64\bin\clang-cl.exe+0x67e4)
#16 0x00007ff7f3694120 (C:\LLVM-woa64\bin\clang-cl.exe+0x4120)
#17 0x00007ff7f51b7f28 (C:\LLVM-woa64\bin\clang-cl.exe+0x1b27f28)
#18 0x00007ff7f3df4b60 (C:\LLVM-woa64\bin\clang-cl.exe+0x764b60)
#19 0x00007ff7f51b7b7c (C:\LLVM-woa64\bin\clang-cl.exe+0x1b27b7c)
#20 0x00007ff7f3f22d3c (C:\LLVM-woa64\bin\clang-cl.exe+0x892d3c)
#21 0x00007ff7f3f2314c (C:\LLVM-woa64\bin\clang-cl.exe+0x89314c)
#22 0x00007ff7f3f36cac (C:\LLVM-woa64\bin\clang-cl.exe+0x8a6cac)
#23 0x00007ff7f3693900 (C:\LLVM-woa64\bin\clang-cl.exe+0x3900)
#24 0x00007ff7f7c2a074 (C:\LLVM-woa64\bin\clang-cl.exe+0x459a074)
#25 0x00007ff7f7c2a100 (C:\LLVM-woa64\bin\clang-cl.exe+0x459a100)
#26 0x00007ffea1df1fa0 (C:\Windows\System32\KERNEL32.DLL+0x11fa0)
#27 0x00007ffea22c2bdc (C:\Windows\SYSTEM32\ntdll.dll+0x72bdc)
clang-cl: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 14.0.5
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\LLVM-woa64\bin
libobjc2-protocol2-crash-logs.zip
Stubbing it out fixes this crash, but linking the dll fails:
lld-link: error: undefined symbol: __clear_cache
>>> referenced by C:\tools-windows-msvc\src\libobjc2\block_to_imp.c:180
>>> CMakeFiles\objc.dir\block_to_imp.c.obj:(alloc_trampolines)
lld-link: error: undefined symbol: __declspec(dllimport) RtlRaiseException
>>> referenced by C:\tools-windows-msvc\src\libobjc2\eh_win32_msvc.cc:196
>>> CMakeFiles\objc.dir\eh_win32_msvc.cc.obj:(objc_exception_throw)
>>> referenced by C:\tools-windows-msvc\src\libobjc2\eh_win32_msvc.cc:196
>>> CMakeFiles\objc.dir\eh_win32_msvc.cc.obj:(objc_exception_throw)
ninja: build stopped: subcommand failed.
It looks as if the Windows spelling of __clear_cache is FlushInstructionCache. It should be possible to write a static function in block_to_imp.c that wraps this for compatibility. The second linking error looks like a simple missing DLL. The docs say that this comes from Ntdll.dll, which I thought was linked by default, but maybe isn't on Arm?
The docs say that this comes from Ntdll.dll, which I thought was linked by default, but maybe isn't on Arm?
Thought about this too, but had no way of testing it. I’ll just try to explicitly link ntdll.
It looks as if the Windows spelling of __clear_cache is FlushInstructionCache. It should be possible to write a static function in block_to_imp.c that wraps this for compatibility.
Makes sense!
Thank you :)
The project builds now, after some modifications to objc_msgSend (text relocation instead of GOT, added linker directives for PE/COFF), using FlushInstructionCache in block_to_imp.c, and linking ntdll.dll.
The objc_msgSend tests are still failing. I guess that is because I have not finished replacing all cfi directives with seh directives (conditionally ofc).
Very cool! Is there a branch with your modifications to check it out?
Very cool! Is there a branch with your modifications to check it out?
It is a bit hacky right now :)
@hmelder Did you ever finish looking at this? We've had a couple of requests for this library via partners, so I am investigating the feasibility.
I am now actively working on it (started yesterday), and currently studying aarch64 assembly and SEH on WoA
Okay, great! Feel free to get in contact with me (email in profile) if you have any particularly difficult issues, or run into toolchain problems
Update:
The aarch64 msgSend implementation is now working but unwinding fails, as I have not translated all CFI directives to the corresponding SEH ones.
Checking out woa_support and applying the following patch to the test, which disables the exception test:
diff --git a/Test/objc_msgSend.m b/Test/objc_msgSend.m
index 4689172..49dd8da 100644
--- a/Test/objc_msgSend.m
+++ b/Test/objc_msgSend.m
@@ -91,7 +91,7 @@ __attribute__((objc_root_class))
+ (void)initialize
{
[self printf: "Format %s %d %f%c", "string", 42, 42.0, '\n'];
- @throw self;
+ //@throw self;
}
+ nothing { return 0; }
@end
@@ -179,6 +179,7 @@ int main(void)
__objc_msg_forward3 = forward_slot;
TestCls = objc_getClass("MsgTest");
int exceptionThrown = 0;
+ /*
@try {
objc_msgSend(TestCls, @selector(foo));
} @catch (id e)
@@ -187,6 +188,7 @@ int main(void)
exceptionThrown = 1;
}
assert(exceptionThrown && "An exception was thrown");
+ */
assert((id)0x42 == objc_msgSend(TestCls, @selector(foo)));
objc_msgSend(TestCls, @selector(nothing));
objc_msgSend(TestCls, @selector(missing));
Results in:
Sadly, SEH directives are not documented by MS. I was able to get an intuition for it by letting clang output assembly, and from this mailing list post: https://sourceware.org/legacy-ml/binutils/2009-08/msg00193.html
@anthony-linaro are you familiar with exception handling on Windows and how to translate the CFI directives properly?
The clang backend seems to be very sensitive about SEH directives. I keep hitting an issue where the length of the function can't be determined:
clang -cc1as: fatal error: error in backend: Failed to evaluate function length in SEH unwind info
This error originates from MCWin64EH.cpp#L298
The only similar issue I found is from a recent bug report: https://discourse.llvm.org/t/why-is-lldb-not-showing-debug-info-for-my-assembly-file/65412
This is with the directives added in this commit: https://github.com/gnustep/libobjc2/commit/bac40ba0d2e7f19e78f2c7d50bd36d3c24684e34
@zacwalk is the person to ask here, I think! I have sent an email to him with a link to this issue.
Perfect. Thank you :)
@mstorsjo @compnerd any suggestions or directions on above SEH issue? Thanks in advance.
The clang backend seems to be very sensitive about SEH directives. I keep hitting an issue where the length of the function can't be determined:
clang -cc1as: fatal error: error in backend: Failed to evaluate function length in SEH unwind info
This normally appears if there's some aspect of the instruction sequence which can't be measured immediately. In most cases, this can happen if there's some align directive in a function; the SEH unwind info needs to be created at a stage when sizes/layouts/alignments haven't been settled in the LLVM assembler yet.
This is with the directives added in this commit: bac40ba
I see a couple of .align 2 here, further up in the function; I'm pretty sure you'd avoid this issue if you'd omit those.
Sadly, SEH directives are not documented by MS. I was able to get an intuition for it by letting clang output assembly, and from this mailing list post: https://sourceware.org/legacy-ml/binutils/2009-08/msg00193.html
Indeed, although that one is for the x86_64 SEH format, which is kinda different from the ARM/ARM64 ones. I recommend reading https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling for an overall picture of how it works, then https://github.com/llvm/llvm-project/commit/5b86d130e2baed7221b09087c506f5974fe65f22 probably is the primary "reference" for the basic set of directives on AArch64. (A couple more have been added afterwards, but they're only relevant for very special cases.) Looking at the output from Clang certainly is a good way to go.
One primary difference from the x86_64 form of SEH is that each function has one prologue and zero or more epilogues. Each of these (prologue, epilogue) is tightly packed; there's exactly one SEH directive for each instruction in the prologue/epilogue regions. On x86_64, the SEH opcodes encode the distance from the start of the function for that directive, but on ARM/ARM64 the SEH opcodes don't encode any offsets; each one is assumed to correspond to one instruction. Thus, from .seh_proc up until .seh_endprologue, there needs to be a 1:1 mapping between SEH directives and instructions. For instructions that are irrelevant for the unwinding, you can add .seh_nop.
Since Clang 16, Clang produces errors if there are mismatches between the count of instructions and opcodes in prologue/epilogues, see https://github.com/llvm/llvm-project/commit/cbd8464595220b5ea76c70ac9965d84970c4b712.
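To make the 1:1 rule concrete, here is a rough sketch of a checker that counts instructions and SEH directives between .seh_proc and .seh_endprologue in a piece of assembly text. It is not how Clang implements its check; the function names and the sample prologues are made up for illustration, and the line classification is deliberately simplistic (anything that is not a directive or a label counts as an instruction).

```c
#include <ctype.h>
#include <string.h>

/* True if the first `len` bytes of `line` begin with the literal `prefix`. */
static int starts_with(const char *line, size_t len, const char *prefix)
{
	size_t n = strlen(prefix);
	return len >= n && strncmp(line, prefix, n) == 0;
}

/* Hypothetical checker: returns 1 if instructions and SEH unwind directives
 * pair up 1:1 between .seh_proc and .seh_endprologue, 0 otherwise. */
static int prologue_is_balanced(const char *asm_text)
{
	int in_prologue = 0, instructions = 0, opcodes = 0;
	const char *p = asm_text;

	while (*p != '\0') {
		const char *eol = strchr(p, '\n');
		const char *next = eol ? eol + 1 : p + strlen(p);
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		/* Trim leading and trailing whitespace of the current line. */
		while (len > 0 && isspace((unsigned char)*p)) { p++; len--; }
		while (len > 0 && isspace((unsigned char)p[len - 1])) { len--; }

		if (starts_with(p, len, ".seh_endprologue")) {
			break;
		} else if (starts_with(p, len, ".seh_proc")) {
			in_prologue = 1;
		} else if (in_prologue && len > 0) {
			if (starts_with(p, len, ".seh_")) {
				opcodes++;        /* e.g. .seh_stackalloc, .seh_nop */
			} else if (p[0] != '.' && p[len - 1] != ':') {
				instructions++;   /* not a directive, not a label */
			}
		}
		p = next;
	}
	return instructions == opcodes;
}

/* Made-up prologue: every instruction is followed by one SEH directive. */
static const char balanced_prologue[] =
	"func:\n"
	".seh_proc func\n"
	"    sub sp, sp, #32\n"
	"    .seh_stackalloc 32\n"
	"    stp x29, x30, [sp, #16]\n"
	"    .seh_save_fplr 16\n"
	"    mov x9, x0\n"
	"    .seh_nop\n"
	"    .seh_endprologue\n";

/* Same prologue, but the mov has no matching .seh_nop. */
static const char unbalanced_prologue[] =
	"func:\n"
	".seh_proc func\n"
	"    sub sp, sp, #32\n"
	"    .seh_stackalloc 32\n"
	"    stp x29, x30, [sp, #16]\n"
	"    .seh_save_fplr 16\n"
	"    mov x9, x0\n"
	"    .seh_endprologue\n";
```

Calling prologue_is_balanced on the first sample reports a match, while the second sample is the kind of mismatch that newer Clang versions reject.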
In this case, it looks like the function has a huge number of instructions before the parts that actually are relevant for unwinding. I'm not sure what the best way to deal with this would be: either fill in with a huge number of .seh_nop, or perhaps place the .seh_proc in the middle of the function, at a separate label, so the long start of the function is omitted from the area covered by the unwind info, if we don't really expect to unwind from there anyway. Then you need an .seh_endprologue at the end of it. If you really want to map the .seh_save_fplr 208 and .seh_stackalloc for the inverse forms (where save_fplr actually restores it, not saves it, and .seh_stackalloc is for incrementing the stack), they need to be in a .seh_startepilogue/.seh_endepilogue range.
Also, some minor comments on earlier posts here:
I have generated some example assembly code (clang -S), yet the generated assembly uses a relocation scheme based on a fixed offset (load base + constant). GCC generates assembly with dynamic lookups, but is not available on WoA. However, this approach should work for a PIE or shared library as long as everything is located in one object file (Correct me if I'm wrong @davidchisnall ).
COFF aarch64
adrp x9, var
ldr x9, [x9, :lo12:var]
.addrsig
.addrsig_sym var
The project builds now, after some modifications to obj_msgSend (text relocation instead of GOT
Within PEs, this isn't a text relocation, as var is located within the same PE image, so after linking, the offset will always be constant.
When referencing data in another DLL, it gets referenced indirectly via a symbol __imp_var from the Import Address Table (IAT), where the loader has filled in the actual address in the IAT entry:
extern int var;
extern __declspec(dllimport) int var2;
int get(void) {
return var;
}
int get2(void) {
return var2;
}
$ clang -target aarch64-windows -S -O2 load.c -o -
get:
adrp x8, var
ldr w0, [x8, :lo12:var]
ret
get2:
adrp x8, __imp_var2
ldr x8, [x8, :lo12:__imp_var2]
ldr w0, [x8]
ret
In both cases, the referenced symbol (var or __imp_var2) is located at a fixed offset within the same image.
The corresponding version of get for aarch64-linux, with GOT references, looks like this:
get:
adrp x8, :got:var
ldr x8, [x8, :got_lo12:var]
ldr w0, [x8]
ret
I.e. the GOT-relative accesses are equivalent to __imp_ references to the IAT, which are used when symbols are marked as dllimport.
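The indirection can be modelled in plain C. This is only an illustrative sketch: all names below are hypothetical, the "loader" is an ordinary function, and a real IAT slot is filled in by the Windows loader rather than by user code.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for a variable exported by another DLL. */
static int var2_in_other_dll = 42;

/* Models the IAT slot __imp_var2: a pointer-sized cell at a fixed
 * offset inside our own image. */
static int *imp_var2 = NULL;

/* Models the loader binding imports before code in the image runs. */
static void model_loader_bind_imports(void)
{
	imp_var2 = &var2_in_other_dll;
}

/* Equivalent of get2(): load the pointer from the slot, then load
 * the value through it (two loads, as in the generated assembly). */
static int get2_model(void)
{
	assert(imp_var2 != NULL);
	return *imp_var2;
}
```

The code in the image only ever needs the fixed location of the slot; where the imported variable actually lives is decided at load time.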
I have been working on a SEH implementation for GCC here.
Apart from pdata/xdata generation, on aarch64 there seems to be something different with the establisher frame in the RtlUnwindEx and RaiseException APIs. This affects EH in GCC because it is unable to hit landing pads correctly. Just mentioning it here as you might hit that problem. If I work out what is different with those APIs I will feed back here.
I did notice that EH in Clang looked to use the UCRT handlers. Maybe this won't be a problem for any LLVM-based projects. GCC has its own EH.
@mstorsjo thank you for this detailed explanation. This explains the behaviour I have seen when omitting the prologue and/or epilogue, or altering opcodes in them.
I will try to get an intuition for annotating the SEH directives by hand on arm64, as they seem to be quite delicate :)
Regarding your second comment, I was not aware of the __imp_ scheme and the IAT in January, but I reimplemented this last week with the IAT in mind (as the symbol is in the same PE image, IAT access was not needed).
If I work out what is different with those APIs I will feed back here
That would be great!
I did notice that EH in Clang looked to use the UCRT handlers. Maybe this won't be a problem for any LLVM-based projects. GCC has its own EH.
I'm not sure which details you're referring to here? Clang can operate either in MSVC mode or mingw mode. In MSVC mode it uses the same things as MSVC does. In mingw mode, it uses either libgcc or LLVM's libunwind for exception handling, together with libcxxabi (which should be functionally equivalent to libstdc++/libsupc++). Clang in mingw mode works just as well on top of msvcrt as on top of UCRT.
Before LLVM's libunwind supported SEH, I actually was using libgcc's unwind implementation here, and I had that patched up for aarch64 at some point, see https://martin.st/temp/0001-Patch-unwind-seh.c-to-handle-aarch64-in-addition-to-.patch (although I think I switched from libgcc to LLVM's libunwind for SEH before switching from DWARF to SEH on aarch64 properly, so it might not have been fully tested).
The corresponding patch for LLVM's libunwind, to extend the SEH implementation to aarch64, was roughly similarly straightforward, with a bit more boilerplate to handle: https://github.com/llvm/llvm-project/commit/09cf6374c162b13e00bb86c10e6e481abf437a07
Apart from pdata/xdata generation, on aarch64 there seems to be something different with the establisher frame in the RtlUnwindEx and RaiseException APIs. This affects EH in GCC because it is unable to hit landing pads correctly. Just mentioning it here as you might hit that problem. If I work out what is different with those APIs I will feed back here.
IIRC, on ARM/AArch64 the "establisher frame" is the value of SP on entry to the function, which differs from what it was on x86_64. I don't remember needing to worry about this distinction within libunwind, though I presume it's required somewhere in the code generation for the landing pads? Within libgcc/libunwind, this value mostly gets passed through as-is from the parameter given to _GCC_specific_handler and passed on as the first parameter to RtlUnwindEx.
In the case of setjmp/longjmp, when using the msvcrt/UCRT implementations of these, which use RtlUnwindEx internally, we use a new ARM/AArch64 specific builtin __builtin_sponentry() to get the correct frame value to use here, see https://github.com/mingw-w64/mingw-w64/blob/v11.0.1/mingw-w64-headers/crt/setjmp.h#L234, whereas it used __builtin_frame_address(0) on x86_64.
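The selection described above could be sketched roughly as follows. This is not the mingw-w64 header itself: the macro name SETJMP_FRAME_VALUE is hypothetical, and the __has_builtin guard is an added assumption so the sketch still compiles with compilers that lack __builtin_sponentry.

```c
#include <stddef.h>

#ifndef __has_builtin
#define __has_builtin(x) 0   /* compilers without __has_builtin */
#endif

/* Hypothetical macro. It has to be a macro rather than a function so
 * that the builtin is evaluated in the frame of the caller of setjmp. */
#if (defined(__aarch64__) || defined(__arm__)) && __has_builtin(__builtin_sponentry)
/* ARM/AArch64: SP on entry to the current function. */
#define SETJMP_FRAME_VALUE() __builtin_sponentry()
#else
/* x86_64 and fallback: the frame address of the current function. */
#define SETJMP_FRAME_VALUE() __builtin_frame_address(0)
#endif
```

The result would then be handed to the CRT's setjmp implementation as the frame argument, which is the value RtlUnwindEx later compares against.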
See #249
With Microsoft announcing Arm-native developer toolsets at Build this year, we were wondering what it would take to get libobjc2 to support this. Would this require any new implementations of the assembly, exception handling code, or compiler support? Or should the existing EH for Windows, the AArch64 implementation of objc_msgSend, and compiler support work in theory?
I realize probably no one has tried this, but it would be great to get a sense of what work would be involved to get this supported.