gnustep / libobjc2

Objective-C runtime library intended for use with Clang.
http://www.gnustep.org/
MIT License

Native Arm support on Windows #227

Closed: triplef closed this issue 12 months ago.

triplef commented 2 years ago

With Microsoft announcing Arm-native developer toolsets at Build this year, we were wondering what it would take to get libobjc2 to support this. Would this require any new implementations of the assembly, exception handling code, or compiler support? Or should the existing EH for Windows, the AArch64 implementation of objc_msgSend, and the existing compiler support work in theory?

I realize probably no one has tried this, but it would be great to get a sense of what work would be involved to get this supported.

davidchisnall commented 2 years ago

I'd hope that the EH code would continue to work, though there might be some small changes to the unwind structure. It looks as if Windows uses the AAPCS on Arm, so the assembly should Just Work.

Hopefully there will be GitHub-hosted Actions runners for Windows/Arm soon, now that there are Arm machines in Azure, so we can add it to CI and see.

triplef commented 2 years ago

Thanks, that sounds promising! So you wouldn’t expect any required changes in LLVM/Clang?

davidchisnall commented 2 years ago

Shouldn't. I can't remember whether clang unconditionally generates objc_msgSend calls on Windows or only on x86. If it's only on x86, it might need an extra check added in CGObjCGNU.cpp to prevent it from falling back to the objc_msg_lookup path.

triplef commented 2 years ago

Great to hear, thanks! Since the CI for this is running on Azure (not GitHub Actions), it sounds like we could already give this a try.

hmelder commented 2 years ago

I have modified the tools-windows-msvc scripts to include the aarch64 Windows target triple (currently in a separate branch) and tried to build the toolchain. The libobjc2 build fails while building objc_msgSend.S:

<instantiation>:80:2: error: relocation variant :got: unsupported on COFF targets
 adrp x10, :got:SmallObjectClasses
 ^

<instantiation>:81:2: error: relocation variant :got_lo12: unsupported on COFF targets
 ldr x10, [x10, :got_lo12:SmallObjectClasses]

Compiler: LLVM 14.0.5 (Windows on ARM). Log: https://github.com/gnustep/libobjc2/files/9013632/out.txt

hmelder commented 2 years ago

This is the LLVM change that turns unsupported symbol locations on aarch64 into errors: https://www.mail-archive.com/llvm-branch-commits@lists.llvm.org/msg04763.html

hmelder commented 2 years ago

Related:

https://github.com/gnustep/libobjc2/issues/228#issuecomment-1195311843 https://github.com/gnustep/libobjc2/issues/228#issuecomment-1195320069

hmelder commented 2 years ago

Dynamic Address Relocation

After some consideration, I've come up with something that fixes the dynamic address relocation issue on Windows on ARM (WoA).

There is no Global Offset Table (GOT) in COFF that we can use to resolve the PC-relative offset/address of a symbol in position-independent code (PIC). On ELF, PIC relies on the runtime loader (ld.so) to determine the address: the linker emits a dynamic relocation, and the loader performs a symbol lookup to determine the associated symbol value at runtime.

ELF aarch64
adrp x9, :got:var
ldr  x9, [x9, :got_lo12:var]

Mach-O aarch64
adrp x9, _var@GOTPAGE
ldr  x9, [x9, _var@GOTPAGEOFF]

I have generated some example assembly code (clang -S), but the generated assembly uses a relocation scheme based on a fixed offset (load base + constant). GCC generates assembly with dynamic lookups, but is not available on WoA. However, this approach should work for a PIE or shared library as long as everything is located in one object file (correct me if I'm wrong, @davidchisnall).

COFF aarch64
adrp    x9, var
ldr x9, [x9, :lo12:var]

.addrsig
.addrsig_sym var

The PE format is nicely documented, but lacks important details about loader interactions (see the PE Format specification, section COFF Relocations; all COFF relocation types supported by LLVM are listed in RelocationTypesARM64).

Related Links: https://maskray.me/blog/2021-08-29-all-about-global-offset-table

Structured Exception Handling

The platform-dependent exception handling directives can be defined through macros, selected with #ifdef _WIN64 plus some abstraction (see objc_msgSend.x86-64.S).
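A minimal sketch of such an abstraction, using hypothetical macro names (FUNC_START, END_PROLOGUE, FUNC_END) rather than whatever objc_msgSend.x86-64.S actually uses. Because .S files go through the C preprocessor before the assembler sees them, the same source line expands to an SEH directive when _WIN64 is defined (which is also the case on ARM64 Windows) and to a DWARF CFI directive otherwise. The stringification helpers exist only so the selected spelling can be checked from C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro names, for illustration only. In a .S file each
 * macro sits on its own line and the C preprocessor rewrites it into
 * the directive for the current target before the assembler runs. */
#if defined(_WIN64)
#  define FUNC_START(name) .seh_proc name
#  define END_PROLOGUE     .seh_endprologue
#  define FUNC_END         .seh_endproc
#else
#  define FUNC_START(name) .cfi_startproc
#  define END_PROLOGUE     /* DWARF CFI has no end-of-prologue marker */
#  define FUNC_END         .cfi_endproc
#endif

/* Two-level stringification: expand the macro first, then quote it. */
#define STR_(x) #x
#define STR(x)  STR_(x)

const char *func_start_directive(void)   { return STR(FUNC_START(objc_msgSend)); }
const char *end_prologue_directive(void) { return STR(END_PROLOGUE); }
const char *func_end_directive(void)     { return STR(FUNC_END); }

/* Returns 1 if the macros expand to SEH directives, 0 for CFI.
 * The start and end macros must always agree on the scheme. */
int uses_seh(void)
{
    int start = strncmp(func_start_directive(), ".seh_", 5) == 0;
    int end = strncmp(func_end_directive(), ".seh_", 5) == 0;
    assert(start == end);
    return start;
}
```

The per-instruction directives inside a prologue (.cfi_offset versus .seh_save_*) need more care than these bracketing macros, since SEH on ARM64 expects exactly one opcode per prologue instruction.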

I'll test this on my WoA VM later today.

hmelder commented 2 years ago

I've patched the msgSend assembly, but there is a compiler crash when building the legacy GNU ABI protocol hack:

[24/27] Building C object CMakeFiles\objc.dir\Protocol2.m.obj
FAILED: CMakeFiles/objc.dir/Protocol2.m.obj
C:\LLVM-woa64\bin\clang-cl.exe --target=aarch64-pc-windows  /nologo -DCXA_ALLOCATE_EXCEPTION_SPECIFIER=noexcept -DGC_DEBUG -DGNUSTEP -DNO_LEGACY -DTYPE_DEPENDENT_DISPATCH -D__OBJC_RUNTIME_INTERNAL__=1 -Dobjc_EXPORTS  /DWIN32 /D_WINDOWS /W3 -Xclang -fexceptions -Xclang -fobjc-exceptions /EHas /Z7 -O0 -Xclang -fno-inline /MDd /Zi /Ob0 /Od /RTC1  -Wno-deprecated-objc-isa-usage -Wno-objc-root-class -fobjc-runtime=gnustep-2.0 -Xclang -x -Xclang objective-c /showIncludes /FoCMakeFiles\objc.dir\Protocol2.m.obj /FdCMakeFiles\objc.dir\ -c -- C:\tools-windows-msvc\src\libobjc2\Protocol2.m
### CCC_OVERRIDE_OPTIONS: x-TC x-TP x/TC x/TP
clang-cl: warning: argument unused during compilation: '-O0' [-Wunused-command-line-argument]
Assertion failed: cast<PointerType>(getOperand(1)->getType()) ->isOpaqueOrPointeeTypeMatches(getOperand(0)->getType()) && "Ptr must be a pointer to Val type!", file C:\tcwg-surface-06\ws\tdb0\llvm_package_14.0.5\llvm-project\llvm\lib\IR\Instructions.cpp, line 1490
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: C:\\LLVM-woa64\\bin\\clang-cl.exe --target=aarch64-pc-windows /nologo -DCXA_ALLOCATE_EXCEPTION_SPECIFIER=noexcept -DGC_DEBUG -DGNUSTEP -DNO_LEGACY -DTYPE_DEPENDENT_DISPATCH -D__OBJC_RUNTIME_INTERNAL__=1 -Dobjc_EXPORTS /DWIN32 /D_WINDOWS /W3 -Xclang -fexceptions -Xclang -fobjc-exceptions /EHas /Z7 -O0 -Xclang -fno-inline /MDd /Zi /Ob0 /Od /RTC1 -Wno-deprecated-objc-isa-usage -Wno-objc-root-class -fobjc-runtime=gnustep-2.0 -Xclang -x -Xclang objective-c /showIncludes /FoCMakeFiles\\objc.dir\\Protocol2.m.obj /FdCMakeFiles\\objc.dir\\ -c -- C:\\tools-windows-msvc\\src\\libobjc2\\Protocol2.m
1.      <eof> parser at end of file
2.      Per-file LLVM IR generation
 #0 0x00007ff7f3e2f344 (C:\LLVM-woa64\bin\clang-cl.exe+0x79f344)
 #1 0x00007ffe9dc45dc8 (C:\Windows\System32\ucrtbase.dll+0x75dc8)
 #2 0x00007ffe9dc46d7c (C:\Windows\System32\ucrtbase.dll+0x76d7c)
 #3 0x00007ffe9dc4859c (C:\Windows\System32\ucrtbase.dll+0x7859c)
 #4 0x00007ffe9dc48798 (C:\Windows\System32\ucrtbase.dll+0x78798)
 #5 0x00007ff7f42114a4 (C:\LLVM-woa64\bin\clang-cl.exe+0xb814a4)
 #6 0x00007ff7f3a22f94 (C:\LLVM-woa64\bin\clang-cl.exe+0x392f94)
 #7 0x00007ff7f609dbc8 (C:\LLVM-woa64\bin\clang-cl.exe+0x2a0dbc8)
 #8 0x00007ff7f503e024 (C:\LLVM-woa64\bin\clang-cl.exe+0x19ae024)
 #9 0x00007ff7f6bbb360 (C:\LLVM-woa64\bin\clang-cl.exe+0x352b360)
#10 0x00007ff7f5446154 (C:\LLVM-woa64\bin\clang-cl.exe+0x1db6154)
#11 0x00007ff7f6b3a96c (C:\LLVM-woa64\bin\clang-cl.exe+0x34aa96c)
#12 0x00007ff7f53bfbd8 (C:\LLVM-woa64\bin\clang-cl.exe+0x1d2fbd8)
#13 0x00007ff7f3f597f8 (C:\LLVM-woa64\bin\clang-cl.exe+0x8c97f8)
#14 0x00007ff7f3fd01c4 (C:\LLVM-woa64\bin\clang-cl.exe+0x9401c4)
#15 0x00007ff7f36967e4 (C:\LLVM-woa64\bin\clang-cl.exe+0x67e4)
#16 0x00007ff7f3694120 (C:\LLVM-woa64\bin\clang-cl.exe+0x4120)
#17 0x00007ff7f51b7f28 (C:\LLVM-woa64\bin\clang-cl.exe+0x1b27f28)
#18 0x00007ff7f3df4b60 (C:\LLVM-woa64\bin\clang-cl.exe+0x764b60)
#19 0x00007ff7f51b7b7c (C:\LLVM-woa64\bin\clang-cl.exe+0x1b27b7c)
#20 0x00007ff7f3f22d3c (C:\LLVM-woa64\bin\clang-cl.exe+0x892d3c)
#21 0x00007ff7f3f2314c (C:\LLVM-woa64\bin\clang-cl.exe+0x89314c)
#22 0x00007ff7f3f36cac (C:\LLVM-woa64\bin\clang-cl.exe+0x8a6cac)
#23 0x00007ff7f3693900 (C:\LLVM-woa64\bin\clang-cl.exe+0x3900)
#24 0x00007ff7f7c2a074 (C:\LLVM-woa64\bin\clang-cl.exe+0x459a074)
#25 0x00007ff7f7c2a100 (C:\LLVM-woa64\bin\clang-cl.exe+0x459a100)
#26 0x00007ffea1df1fa0 (C:\Windows\System32\KERNEL32.DLL+0x11fa0)
#27 0x00007ffea22c2bdc (C:\Windows\SYSTEM32\ntdll.dll+0x72bdc)
clang-cl: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 14.0.5
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\LLVM-woa64\bin

(Attached crash logs: libobjc2-protocol2-crash-logs.zip)

Stubbing it out fixes this crash, but linking the dll fails:

lld-link: error: undefined symbol: __clear_cache
>>> referenced by C:\tools-windows-msvc\src\libobjc2\block_to_imp.c:180
>>>               CMakeFiles\objc.dir\block_to_imp.c.obj:(alloc_trampolines)

lld-link: error: undefined symbol: __declspec(dllimport) RtlRaiseException
>>> referenced by C:\tools-windows-msvc\src\libobjc2\eh_win32_msvc.cc:196
>>>               CMakeFiles\objc.dir\eh_win32_msvc.cc.obj:(objc_exception_throw)
>>> referenced by C:\tools-windows-msvc\src\libobjc2\eh_win32_msvc.cc:196
>>>               CMakeFiles\objc.dir\eh_win32_msvc.cc.obj:(objc_exception_throw)
ninja: build stopped: subcommand failed.

davidchisnall commented 2 years ago

It looks as if the Windows spelling of __clear_cache is FlushInstructionCache. It should be possible to write a static function in block_to_imp.c that wraps this for compatibility. The second linking error looks like a simple missing DLL. The docs say that this comes from Ntdll.dll, which I thought was linked by default, but maybe isn't on Arm?
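A sketch of that compatibility wrapper, assuming a hypothetical name (objc2_clear_cache) and the same (start, end) arguments as __clear_cache. On Windows it calls FlushInstructionCache on the current process; on other targets it falls back to the __builtin___clear_cache builtin provided by GCC and Clang, so alloc_trampolines could call a single function everywhere.

```c
#include <assert.h>

#if defined(_WIN32)
#include <windows.h>
#endif

/* Hypothetical helper: make freshly written trampoline code in
 * [start, end) visible to instruction fetch. Returns nonzero on
 * success, mirroring FlushInstructionCache's BOOL result. */
static int objc2_clear_cache(void *start, void *end)
{
    assert((char *)start <= (char *)end);
#if defined(_WIN32)
    return FlushInstructionCache(GetCurrentProcess(), start,
                                 (SIZE_T)((char *)end - (char *)start)) != 0;
#else
    __builtin___clear_cache((char *)start, (char *)end);
    return 1;
#endif
}
```

Keeping the helper static to block_to_imp.c avoids defining a symbol named __clear_cache on Windows, which would collide with the compiler-runtime name on other targets.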

hmelder commented 2 years ago

The docs say that this comes from Ntdll.dll, which I thought was linked by default, but maybe isn't on Arm?

Thought about this too, but had no way of testing it. I’ll just try to explicitly link ntdll.

It looks as if the Windows spelling of __clear_cache is FlushInstructionCache. It should be possible to write a static function in block_to_imp.c that wraps this for compatibility.

Makes sense!

Thank you :)

hmelder commented 1 year ago

The project builds now, after some modifications to objc_msgSend (text relocation instead of GOT, plus linker directives for PE/COFF), using FlushInstructionCache in block_to_imp.c, and linking ntdll.dll.


The objc_msgSend tests are still failing. I guess that is because I have not finished replacing all CFI directives with SEH directives (conditionally, of course).

triplef commented 1 year ago

Very cool! Is there a branch with your modifications to check it out?

hmelder commented 1 year ago

Very cool! Is there a branch with your modifications to check it out?

It is a bit hacky right now :)

https://github.com/gnustep/libobjc2/tree/woa_support

anthony-linaro commented 1 year ago

@hmelder Did you ever finish looking at this? We've had a couple of requests for this library via partners, so I am investigating the feasibility.

hmelder commented 1 year ago

I am now actively working on it (I started yesterday), and am currently studying AArch64 assembly and SEH on WoA.

(Sent by email, replying to Anthony Roberts's comment of 13 Nov 2023, 11:38.)

anthony-linaro commented 1 year ago

Okay, great! Feel free to get in contact with me (email in profile) if you have any particularly difficult issues, or run into toolchain problems

hmelder commented 1 year ago

Update:

The aarch64 msgSend implementation is now working but unwinding fails, as I have not translated all CFI directives to the corresponding SEH ones.

Checking out woa_support and applying the following patch to Test/objc_msgSend.m, which disables the exception test:

diff --git a/Test/objc_msgSend.m b/Test/objc_msgSend.m
index 4689172..49dd8da 100644
--- a/Test/objc_msgSend.m
+++ b/Test/objc_msgSend.m
@@ -91,7 +91,7 @@ __attribute__((objc_root_class))
 + (void)initialize
 {
        [self printf: "Format %s %d %f%c", "string", 42, 42.0, '\n'];
-       @throw self;
+       //@throw self;
 }
 + nothing { return 0; }
 @end
@@ -179,6 +179,7 @@ int main(void)
        __objc_msg_forward3 = forward_slot;
        TestCls = objc_getClass("MsgTest");
        int exceptionThrown = 0;
+  /*
        @try {
                objc_msgSend(TestCls, @selector(foo));
        } @catch (id e)
@@ -187,6 +188,7 @@ int main(void)
                exceptionThrown = 1;
        }
        assert(exceptionThrown && "An exception was thrown");
+  */
        assert((id)0x42 == objc_msgSend(TestCls, @selector(foo)));
        objc_msgSend(TestCls, @selector(nothing));
        objc_msgSend(TestCls, @selector(missing));

Results in:

[Screenshot: test output]

Sadly, SEH directives are not documented by MS. I was able to get an intuition for it by letting clang output assembly, and from this mailing list post: https://sourceware.org/legacy-ml/binutils/2009-08/msg00193.html

@anthony-linaro are you familiar with exception handling on Windows and how to translate the CFI directives properly?

https://github.com/gnustep/libobjc2/blob/1d2e52e4cbd607fcf05c026b5a5fc2f31f5a65ff/objc_msgSend.aarch64.S#L75

hmelder commented 1 year ago

The clang backend seems to be very sensitive about SEH directives. I keep hitting an issue where the length of the function can't be determined:

clang -cc1as: fatal error: error in backend: Failed to evaluate function length in SEH unwind info

This error originates from MCWin64EH.cpp#L298

The only similar issue I found is from a recent bug report: https://discourse.llvm.org/t/why-is-lldb-not-showing-debug-info-for-my-assembly-file/65412

hmelder commented 1 year ago

This is with the directives added in this commit: https://github.com/gnustep/libobjc2/commit/bac40ba0d2e7f19e78f2c7d50bd36d3c24684e34

anthony-linaro commented 1 year ago

@zacwalk is the person to ask here, I think! I have sent an email to him with a link to this issue.

hmelder commented 1 year ago

Perfect. Thank you :)

omjavaid commented 1 year ago

@mstorsjo @compnerd any suggestions or directions on above SEH issue? Thanks in advance.

mstorsjo commented 1 year ago

The clang backend seems to be very sensitive about SEH directives. I keep hitting an issue where the length of the function can't be determined:

clang -cc1as: fatal error: error in backend: Failed to evaluate function length in SEH unwind info

This normally appears if some aspect of the instruction sequence can't be measured immediately. In most cases, this happens if there's an align directive in a function; the SEH unwind info needs to be created at a stage when sizes/layouts/alignments haven't been settled in the LLVM assembler yet.

This is with the directives added in this commit: bac40ba

I see a couple of .align 2 directives further up in the function; I'm pretty sure you'd avoid this issue if you omitted those.

Sadly, SEH directives are not documented by MS. I was able to get an intuition for it by letting clang output assembly, and from this mailing list post: https://sourceware.org/legacy-ml/binutils/2009-08/msg00193.html

Indeed, although that one is for the x86_64 SEH format, which is kinda different from the ARM/ARM64 ones. I recommend reading https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling for an overall picture of how it works, then https://github.com/llvm/llvm-project/commit/5b86d130e2baed7221b09087c506f5974fe65f22 probably is the primary "reference" for the basic set of directives on AArch64. (A couple more have been added afterwards, but they're only relevant for very special cases.) Looking at the output from Clang certainly is a good way to go.

One primary difference from the x86_64 form of SEH is that each function has one prologue and zero or more epilogues. Each of these (prologue, epilogue) is tightly packed; there's exactly one SEH directive for each instruction in the prologue/epilogue regions. On x86_64, the SEH opcodes encode the distance from the start of the function for each directive, but on ARM/ARM64 the SEH opcodes don't encode any offsets; each one is assumed to correspond to one instruction. Thus, from .seh_proc up until .seh_endprologue, there needs to be a 1:1 mapping between SEH directives and instructions. For instructions that are irrelevant for unwinding, you can add .seh_nop.
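To make the counting rule concrete, here is a small, hypothetical checker (not part of LLVM or libobjc2). It takes the lines strictly between .seh_proc and .seh_endprologue, assumes one already-trimmed instruction or directive per line, treats every .seh_* line in that region as one opcode, and reports how many instructions are still missing an opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* How a single (already trimmed) assembly line is counted. */
enum line_kind { LINE_OTHER, LINE_INSN, LINE_SEH_OPCODE };

static enum line_kind classify(const char *line)
{
    size_t len = strlen(line);
    if (len == 0 || line[len - 1] == ':')
        return LINE_OTHER;      /* blank line or label */
    if (strncmp(line, ".seh_", 5) == 0)
        return LINE_SEH_OPCODE; /* e.g. .seh_save_fplr_x 16, .seh_nop */
    if (line[0] == '.')
        return LINE_OTHER;      /* any other assembler directive */
    return LINE_INSN;           /* a real instruction */
}

/* Hypothetical helper. Returns instructions minus SEH opcodes:
 * 0 means the 1:1 mapping holds, a positive value is the number of
 * opcodes (for example .seh_nop) that still need to be added. */
int prologue_mismatch(const char *const *lines, size_t n)
{
    int insns = 0, opcodes = 0;
    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (classify(lines[i])) {
        case LINE_INSN:       insns++;   break;
        case LINE_SEH_OPCODE: opcodes++; break;
        default:              break;
        }
    }
    return insns - opcodes;
}

/* Example prologue in which every instruction has its opcode. */
const char *const example_prologue[] = {
    "stp x29, x30, [sp, #-16]!", ".seh_save_fplr_x 16",
    "mov x29, sp",               ".seh_set_fp",
};
```

For example_prologue the result is 0. Appending an unannotated "mov x9, x0" to the first two lines gives 1, meaning one .seh_nop is missing; that is the kind of count mismatch the Clang 16 check mentioned below rejects.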

Since Clang 16, Clang produces errors if there are mismatches between the count of instructions and opcodes in prologue/epilogues, see https://github.com/llvm/llvm-project/commit/cbd8464595220b5ea76c70ac9965d84970c4b712.

In this case, it looks like the function has a huge number of instructions before the parts that are actually relevant for unwinding. I'm not sure what the best way to deal with this would be: either fill in with a huge number of .seh_nop, or perhaps place the .seh_proc in the middle of the function, at a separate label, so that the long start of the function is omitted from the area covered by the unwind info, if we don't really expect to unwind from there anyway. Then you need an .seh_endprologue at the end of it. If you really want to map the .seh_save_fplr 208 and .seh_stackalloc for the inverse forms (where save_fplr actually restores the registers rather than saving them, and .seh_stackalloc is for incrementing the stack pointer), they need to be in a .seh_startepilogue/.seh_endepilogue range.
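The .seh_nop padding and the epilogue bracketing can be sketched with a hypothetical emitter (again not an existing tool). It writes an epilogue region with exactly one SEH directive after each instruction, substituting .seh_nop wherever an instruction has no unwind effect. Only .seh_save_fplr 208 comes from the discussion above; the 224-byte stack adjustment and the mov x0, x9 line in the usage below are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One epilogue instruction plus the SEH opcode describing it;
 * seh == NULL means the instruction doesn't affect unwinding. */
struct annotated_insn {
    const char *insn;
    const char *seh;
};

/* Append `s` and a newline to buf, keeping it NUL-terminated.
 * Returns -1 (leaving buf unchanged) if it would not fit. */
static int append_line(char *buf, size_t size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len + 2 > size) /* text + '\n' + '\0' */
        return -1;
    memcpy(buf + *used, s, len);
    buf[*used + len] = '\n';
    *used += len + 1;
    buf[*used] = '\0';
    return 0;
}

/* Hypothetical emitter: .seh_startepilogue, then each instruction
 * followed by exactly one SEH directive (.seh_nop when none is given),
 * then .seh_endepilogue. Returns characters written, or -1 if `buf`
 * is too small. */
int emit_epilogue(char *buf, size_t size,
                  const struct annotated_insn *insns, size_t n)
{
    size_t used = 0;
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    if (append_line(buf, size, &used, ".seh_startepilogue") != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        const char *seh = insns[i].seh ? insns[i].seh : ".seh_nop";
        if (append_line(buf, size, &used, insns[i].insn) != 0 ||
            append_line(buf, size, &used, seh) != 0)
            return -1;
    }
    if (append_line(buf, size, &used, ".seh_endepilogue") != 0)
        return -1;
    return (int)used;
}
```

Passing mov x0, x9 (no opcode), ldp x29, x30, [sp, #208] (.seh_save_fplr 208) and add sp, sp, #224 (.seh_stackalloc 224) yields a region with .seh_nop after the mov and each real opcode after its instruction. In Clang's own output the ret typically follows .seh_endepilogue, outside the region.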

mstorsjo commented 1 year ago

Also, some minor comments on earlier posts here:

I have generated some example assembly code (clang -S), yet the generated assembly uses a relocation scheme based on a fixed offset (load base + constant). GCC generates assembly with dynamic lookups, but is not available on WoA. However, this approach should work for a PIE or shared library as long as everything is located in one object file (Correct me if I'm wrong @davidchisnall ).

COFF aarch64
adrp  x9, var
ldr   x9, [x9, :lo12:var]

.addrsig
.addrsig_sym var

The project builds now, after some modifications to objc_msgSend (text relocation instead of GOT

Within PEs, this isn't a text relocation, as var is located within the same PE image, so after linking, the offset will always be constant.
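Why the offset is constant can be shown with the ADRP arithmetic. This sketch assumes 4 KiB pages, ignores ADRP's ±4 GiB reach, and uses hypothetical function names. The linker encodes only the page distance between the adrp and var (IMAGE_REL_ARM64_PAGEBASE_REL21) and the low 12 bits of var (IMAGE_REL_ARM64_PAGEOFFSET_12L for the ldr), so moving the whole image by any multiple of 4 KiB leaves both encoded values unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Base address of the 4 KiB page containing addr. */
static uint64_t page(uint64_t addr)
{
    return addr & ~(uint64_t)0xFFF;
}

/* Hypothetical helper: the page distance the linker encodes into
 * `adrp x9, var` (instruction at pc, var at target). */
int64_t adrp_page_delta(uint64_t pc, uint64_t target)
{
    return (int64_t)(page(target) - page(pc)) / 4096;
}

/* Hypothetical helper: the address the CPU forms at run time from
 *   adrp x9, var              -> page(pc) + page_delta * 4096
 *   ldr  x9, [x9, :lo12:var]  -> ... + lo12 (the address loaded from) */
uint64_t adrp_lo12_resolve(uint64_t pc, int64_t page_delta, uint64_t lo12)
{
    assert(lo12 < 4096);
    return page(pc) + (uint64_t)(page_delta * 4096) + lo12;
}
```

With an image based at 0x140000000, an adrp at offset 0x1234 and var at offset 0x5678, the encoded page delta is 4, and it is still 4 if the image is mapped 64 KiB higher; only the run-time PC changes.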

When referencing data in another DLL, it gets referenced indirectly via a symbol __imp_var from the Import Address Table (IAT), where the loader has filled in the actual address in the IAT entry:

extern int var;
extern __declspec(dllimport) int var2;
int get(void) {
  return var;
}
int get2(void) {
  return var2;
}
$ clang -target aarch64-windows -S -O2 load.c -o -
get:
        adrp    x8, var
        ldr     w0, [x8, :lo12:var]
        ret

get2:
        adrp    x8, __imp_var2
        ldr     x8, [x8, :lo12:__imp_var2]
        ldr     w0, [x8]
        ret

In both cases, either var or __imp_var2 is located at a fixed offset within the same image.

The corresponding version of get for aarch64-linux, with GOT references, looks like this:

get:
        adrp    x8, :got:var
        ldr     x8, [x8, :got_lo12:var]
        ldr     w0, [x8]
        ret

I.e. the GOT relative uses are equivalent to __imp_ references to the IAT, which are used when symbols are marked as dllimport.
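A toy C model of that indirection, with hypothetical names and no real loader involved: the importing image owns only a pointer-sized slot standing in for __imp_var2, a function playing the loader's role fills it in, and every access goes through the slot. The two loads in get2_via_iat mirror the adrp/ldr pair followed by ldr w0, [x8] in the get2 assembly above.

```c
#include <assert.h>
#include <stddef.h>

/* Lives in the "other DLL" in this toy model. */
static int var2_in_other_dll = 7;

/* Hypothetical stand-in for the __imp_var2 IAT slot: a pointer-sized
 * entry inside the importing image, empty until the loader binds it. */
static int *imp_var2_slot = NULL;

/* Plays the loader's role: write the real address into the slot. */
void simulate_loader_binding(void)
{
    imp_var2_slot = &var2_in_other_dll;
}

/* One load fetches the slot's contents (adrp + ldr x8), a second
 * load reads var2 through it (ldr w0, [x8]). */
int get2_via_iat(void)
{
    assert(imp_var2_slot != NULL && "import not bound yet");
    return *imp_var2_slot;
}
```

The slot itself sits at a fixed offset inside the importing image, which is why the first load can use a plain adrp/:lo12: pair just like a local variable.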

ZacWalk commented 1 year ago

I have been working on a SEH implementation for GCC here.

Apart from pdata/xdata generation, on aarch64 there seems to be something different with the establisher frame in the RtlUnwindEx and RaiseException APIs. This affects EH in GCC because it is unable to hit landing pads correctly. Just mentioning it here as you might hit that problem. If I work out what is different with those APIs I will feed back here.

I did notice that EH in Clang looked to use the UCRT handlers. Maybe this won't be a problem for any LLVM-based projects. GCC has its own EH.

hmelder commented 1 year ago

@mstorsjo thank you for this detailed explanation. This explains the behaviour I have seen when omitting the prologue and/or epilogue, or altering opcodes in them.

I will try to get an intuition for annotating the SEH directives by hand on arm64, as they seem to be quite delicate :)

Regarding your second comment, I was not aware of the __imp_ scheme and the IAT in January, but I already implemented it last week with the IAT in mind (as the symbol is in the same PE image, IAT access was not needed).

hmelder commented 1 year ago

If I work out what is different with those APIs I will feed back here

That would be great!

mstorsjo commented 1 year ago

I did notice that EH in Clang looked to use the UCRT handlers. Maybe this won't be a problem for any LLVM-based projects. GCC has its own EH.

I'm not sure which details you're referring to here? Clang can operate either in MSVC mode or mingw mode. In MSVC mode it uses the same things as MSVC does. In mingw mode, it uses either libgcc or LLVM's libunwind for exception handling, together with libcxxabi (which should be functionally equivalent to libstdc++/libsupc++). Clang in mingw mode works just as well on top of msvcrt as on top of UCRT.

Before LLVM's libunwind supported SEH, I actually was using libgcc's unwind implementation here, and I had that patched up for aarch64 at some point, see https://martin.st/temp/0001-Patch-unwind-seh.c-to-handle-aarch64-in-addition-to-.patch (although I think I switched from libgcc to LLVM's libunwind for SEH before switching from DWARF to SEH on aarch64 properly, so it might not have been fully tested).

The corresponding patch for LLVM's libunwind, to extend the SEH implementation to aarch64, was roughly similarly straightforward, with a bit more boilerplate to handle: https://github.com/llvm/llvm-project/commit/09cf6374c162b13e00bb86c10e6e481abf437a07

Apart from pdata/xdata generation, on aarch64 there seems to be something different with the establisher frame in the RtlUnwindEx and RaiseException APIs. This affects EH in GCC because it is unable to hit landing pads correctly. Just mentioning it here as you might hit that problem. If I work out what is different with those APIs I will feed back here.

IIRC, on ARM/AArch64 the "establisher frame" is the value of SP on entry to the function, which differs from what it was on x86_64. I don't remember needing to worry about this distinction within libunwind, though; I presume it's required somewhere in the code generation for the landing pads? Within libgcc/libunwind, this value mostly gets passed through as-is: it is taken from the parameter given to _GCC_specific_handler and passed on as the first parameter to RtlUnwindEx.

In the case of setjmp/longjmp, when using the msvcrt/UCRT implementations of these (which use RtlUnwindEx internally), we use a new ARM/AArch64-specific builtin, __builtin_sponentry(), to get the correct frame value; see https://github.com/mingw-w64/mingw-w64/blob/v11.0.1/mingw-w64-headers/crt/setjmp.h#L234, where __builtin_frame_address(0) is used on x86_64 instead.
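A sketch of that platform split, modelled loosely on the mingw-w64 header linked above rather than copied from it; the macro name ESTABLISHER_FRAME and the exact preprocessor condition are assumptions. It has to be a macro rather than a helper function, so that it is evaluated in the frame whose establisher frame is wanted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro. On ARM/AArch64 Windows the establisher frame is
 * SP on function entry, exposed by Clang as __builtin_sponentry(); the
 * fallback uses the frame address, as the x86_64 path does. On
 * non-Windows hosts the fallback only lets this sketch compile. */
#if defined(_WIN32) && defined(__clang__) && \
    (defined(__aarch64__) || defined(__arm__))
#  define ESTABLISHER_FRAME() __builtin_sponentry()
#else
#  define ESTABLISHER_FRAME() __builtin_frame_address(0)
#endif

/* Expanded directly in the function whose frame is wanted: wrapping it
 * in a separate helper would capture the helper's frame instead. */
void *current_establisher_frame(void)
{
    void *frame = ESTABLISHER_FRAME();
    assert(frame != NULL);
    return frame;
}
```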

hmelder commented 1 year ago

See #249