bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0

Windows AARCH64 Target #8247

Open andrewmd5 opened 8 months ago

andrewmd5 commented 8 months ago

Feature

Compile the C API and produce ARM64 builds for Windows

Benefit

Will unblock this issue in the wasmtime-dotnet package

Not sure if the wheel exists for this yet. In terms of infrastructure, cross-compilation on GitHub Actions should be possible, but if necessary I’m happy to deploy some new self-hosted runners to power this feature.

cfallin commented 8 months ago

This depends on #4992: a little bit of core runtime functionality (trap handling, unwind info generation) that is necessary for this OS/architecture pair. If you're willing to work on this, we'd be happy to review a PR!

andrewmd5 commented 8 months ago


Thank you for linking the relevant issue; I’ll take a look and see about submitting a PR.

dpaoliello commented 1 month ago

I've been trying to get rustc_codegen_cranelift working with Windows ARM64, but I've run into an issue with alignment.

My work-in-progress branch is available at https://github.com/dpaoliello/wasmtime/tree/arm64wip

Currently, when I run y build, I get a bunch of linker errors from link.exe complaining about the alignment of symbols:

std-0891cada1b439ffb.dq2m4pkr5rfjx0xrog6ne1sfh.rcgu.o : error LNK2048: relocation PAGEOFFSET_12L targeting 'memcpy' (0019EE84) is invalid for the instruction (F9400084 at RVA 000D6C84) at section 0x1 offset 0x137C, due to bad alignment of offset to target (E84); expected to be 8 bytes aligned

The important thing in that output is that memcpy is located at 0019EE84, which is not 8-byte aligned.
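To see why link.exe rejects this, here is a simplified sketch of how a PAGEOFFSET_12L fixup (IMAGE_REL_ARM64_PAGEOFFSET_12L) can be applied to an integer load. The function name is illustrative, not link.exe's implementation, and it only models LDR/STR (unsigned immediate) encodings; SIMD/FP loads compute their scale differently. The instruction in the error, F9400084, decodes to a 64-bit ldr x4, [x4] whose 12-bit immediate is stored divided by 8, so the target's offset within its 4 KiB page must be a multiple of 8.

```rust
/// Simplified model of applying a PAGEOFFSET_12L fixup to an integer
/// LDR/STR (unsigned immediate). SIMD/FP loads are not handled here.
fn apply_pageoffset_12l(insn: u32, target: u64) -> Result<u32, String> {
    // Offset of the target within its 4 KiB page (the ":lo12:" part).
    let page_off = (target & 0xfff) as u32;
    // Bits 31:30 hold log2 of the access size; 0b11 means an 8-byte LDR Xt.
    let size_log2 = insn >> 30;
    let scale = 1u32 << size_log2;
    // imm12 is stored divided by the access size, so the low bits must be zero.
    if page_off % scale != 0 {
        return Err(format!(
            "offset {:#X} is not {}-byte aligned",
            page_off, scale
        ));
    }
    let imm12 = page_off >> size_log2;
    // imm12 occupies bits 21:10 of the instruction.
    Ok((insn & !(0xfffu32 << 10)) | (imm12 << 10))
}
```

With the values from the error above, apply_pageoffset_12l(0xF940_0084, 0x0019_EE84) returns an error, because 0xE84 (3716) leaves a remainder of 4 when divided by 8. A target of 0x0019EE80 would encode fine, as imm12 = 0x1D0.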

I've tried setting the function alignment to 8: https://github.com/dpaoliello/wasmtime/blob/e7184160fe909864c767177de17729f876c0da60/cranelift/codegen/src/isa/aarch64/inst/mod.rs#L1184

And the symbol alignment to 8: https://github.com/dpaoliello/wasmtime/blob/e7184160fe909864c767177de17729f876c0da60/cranelift/codegen/src/isa/mod.rs#L430

But neither seems to have fixed this - any idea what I'm doing wrong?

alexcrichton commented 1 month ago

I think memcpy would be defined in whatever libc equivalent Windows has, which might be why changing Cranelift's function/symbol alignment didn't work? How sure are you that the memcpy function itself is created by Cranelift?

If it's not created by Cranelift, perhaps we're generating the wrong relocation against memcpy? That is, the one we generate requires 8-byte alignment, but we should be using a different one that doesn't?

dpaoliello commented 1 month ago

It's not memcpy specifically; I'm also seeing a bunch of Win32 functions, so it's likely any external symbol that the obj is referencing. I'm not familiar with how external symbols are represented in obj files, or how Cranelift places them there, so I'll have to dig into this further when I have time.

dpaoliello commented 4 days ago

Ok, I finally understand what's happening here.

When emitting a call, wasmtime emits a LoadExtName followed by an indirect call: https://github.com/bytecodealliance/wasmtime/blob/e56ffd77f1fb2240e163b7f840f8c4e728c98434/cranelift/codegen/src/isa/aarch64/abi.rs#L1041-L1045

LoadExtName always gets lowered with a page base + page offset reloc: https://github.com/bytecodealliance/wasmtime/blob/e56ffd77f1fb2240e163b7f840f8c4e728c98434/cranelift/codegen/src/isa/aarch64/inst/emit.rs#L3165-L3185

That type of reloc needs to be 8-byte aligned; however, functions (at least on Windows, not sure about other platforms) aren't guaranteed to be 8-byte aligned, so the linker complains.

When I look at code generated by LLVM, I see branch26 relocs being emitted for called functions, which seem to be generated by this: https://github.com/llvm/llvm-project/blob/18ee00323f5fc22d32a74b636fcac84e697241f3/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp#L459-L478
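For contrast, a sketch of patching a BRANCH26 fixup (IMAGE_REL_ARM64_BRANCH26) into a bl instruction. The function name is illustrative. The point it shows: the 26-bit immediate counts 4-byte instruction words from the call site, so the callee only needs the 4-byte alignment every AArch64 function already has, at the cost of a limited reach of about ±128 MiB.

```rust
/// Sketch: patch a BRANCH26 fixup into a BL (opcode 0x94000000).
/// The only alignment requirement on the target is 4 bytes.
fn apply_branch26(bl_insn: u32, pc: u64, target: u64) -> Result<u32, String> {
    let delta = target as i64 - pc as i64;
    if delta % 4 != 0 {
        return Err(format!("branch delta {} is not 4-byte aligned", delta));
    }
    // imm26 is a signed count of 4-byte words: roughly +/-128 MiB of reach.
    let words = delta / 4;
    if words < -(1 << 25) || words >= (1 << 25) {
        return Err(format!("branch target out of range (delta {})", delta));
    }
    // Two's-complement truncation keeps the low 26 bits of negative offsets.
    let imm26 = (words as u32) & 0x03ff_ffff;
    Ok((bl_insn & 0xfc00_0000) | imm26)
}
```

For example, a bl at 0x1000 targeting 0x0019EE84 (the non-8-byte-aligned memcpy address from the error) encodes without complaint, whereas a page offset load of the same target cannot.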

I'm not entirely sure how to implement this in wasmtime, given the abstraction between abi.rs and emit.rs.

dpaoliello commented 4 days ago

And, just to confound things, sometimes LLVM will generate a page base + page offset for a call: https://godbolt.org/z/3c7ce1vP4

alexcrichton commented 3 days ago

Is there a relocation that doesn't need to be 8-byte aligned we could use? I'm a bit surprised by that godbolt link because

example::call_yep::h3f70d43527bee11a:
 0:     str     x30, [sp, #-16]!
 4:     adrp    x8, __imp_yep
 8:     ldr     x8, [x8, :lo12:__imp_yep]
 c:     blr     x8
10:     adrp    x8, __imp_X
14:     ldr     x8, [x8, :lo12:__imp_X]
18:     ldr     w8, [x8]
1c:     madd    w0, w8, w8, w0
20:     ldr     x30, [sp], #16
24:     ret

Here adrp falls on both a non-8-byte-aligned boundary (0x4) and an 8-byte-aligned one (0x10). How does that work if it's required to be 8-byte aligned? Or is the assembler hiding a nop instruction or something like that?

If we need these relocations to be 8-byte aligned in Cranelift it would probably look like:

I'm mostly surprised that LLVM doesn't seem to be doing anything with nops, but from what you're saying it should work?

dpaoliello commented 3 days ago

Sorry, a bit of confusion: the target of the reloc needs to be 8-byte aligned, not the reloc or the consuming instruction.
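A small sketch of why the adrp's own position doesn't matter: it computes the two immediates for an adrp xN, sym / ldr xN, [xN, :lo12:sym] pair. The function name is hypothetical and range checking of the 21-bit adrp immediate is omitted. Only the page of the adrp enters the calculation, while the 8-byte constraint comes entirely from the target's low 12 bits.

```rust
/// Hypothetical helper: immediates for `adrp xN, sym` + `ldr xN, [xN, :lo12:sym]`.
/// Returns (adrp page delta, ldr imm12), or an error if the target's page
/// offset can't be encoded by the 8-byte-scaled ldr. Range checks omitted.
fn adrp_ldr_immediates(adrp_pc: u64, target: u64) -> Result<(i64, u32), String> {
    let page = |addr: u64| (addr >> 12) as i64;
    // ADRP (PAGEBASE_REL21): signed distance in 4 KiB pages. Only the page of
    // the adrp matters, so the instruction may sit in any 4-byte slot.
    let page_delta = page(target) - page(adrp_pc);
    // LDR Xt (PAGEOFFSET_12L): low 12 bits of the target, divided by 8.
    let lo12 = (target & 0xfff) as u32;
    if lo12 % 8 != 0 {
        return Err(format!("target {:#x} is not 8-byte aligned", target));
    }
    Ok((page_delta, lo12 / 8))
}
```

An adrp at 0x4 and one at 0x10 produce identical immediates for a target of 0x19EE80, while any target ending in 0x...4 fails regardless of where the instructions sit.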

alexcrichton commented 3 days ago

Aha that makes more sense! (I should also read more carefully...)

This might be as simple as updating this value? It could have a comment for now saying that only Windows requires 8-byte alignment so far, but since it's easier to bump all platforms to 8 bytes, the minimum is unconditionally 8 for now.
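A toy model of what a minimum function alignment of 8 buys: laying functions out back to back in one text section, each start is rounded up to the alignment. AArch64 functions are multiples of 4 bytes long, so with 4-byte alignment a function can start at an offset like 0xC. The helper names are hypothetical; this is not Cranelift's actual layout code, and it assumes the section itself is placed at an 8-byte-aligned address.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Lay out functions of the given byte sizes back to back in one text
/// section, returning each function's start offset within the section.
/// Assumes the section itself starts at an 8-byte-aligned address.
fn layout_functions(sizes: &[u64], align: u64) -> Vec<u64> {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut cursor = 0;
    for &size in sizes {
        cursor = align_up(cursor, align);
        offsets.push(cursor);
        cursor += size;
    }
    offsets
}
```

With sizes of 12, 20 and 8 bytes, an alignment of 4 gives offsets [0, 12, 32] (12 is not a valid PAGEOFFSET_12L target), while 8 gives [0, 16, 40]. This only covers functions defined in the same object; it does nothing for external symbols whose addresses the linker decides.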

dpaoliello commented 3 days ago

That helps: it at least means that any function in the current compilation can be the target of a reloc. But I'm also seeing the linker complain about Win32 functions and parts of the CRT.

alexcrichton commented 3 days ago

Oh dear, sorry, I'm being particularly slow at understanding this; you've already told me that before as well...

Is this perhaps something related to dllimport or something like that? Where memcpy should be imported via dllimport, and the linker handles a slightly different form of relocation when fixing it up? Otherwise I'll probably step aside, as I'm probably out of my depth here...
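To illustrate the dllimport idea: in the LLVM listing above, the adrp/ldr pair targets __imp_yep rather than yep. Assuming the usual Windows import scheme, __imp_yep is a pointer-sized slot in the import address table, so it is naturally 8-byte aligned even when the function's code (or an import thunk) is not. A hypothetical Rust model of choosing between that form and a direct bl; none of these names are Cranelift APIs, and it assumes the caller already knows whether the callee is dllimport'ed.

```rust
/// Hypothetical model (not Cranelift's API) of two ways to lower a call
/// on Windows ARM64, mirroring the LLVM output in this thread.
#[derive(Debug, PartialEq)]
enum CallLowering {
    /// `bl name`, fixed up with BRANCH26; the callee only needs the usual
    /// 4-byte instruction alignment.
    Direct { symbol: String },
    /// `adrp x8, __imp_name` / `ldr x8, [x8, :lo12:__imp_name]` / `blr x8`.
    /// The PAGEOFFSET_12L target is the 8-byte pointer slot in the import
    /// address table, not the function's code.
    ViaImportSlot { slot: String },
}

fn lower_call(name: &str, is_dllimport: bool) -> CallLowering {
    if is_dllimport {
        CallLowering::ViaImportSlot { slot: format!("__imp_{name}") }
    } else {
        CallLowering::Direct { symbol: name.to_string() }
    }
}
```

Under this model, a call to memcpy marked as imported would load through __imp_memcpy, and a call to a function defined in the same object would use a plain bl.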