dotMorten opened 7 months ago
int8_t and int32_t are both signed. I'd expect static_cast<int32_t> to sign-extend.
@huoyaoyuan the static_cast code was just to show what happens to a round-tripped value, but we have code deeper down that checks ranges of the sbyte (in this specific case -12..12) that fails for all negative numbers. The issue only occurred on iOS and Mac Catalyst when running on ARM64. Windows ARM64 and Android ARM64 are just fine.
I wonder if the same thing happens with System.Int16 (short) <-> int16_t?
static_cast<int32_t>(value) isn't a NOP on other platforms:
In other words, these platforms are treating the upper bits of int8_t as undefined.
Compiler Explorer doesn't provide Apple compilers, but I can imagine that they define the upper bits to be already sign-extended.
@rolfbjarne we have not seen this happen with shorts.
@jakobbotsch Do we have the sign and zero extensions implemented correctly in RyuJIT for Apple Arm64 ABI?
> @jakobbotsch Do we have the sign and zero extensions implemented correctly in RyuJIT for Apple Arm64 ABI?
Seems like we do. I tested SByteToInt conversion with NativeAOT on iOS and it returns the expected result.
@dotMorten did you have a chance to try out NativeAOT on iOS with your projects?
Regarding Mono, the problem seems reproducible only if the native library is built with -Os:
clang++ -c -o lib.o lib.cpp -isysroot /Applications/Xcode_15.2.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.2.sdk -Os
Removing the size optimization flag when building the native library makes Mono produce the expected result as well.
We should look into why Mono fails when the library is built with -Os.
I don't see where RyuJIT takes explicit care of this normalization. RyuJIT in general keeps all values in registers normalized up to 32 bits, which makes it very hard/impossible to hit issues around this with IL created from C#. But you can hit this with RyuJIT as well for IL based programs on both arm32 (which has the same ABI requirement) and apple arm64. I opened #101046 about it.
Looking at what Apple clang produces with -S -emit-llvm, it looks like the initial LLVM IR doesn't need to include the sign extension explicitly.
This:
#include <stdint.h>
int32_t foo (int8_t i)
{
return i;
}
int32_t bar (int8_t i);
int32_t call_bar (int8_t i, int8_t j)
{
return bar(i + j);
}
Produces the following LLVM IR with clang -Os -S -emit-llvm /tmp/foo.c -o foo.S -isysroot /Applications/Xcode-15.4.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk:
; ModuleID = '/tmp/foo.c'
source_filename = "/tmp/foo.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios17.5.0"
; Function Attrs: mustprogress nofree norecurse nosync nounwind optsize readnone ssp willreturn uwtable(sync)
define i32 @foo(i8 noundef signext %0) local_unnamed_addr #0 {
%2 = sext i8 %0 to i32
ret i32 %2
}
; Function Attrs: nounwind optsize ssp uwtable(sync)
define i32 @call_bar(i8 noundef signext %0, i8 noundef signext %1) local_unnamed_addr #1 {
%3 = add i8 %1, %0
%4 = tail call i32 @bar(i8 noundef signext %3) #3
ret i32 %4
}
; Function Attrs: optsize
declare i32 @bar(i8 noundef signext) local_unnamed_addr #2
attributes #0 = { mustprogress nofree norecurse nosync nounwind optsize readnone ssp willreturn uwtable(sync) "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-a7" "target-features"="+aes,+crypto,+fp-armv8,+neon,+sha2,+v8a,+zcm,+zcz" }
attributes #1 = { nounwind optsize ssp uwtable(sync) "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-a7" "target-features"="+aes,+crypto,+fp-armv8,+neon,+sha2,+v8a,+zcm,+zcz" }
attributes #2 = { optsize "frame-pointer"="non-leaf" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-a7" "target-features"="+aes,+crypto,+fp-armv8,+neon,+sha2,+v8a,+zcm,+zcz" }
attributes #3 = { nounwind optsize }
!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}
!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 17, i32 5]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 8, !"PIC Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 1}
!5 = !{!"Apple clang version 15.0.0 (clang-1500.3.9.4)"}
You get the same IR with Apple clang targeting the standard ARM64 ABI: clang -Os -S -emit-llvm /tmp/foo.c -o foo.S --target=arm64-linux-gnu
; ModuleID = '/tmp/foo.c'
source_filename = "/tmp/foo.c"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "arm64-unknown-linux-gnu"
; Function Attrs: mustprogress nofree norecurse nosync nounwind optsize readnone willreturn uwtable
define dso_local i32 @foo(i8 noundef %0) local_unnamed_addr #0 {
%2 = sext i8 %0 to i32
ret i32 %2
}
; Function Attrs: nounwind optsize uwtable
define dso_local i32 @call_bar(i8 noundef %0, i8 noundef %1) local_unnamed_addr #1 {
%3 = add i8 %1, %0
%4 = tail call i32 @bar(i8 noundef %3) #3
ret i32 %4
}
; Function Attrs: optsize
declare i32 @bar(i8 noundef) local_unnamed_addr #2
attributes #0 = { mustprogress nofree norecurse nosync nounwind optsize readnone willreturn uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
attributes #1 = { nounwind optsize uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
attributes #2 = { optsize "frame-pointer"="non-leaf" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" }
attributes #3 = { nounwind optsize }
!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{i32 7, !"frame-pointer", i32 1}
!5 = !{!"Apple clang version 15.0.0 (clang-1500.3.9.4)"}
However, the final assembly differs.
Apple ABI: clang -Os -S /tmp/foo.c -o foo.S -isysroot /Applications/Xcode-15.4.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk --target=arm64-apple-ios17.5.0
.section __TEXT,__text,regular,pure_instructions
.build_version ios, 17, 5 sdk_version 17, 5
.globl _foo ; -- Begin function foo
.p2align 2
_foo: ; @foo
.cfi_startproc
; %bb.0:
ret
.cfi_endproc
; -- End function
.globl _call_bar ; -- Begin function call_bar
.p2align 2
_call_bar: ; @call_bar
.cfi_startproc
; %bb.0:
add w8, w1, w0
sxtb w0, w8
b _bar
.cfi_endproc
; -- End function
.subsections_via_symbols
Linux ABI: clang -Os -S /tmp/foo.c -o foo.S -isysroot /Applications/Xcode-15.4.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk --target=arm64-linux-gnu
.text
.file "foo.c"
.globl foo // -- Begin function foo
.p2align 2
.type foo,@function
foo: // @foo
.cfi_startproc
// %bb.0:
sxtb w0, w0
ret
.Lfunc_end0:
.size foo, .Lfunc_end0-foo
.cfi_endproc
// -- End function
.globl call_bar // -- Begin function call_bar
.p2align 2
.type call_bar,@function
call_bar: // @call_bar
.cfi_startproc
// %bb.0:
add w0, w1, w0
b bar
.Lfunc_end1:
.size call_bar, .Lfunc_end1-call_bar
.cfi_endproc
// -- End function
.ident "Apple clang version 15.0.0 (clang-1500.3.9.4)"
.section ".note.GNU-stack","",@progbits
.addrsig
Conclusion: we're probably not passing the correct target triple to something in our backend: either opt, llc, or clang.
I think the problem is that we are not decorating parameters/arguments narrower than 32 bits with the correct LLVM IR attributes.
Consider a simple caller.cpp:
#include <cstdint>
extern "C" {int32_t SByteToInt(int8_t value);}
int main() {
int result = SByteToInt(-5);
return result;
}
Compiled with: clang++ -S -emit-llvm -o caller.S caller.cpp -isysroot /Applications/Xcode_15.2.0.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.2.sdk
We get the following IR, where the param/arg have the signext attribute associated with them:
; Function Attrs: noinline norecurse optnone ssp uwtable(sync)
define i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
store i32 0, ptr %1, align 4
%3 = call i32 @SByteToInt(i8 noundef signext -5)
store i32 %3, ptr %2, align 4
%4 = load i32, ptr %2, align 4
ret i32 %4
}
declare i32 @SByteToInt(i8 noundef signext) #1
This gives the following assembly:
_main:
0000000000000000 sub sp, sp, #0x20
0000000000000004 stp x29, x30, [sp, #0x10]
0000000000000008 add x29, sp, #0x10
000000000000000c stur wzr, [x29, #-0x4]
0000000000000010 mov w0, #-0x5 // <-- correct
0000000000000014 bl 0x14
However, when we use Mono with LLVM to P/Invoke the same function, we generate:
; Function Attrs: noinline
define hidden monocc i32 @Program_Program_EONEConvert() #2 gc "coreclr" !dbg !89 {
...
gc.safepoint_poll.exit: ; preds = %BB0, %gc.safepoint_poll.poll.i
%2 = notail call monocc i32 @p_8_plt_Program_Program_SByteToInt_sbyte_llvm(i8 -5), !dbg !90, !managed_name !91
ret i32 %2, !dbg !92
}
declare hidden i32 @p_8_plt_Program_Program_SByteToInt_sbyte_llvm(i8) local_unnamed_addr #0
which gives the following assembly:
_Program_Program_MyConvert:
0000000000000290 stp x29, x30, [sp, #-0x10]!
0000000000000294 adrp x8, 0 ; 0x0
0000000000000298 ldr x8, [x8]
000000000000029c ldr x8, [x8]
00000000000002a0 cbnz x8, 0x2b4
00000000000002a4 mov w0, #0xfb // <-- wrong
00000000000002a8 bl 0x2a8
Proposed fix: adapt https://github.com/dotnet/runtime/blob/main/src/mono/mono/mini/mini-llvm.c#L2333 (and other places) to properly decorate params/args with the correct set of parameter attributes (https://llvm.org/docs/LangRef.html#parameter-attributes).
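For illustration, a hypothetical corrected form of the Mono-emitted IR above, adding the noundef/signext attributes that clang attaches (identifiers copied from the earlier dump):

```llvm
; Call site: the i8 argument is marked signext so LLVM emits the extension.
%2 = notail call monocc i32 @p_8_plt_Program_Program_SByteToInt_sbyte_llvm(i8 noundef signext -5)
; Declaration: the parameter carries the matching attribute.
declare hidden i32 @p_8_plt_Program_Program_SByteToInt_sbyte_llvm(i8 noundef signext) local_unnamed_addr #0
```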
FWIW, using the mini-jit backend alone produces the correct result:
Program_EONEConvert:
00000001003e8280 stp x29, x30, [sp, #-0x10]!
00000001003e8284 mov x29, sp
00000001003e8288 adrp x16, 2243 ; 0x100cab000
00000001003e828c add x16, x16, #0xa50
00000001003e8290 ldr x0, [x16, #0x38]
00000001003e8294 ldr x17, [x0]
00000001003e8298 cbz x17, 0x1003e82a0
00000001003e829c bl plt
00000001003e82a0 mov x0, #-0x5
00000001003e82a4 bl plt_Program_SByteToInt_sbyte
Description
When sending an sbyte to a C++ library (as int8_t) on iOS and Mac Catalyst ARM64, the sbyte is not sent across correctly.
Reproduction Steps
Open MauiTestApp\MauiTestApp.csproj and run it on an iOS device or iOS ARM64 simulator.
That value comes from sending an sbyte of -5 to native code, letting the native code convert it to an int32 and return it again. But because something goes wrong at the interop level, the value gets sent across as 251. This seems to only affect sbyte going into native code; the second part of the code tests signed bytes coming out of native code and handles negative values just fine.
A Windows version of the repro is also provided (I tested Windows ARM64 and Android ARM64 and these don't exhibit this behavior).
C++ code:
C# code:
Expected behavior
Signed bytes get sent across the interop boundary correctly on iOS ARM64, like on other platforms.
Actual behavior
-5 gets incorrectly transferred.
Regression?
No response
Known Workarounds
Don't use sbyte; transfer the value as int32_t instead and cast it once in native code.
Configuration
.NET 8 ARM64 iOS and Catalyst
Other information
Interesting details for Apple ARM64 ABI: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Pass-arguments-to-functions-correctly
I wonder if .NET zero-extends our sbyte instead of sign-extending it. -5 in two's complement int8_t would be 1111 1011. Zero-extended to int32_t that would be 0000 0000 0000 0000 0000 0000 1111 1011, which is 251.