Some IR tests still fail on MacOS/AArch64

pfustc commented 9 months ago

Hi @dstogov,

I noticed that you had fixed the ir_add_veneer() assertion failure in https://github.com/dstogov/ir/commit/1671b3de78742b85611f0e84f805141608730dfc. After your fix, there is no more assertion failure, but below IR tests still fail on MacOS/AArch64.

001: Argument Passing [./tests/debug.aarch64/args_001.irt]
002: Argument Passing [./tests/debug.aarch64/args_002.irt]
Simple CALL -O0 [./tests/debug.aarch64/call-O0.irt]
Simple CALL [./tests/debug.aarch64/call.irt]
CALL with parallel argument passing [./tests/debug.aarch64/call2.irt]
Simple CALL with ALLOCA [./tests/debug.aarch64/call_alloca.irt]
Simple CALL with VADDR [./tests/debug.aarch64/call_vaddr.irt]
003: Parameter Loading and argument passing [./tests/debug.aarch64/params_003.irt]
Fib [./tests/debug.aarch64/regset-fib.irt]
Fib2 [./tests/debug.aarch64/regset-fib2.irt]
FibI [./tests/debug.aarch64/regset-fibi.irt]
Simple CALL [./tests/debug.aarch64/tailcall_001.irt]
FibI [./tests/fibi_min.irt]
CTLZ 001: [./tests/run/ctlz_001.irt]
CTPOP 001: [./tests/run/ctpop_001.irt]
CTTZ 001: [./tests/run/cttz_001.irt]
Floating Point number comparison (001: CMP edge cases) [./tests/run/fcmp_001.irt]
Floating Point number comparison (002: CMP+COND edge cases) [./tests/run/fcmp_002.irt]
VA_ARG 001: va_arg(int) [./tests/run/vaarg_001.irt]
VA_ARG 002: va_arg(double) [./tests/run/vaarg_002.irt]
VA_ARG 003: va_arg(float) [./tests/run/vaarg_003.irt]
VA_ARG 004: va_arg() expanded on AArch64 [./tests/run/vaarg_004_aarch64.irt]

The failures are caused by the difference between .exp and .out, for example ./tests/debug.aarch64/call.irt

call.out

main:
    stp x29, x30, [sp, #-0x10]!
    mov x29, sp
    adr x0, .L1
    movz w1, #0x2a
    movz x17, #0x9f54
    movk x17, #0x8d06, lsl #16
    movk x17, #0x1, lsl #32
    blr x17
    ldp x29, x30, [sp], #0x10
    ret
.rodata
.L1:
    .db 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x25, 0x64, 0x21, 0x0a, 0x00, 0x00

hello 1799882064!

exit code = 18

call.exp

main:
    stp x29, x30, [sp, #-0x10]!
    mov x29, sp
    adr x0, .L1
    movz w1, #0x2a
    bl printf
    ldp x29, x30, [sp], #0x10
    ret
.rodata
    .db 0x1f, 0x20, 0x03, 0xd5
.L1:
    .db 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x25, 0x64, 0x21, 0x0a, 0x00, 0x00

hello 42!

exit code = 10

It looks that the issues are still related to calling convention differences between Linux/AArch64 and MacOS/AArch64. Perhaps you are still working on some complete fixes for this. But I'd like to add this issue here to track this.

dstogov commented 9 months ago

Thanks for the report. The vararg passing on MacOS/AArch64 is not fixed yet.

However this failure is not related to calling convention and it is not a serious problem. The difference is in usage of blr instead of bl that is caused by the larger distance between JIT code buffer and printf.

Probably we may always generate bl and relay on additional veneer when necessary. I'll think about this.

dstogov commented 9 months ago

I'm wrong. This test uses expectation that is incorrect on MacOS. The third argument must be passed through stack and the result of the execution is incorrect. This is directly related to vararg passing.

dstogov commented 9 months ago

@pfustc could you please re-run the tests on MacOS/AArch64 once again I hope I fixed vararg passing via https://github.com/dstogov/ir/commit/c5bf8003c0f3aeaeef0700e17c44e137627758c8 Tests that verifies assembler code are going to be failed, but the output generated by JIT code must be the same (e.g. hello 42!)

pfustc commented 9 months ago

Thanks for your quick fix. We now have 12 failures, 11 of which are just caused by the diff in assembly code.

-------------------------------
FAILED TESTS
-------------------------------
001: Argument Passing [./tests/debug.aarch64/args_001.irt]
002: Argument Passing [./tests/debug.aarch64/args_002.irt]
Simple CALL -O0 [./tests/debug.aarch64/call-O0.irt]
Simple CALL [./tests/debug.aarch64/call.irt]
CALL with parallel argument passing [./tests/debug.aarch64/call2.irt]
Simple CALL with ALLOCA [./tests/debug.aarch64/call_alloca.irt]
Simple CALL with VADDR [./tests/debug.aarch64/call_vaddr.irt]
003: Parameter Loading and argument passing [./tests/debug.aarch64/params_003.irt]
Fib [./tests/debug.aarch64/regset-fib.irt]
Fib2 [./tests/debug.aarch64/regset-fib2.irt]
FibI [./tests/debug.aarch64/regset-fibi.irt]
Simple CALL [./tests/debug.aarch64/tailcall_001.irt]
-------------------------------

But the last one ./tests/debug.aarch64/tailcall_001.irt still has output difference.

$ diff tailcall_001.out tailcall_001.exp
2d1
<   sub sp, sp, #0x10
4,16c3,4
<   movz w16, #0x2a
<   str x16, [sp]
<   sub sp, sp, #0x10
<   adr x0, .L1
<   movz w16, #0x2a
<   str x16, [sp]
<   movz x17, #0x9f54
<   movk x17, #0x8857, lsl #16
<   movk x17, #0x1, lsl #32
<   blr x17
<   add sp, sp, #0x10
<   add sp, sp, #0x10
<   ret
---
>   movz w1, #0x2a
>   b printf
21a10,12
> hello 42!
>
> exit code = 10

There is no hello 42! printed in tailcall_001.out.

dstogov commented 9 months ago

Great! Thanks!

According ./tests/debug.aarch64/tailcall_001.irt - we probably can't use TAILCALL because prinrf() accepts stack parameters. Actually, the generated code tries to convert TAILCALL to regular CALL, but it seem done improperly.

the link register is not saved restored
42 passed on stack twice
add sp, sp, #0x10 is done twice

The other test failures most probably are false positives and may be avoided by separating target apple-aarch64. The other problem is bl/blr mismatch caused by different code layout.

dstogov commented 9 months ago

The output for ./tests/debug.aarch64/tailcall_001.irt should be fixed by https://github.com/dstogov/ir/commit/81fab9a2cbaaff0fdef7c4c93f5ccf4154278a43 I can't test this.

pfustc commented 9 months ago

Just checked that there's no other failures, except those caused by the assembly difference. Thanks for it.

The other test failures most probably are false positives and may be avoided by separating target apple-aarch64. The other problem is bl/blr mismatch caused by different code layout.

Is the assembly difference eventually caused by the different behavior of DynASM on Linux and MacOS? I'd like to know if it can be avoided by replacing current IR backend to something else.

dstogov commented 9 months ago

Is the assembly difference eventually caused by the different behavior of DynASM on Linux and MacOS? I'd like to know if it can be avoided by replacing current IR backend to something else.

There are two reasons:

the difference in calling conventions (e.g. vararg passing) and different code generated on purpose. We can't avoid this (because of similar reason we made difference between x86_64 and Windows-x86_64 targets. Probably, we will have to introduce apple-aarch64).
the difference in jump distance and usage of bl or blr instruction (this might be avoided by usage of reallocations or veneers).

pfustc commented 9 months ago

Oh yes, I understand that. Thanks for your explanation.

dstogov / ir

Some IR tests still fail on MacOS/AArch64 #55