LekKit / RVVM

The RISC-V Virtual Machine
GNU General Public License v3.0

JIT Illegal Instruction - ARM - big.LITTLE cache size difference - https://www.mono-project.com/news/2016/09/12/arm64-icache #141

Closed ZLangJIT closed 1 month ago

ZLangJIT commented 1 month ago

what could i do when the JIT attempts to execute an Illegal Instruction? (happens sometimes, never happens with -nojit)

the crash seems to prevent the console from being restored (eg ^C functionality), and it also prevents rvvm from, say, auto rebooting if it happens during an attempted kernel boot

examples:

[    4.030309] PM: genpd: Disabling unused power domains
[    4.068769] Freeing unused kernel image (initmem) memory: 4816K
[    4.071481] Run /init as init process
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Illegal instruction
localhost:~/riscv-kernel#
INFO: Attached MMIO device at 0x02000000, type "aclint_mswi"
INFO: Attached MMIO device at 0x02004000, type "aclint_mtimer"
INFO: Attached MMIO device at 0x0c000000, type "plic"
INFO: Attached MMIO device at 0x30000000, type "pci_bus"
INFO: Attached MMIO device at 0x10030000, type "i2c_opencores"
INFO: Attached MMIO device at 0x00101000, type "rtc_goldfish"
INFO: Attached MMIO device at 0x00100000, type "syscon"
INFO: Attached MMIO device at 0x10000000, type "ns16550a"
ERROR: No suitable windowing backends found!
INFO: Attached MMIO device at 0x40000000, type "rtl8169"
INFO: Generated DTB at 0x863fee10, size 4588
INFO: Dropping from root user to nobody
INFO: Hart 0x747f8db020 started
Illegal instruction
localhost:~/riscv-kernel#
~ # poweroff
~ # Stopping klogd: OK
Stopping syslogd: OK
Illegal instruction
localhost:~/riscv-kernel#

these are about 50/50, you either get them or you dont

LekKit commented 1 month ago

Is this the v0.6 release or v0.7-git staging build?

Please try the latest staging commit, build with make CFLAGS=-g, then do: $ gdb --args ./rvvm [rvvm args here]

Upon crash it will drop to GDB shell, please type bt, and x/10i $pc and post the results.

Guest firmware/kernel/image would be also helpful, and some info about the host

ZLangJIT commented 1 month ago

the host is Android 9 aarch64 termux proot-distro alpine

the build commit is d1e01a50109c6b2d217f4861fceb7fb2c13bbd7d

and the branch is staging

i boot with

rm disk.img
dd if=/dev/zero of=disk.img bs=1M count=100

lldb ../RVVM/debug_BUILD/rvvm ../RVVM/uboot -v -k Image -m 100m -cmdline="console=ttyS0 rootflags=discard rw" \
-k disk.img

with uboot from https://github.com/LekKit/RVVM/releases/download/v0.6/fw_payload.bin

with kernel from https://github.com/ZLangJIT/riscv-kernel/releases/download/6.11.26/Image.gz

built via https://github.com/ZLangJIT/riscv-kernel/blob/1e32c71803fd1d1c74debdc73aabaf1eb5311ae8/.github/workflows/build.yml

with configuration https://github.com/ZLangJIT/riscv-kernel/blob/1e32c71803fd1d1c74debdc73aabaf1eb5311ae8/.linuxconfig

Mr0maks commented 1 month ago

Hello, I can't reproduce this with RVVM v0.6 and fw_payload.bin. It runs perfectly without any problems. Please describe your environment (HW specs), and compile and debug with gcc and gdb.

ZLangJIT commented 1 month ago

got an Illegal Instruction after 12 attempted boots + # poweroff

(as i said it either happens or it doesnt)

poweroff
~ # Stopping klogd: OK
Stopping syslogd: OK
Process 5726 stopped
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
    frame #0: 0x0000007fb0ce0d1c
->  0x7fb0ce0d1c: b      0x7fb0d0d6bc
    0x7fb0ce0d20: ldr    x9, [x0, #0x28]
    0x7fb0ce0d24: add    x10, x9, #0x8
    0x7fb0ce0d28: lsr    x12, x10, #12
(lldb) bt
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
  * frame #0: 0x0000007fb0ce0d1c
    frame #1: 0x000000300005fab0 rvvm`riscv_jit_tlb_lookup(vm=<unavailable>) at riscv_cpu.c:136:16
    frame #2: 0x000000300006a9e4 rvvm`riscv64_run_interpreter [inlined] riscv_emulate_c_c0(vm=0x0000007fb1bfb020, insn=30920) at riscv_compressed.h:169:13
    frame #3: 0x000000300006a0d4 rvvm`riscv64_run_interpreter [inlined] riscv_emulate_insn(vm=0x0000007fb1bfb020, insn=30920) at riscv_compressed.h:594:13
    frame #4: 0x000000300006a074 rvvm`riscv64_run_interpreter [inlined] riscv_emulate(vm=0x0000007fb1bfb020, instruction=30920) at riscv_interpreter.h:83:5
    frame #5: 0x0000003000069ffc rvvm`riscv64_run_interpreter(vm=0x0000007fb1bfb020) at riscv_interpreter.h:114:9
    frame #6: 0x000000300005f874 rvvm`riscv_run_till_event(vm=0x0000007fb1bfb020) at riscv_cpu.c:26:9
    frame #7: 0x0000003000063b84 rvvm`riscv_hart_run(vm=0x0000007fb1bfb020) at riscv_hart.c:334:9
    frame #8: 0x0000003000063f38 rvvm`riscv_hart_run_wrap(ptr=0x0000007fb1bfb020) at riscv_hart.c:347:5
    frame #9: 0x0000003f00062488 ld-musl-aarch64.so.1
    frame #10: 0x0000003f00060838 ld-musl-aarch64.so.1
(lldb) x/10i $pc
->  0x7fb0ce0d1c: 0x1400b268   unknown     b      0x7fb0d0d6bc
    0x7fb0ce0d20: 0xf9401409   unknown     ldr    x9, [x0, #0x28]
    0x7fb0ce0d24: 0x9100212a   unknown     add    x10, x9, #0x8
    0x7fb0ce0d28: 0xd34cfd4c   unknown     lsr    x12, x10, #12
    0x7fb0ce0d2c: 0x92401d8d   unknown     and    x13, x12, #0xff
    0x7fb0ce0d30: 0x531b69ad   unknown     lsl    w13, w13, #5
    0x7fb0ce0d34: 0x8b0001ad   unknown     add    x13, x13, x0
    0x7fb0ce0d38: 0xf9410daf   unknown     ldr    x15, [x13, #0x218]
    0x7fb0ce0d3c: 0xca0c01ef   unknown     eor    x15, x15, x12
    0x7fb0ce0d40: 0x9240054c   unknown     and    x12, x10, #0x3
(lldb)
ZLangJIT commented 1 month ago

after a few more boots (boot to shell, CTRL + C, (lldb) r, y (kill current process and restart)) i get this

Boot HART PMP Granularity : 0 bits
Boot HART PMP Address Bits: 0
Boot HART MHPM Info       : 0 (0x00000000)
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109
Process 7707 stopped
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
    frame #0: 0x0000007fb0de1280
->  0x7fb0de1280: b      0x7fb0df63b0
    0x7fb0de1284: add    x5, x10, #0x8
    0x7fb0de1288: lsr    x6, x5, #12
    0x7fb0de128c: and    x7, x6, #0xff
(lldb) bt
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
  * frame #0: 0x0000007fb0de1280
    frame #1: 0x000000300005fab0 rvvm`riscv_jit_tlb_lookup(vm=<unavailable>) at riscv_cpu.c:136:16
    frame #2: 0x000000300006c58c rvvm`riscv64_run_interpreter [inlined] riscv_emulate_c_c1(vm=0x0000007fb1bfb020, insn=29021) at riscv_compressed.h:402:17
    frame #3: 0x000000300006bc48 rvvm`riscv64_run_interpreter [inlined] riscv_emulate_insn(vm=0x0000007fb1bfb020, insn=29021) at riscv_compressed.h:598:13
    frame #4: 0x000000300006a074 rvvm`riscv64_run_interpreter [inlined] riscv_emulate(vm=0x0000007fb1bfb020, instruction=29021) at riscv_interpreter.h:83:5
    frame #5: 0x0000003000069ffc rvvm`riscv64_run_interpreter(vm=0x0000007fb1bfb020) at riscv_interpreter.h:114:9
    frame #6: 0x000000300005f874 rvvm`riscv_run_till_event(vm=0x0000007fb1bfb020) at riscv_cpu.c:26:9
    frame #7: 0x0000003000063b84 rvvm`riscv_hart_run(vm=0x0000007fb1bfb020) at riscv_hart.c:334:9
    frame #8: 0x0000003000063f38 rvvm`riscv_hart_run_wrap(ptr=0x0000007fb1bfb020) at riscv_hart.c:347:5
    frame #9: 0x0000003f00062488 ld-musl-aarch64.so.1
    frame #10: 0x0000003f00060838 ld-musl-aarch64.so.1
(lldb) x/10i $pc
->  0x7fb0de1280: 0x1400544c   unknown     b      0x7fb0df63b0
    0x7fb0de1284: 0x91002145   unknown     add    x5, x10, #0x8
    0x7fb0de1288: 0xd34cfca6   unknown     lsr    x6, x5, #12
    0x7fb0de128c: 0x92401cc7   unknown     and    x7, x6, #0xff
    0x7fb0de1290: 0x531b68e7   unknown     lsl    w7, w7, #5
    0x7fb0de1294: 0x8b0000e7   unknown     add    x7, x7, x0
    0x7fb0de1298: 0xf9410ce8   unknown     ldr    x8, [x7, #0x218]
    0x7fb0de129c: 0xca060108   unknown     eor    x8, x8, x6
    0x7fb0de12a0: 0x924008a6   unknown     and    x6, x5, #0x7
    0x7fb0de12a4: 0xaa0800c6   unknown     orr    x6, x6, x8
(lldb)
Mr0maks commented 1 month ago

It would also be good to get the output of x/20i $pc-40, and the same for the address it is trying to jump to.

ZLangJIT commented 1 month ago
(lldb) x/20i $pc-40
    0x7fb0de1258: 0xf9403c09   unknown     ldr    x9, [x0, #0x78]
    0x7fb0de125c: 0xeb1f013f   unknown     cmp    x9, xzr
    0x7fb0de1260: 0x54000121   unknown     b.ne   0x7fb0de1284
    0x7fb0de1264: 0xf900280a   unknown     str    x10, [x0, #0x50]
    0x7fb0de1268: 0xf900380b   unknown     str    x11, [x0, #0x70]
    0x7fb0de126c: 0xf900540c   unknown     str    x12, [x0, #0xa8]
    0x7fb0de1270: 0xf900600e   unknown     str    x14, [x0, #0xc0]
    0x7fb0de1274: 0xf940840f   unknown     ldr    x15, [x0, #0x108]
    0x7fb0de1278: 0x9100a9ef   unknown     add    x15, x15, #0x2a
    0x7fb0de127c: 0xf900840f   unknown     str    x15, [x0, #0x108]
->  0x7fb0de1280: 0x1400544c   unknown     b      0x7fb0df63b0
    0x7fb0de1284: 0x91002145   unknown     add    x5, x10, #0x8
    0x7fb0de1288: 0xd34cfca6   unknown     lsr    x6, x5, #12
    0x7fb0de128c: 0x92401cc7   unknown     and    x7, x6, #0xff
    0x7fb0de1290: 0x531b68e7   unknown     lsl    w7, w7, #5
    0x7fb0de1294: 0x8b0000e7   unknown     add    x7, x7, x0
    0x7fb0de1298: 0xf9410ce8   unknown     ldr    x8, [x7, #0x218]
    0x7fb0de129c: 0xca060108   unknown     eor    x8, x8, x6
    0x7fb0de12a0: 0x924008a6   unknown     and    x6, x5, #0x7
    0x7fb0de12a4: 0xaa0800c6   unknown     orr    x6, x6, x8
(lldb) x/20i 0x7fb0df63b0
    0x7fb0df63b0: 0xf940280b   unknown     ldr    x11, [x0, #0x50]
    0x7fb0df63b4: 0xaa0b03ec   unknown     mov    x12, x11
    0x7fb0df63b8: 0xd34cfd8d   unknown     lsr    x13, x12, #12
    0x7fb0df63bc: 0x92401dae   unknown     and    x14, x13, #0xff
    0x7fb0df63c0: 0x531b69ce   unknown     lsl    w14, w14, #5
    0x7fb0df63c4: 0x8b0001ce   unknown     add    x14, x14, x0
    0x7fb0df63c8: 0xf9410dcf   unknown     ldr    x15, [x14, #0x218]
    0x7fb0df63cc: 0xca0d01ef   unknown     eor    x15, x15, x13
    0x7fb0df63d0: 0x9240098d   unknown     and    x13, x12, #0x7
    0x7fb0df63d4: 0xaa0f01ad   unknown     orr    x13, x13, x15
    0x7fb0df63d8: 0xb400004d   unknown     cbz    x13, 0x7fb0df63e0
    0x7fb0df63dc: 0xd65f03c0   unknown     ret
    0x7fb0df63e0: 0xf94109cf   unknown     ldr    x15, [x14, #0x210]
    0x7fb0df63e4: 0x8b0c01ef   unknown     add    x15, x15, x12
    0x7fb0df63e8: 0xf94001ee   unknown     ldr    x14, [x15]
    0x7fb0df63ec: 0xf940540f   unknown     ldr    x15, [x0, #0xa8]
    0x7fb0df63f0: 0xaa0f03ed   unknown     mov    x13, x15
    0x7fb0df63f4: 0xaa0b03ec   unknown     mov    x12, x11
    0x7fb0df63f8: 0xaa0e03ea   unknown     mov    x10, x14
    0x7fb0df63fc: 0xf9408409   unknown     ldr    x9, [x0, #0x108]
(lldb) x/20i 0x7fb0df63b0-40
    0x7fb0df6388: 0x121c1dce   unknown     and    w14, w14, #0xff0
    0x7fb0df638c: 0x8b0001ce   unknown     add    x14, x14, x0
    0x7fb0df6390: 0xf9510dcd   unknown     ldr    x13, [x14, #0x2218]
    0x7fb0df6394: 0xeb0f01bf   unknown     cmp    x13, x15
    0x7fb0df6398: 0x540000a1   unknown     b.ne   0x7fb0df63ac
    0x7fb0df639c: 0xb940000d   unknown     ldr    w13, [x0]
    0x7fb0df63a0: 0x3400006d   unknown     cbz    w13, 0x7fb0df63ac
    0x7fb0df63a4: 0xf95109cf   unknown     ldr    x15, [x14, #0x2210]
    0x7fb0df63a8: 0xd61f01e0   unknown     br     x15
    0x7fb0df63ac: 0xd65f03c0   unknown     ret
    0x7fb0df63b0: 0xf940280b   unknown     ldr    x11, [x0, #0x50]
    0x7fb0df63b4: 0xaa0b03ec   unknown     mov    x12, x11
    0x7fb0df63b8: 0xd34cfd8d   unknown     lsr    x13, x12, #12
    0x7fb0df63bc: 0x92401dae   unknown     and    x14, x13, #0xff
    0x7fb0df63c0: 0x531b69ce   unknown     lsl    w14, w14, #5
    0x7fb0df63c4: 0x8b0001ce   unknown     add    x14, x14, x0
    0x7fb0df63c8: 0xf9410dcf   unknown     ldr    x15, [x14, #0x218]
    0x7fb0df63cc: 0xca0d01ef   unknown     eor    x15, x15, x13
    0x7fb0df63d0: 0x9240098d   unknown     and    x13, x12, #0x7
    0x7fb0df63d4: 0xaa0f01ad   unknown     orr    x13, x13, x15
(lldb)
LekKit commented 1 month ago

It could be an instruction cache sync issue, but that would be very weird since arm64 backend works on many other phones & Mac M1. Branches specifically are frequently patched, but the icache is flushed afterwards.

Could you please comment out this line (for aarch64!) in a staging tree, then rebuild & retry? This will disable the branch-patching JIT block linker on arm64

https://github.com/LekKit/RVVM/blob/d1e01a50109c6b2d217f4861fceb7fb2c13bbd7d/src/rvjit/rvjit.h#L72
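
To make the "branches are patched, then the icache is flushed" step concrete, here is a hedged C sketch of rewriting one AArch64 `b` instruction in place and then synchronising the instruction cache with the GCC/Clang builtin `__builtin___clear_cache`. The function names `a64_encode_b` and `patch_branch` are hypothetical and this is not RVJIT's actual linker code. The encoding matches the dumps above: `0x1400544c` at `0x7fb0de1280` is a branch by `0x15130` bytes to `0x7fb0df63b0`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode an AArch64 unconditional branch `b` whose target lies
 * byte_offset bytes away from the branch instruction itself.
 * The instruction holds the offset divided by 4 in its low 26 bits. */
uint32_t a64_encode_b(int64_t byte_offset)
{
    return 0x14000000u | (uint32_t)(((uint64_t)byte_offset >> 2) & 0x03FFFFFFu);
}

/* Hypothetical helper: overwrite the instruction at `site` with a branch
 * to `target`, then make the change visible to instruction fetch. */
void patch_branch(uint32_t *site, const void *target)
{
    int64_t offset = (int64_t)((intptr_t)target - (intptr_t)site);
    uint32_t insn;

    assert((offset & 3) == 0);                 /* instructions are 4-byte aligned */
    assert(offset >= -(INT64_C(1) << 27) &&
           offset <   (INT64_C(1) << 27));     /* `b` reaches +/-128 MiB */

    insn = a64_encode_b(offset);
    memcpy(site, &insn, sizeof(insn));

    /* Compiler-provided cache sync; how it picks the cache line stride
     * is up to the compiler runtime / libc, which is the suspect here */
    __builtin___clear_cache((char *)site, (char *)(site + 1));
}
```

If the flush uses a stride larger than the real line size on some core, the patched word may not be invalidated there, and that core may execute stale bytes at the patch site.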

ZLangJIT commented 1 month ago

hmm alright

ZLangJIT commented 1 month ago

it appears to work so far, i have done many boot-up's and power-off's without problems :)

ZLangJIT commented 1 month ago

if it helps at all

~ $ sudo cat /proc/cmdline
console=ram loglevel=4 sec_debug.level=0 sec_watchdog.sec_pet=5 androidboot.debug_level=0x4f4c androidboot.dram_info=01,12,00,4G androidboot.ap_serial=0x010C4FC44ADA sec_debug.chipidfail_cnt=0 sec_debug.lpitimeout_cnt=0 sec_debug.cache_err_cnt=0 sec_debug.codediff_cnt=1 sec_debug.pcb_offset=7343872 sec_debug.smd_offset=7344896 sec_debug.lpddr4_size=4 sec_debug.sjl=1 androidboot.prototype.param.offset=7345920 ess_setup=0x91200000 tima_log=0x200000@0xb1000000 sec_avc_log=0x40000@0x92202000 sec_tsp_log=0x40000@0x92244000 sec_debug.base=0x100000@0x92286000 auto_summary_log=0x10000@0x92388000 charging_mode=0x3030 s3cfb.bootloaderfb=0xcc000000 lcdtype=13713429 androidboot.carrierid.param.offset=7340608 androidboot.carrierid=XSA consoleblank=0 vmalloc=384m sec_debug.reset_reason=7 sec_reset.reset_reason=7 ehci_hcd.park=3 oops=panic pmic_info=43 ccic_info=1 fg_reset=0 androidboot.emmc_checksum=3 androidboot.sales.param.offset=7340572 sales_code=XSA androidboot.odin_download=1 androidboot.bootloader=G950FXXSBDTJ1 androidboot.selinux=enforcing androidboot.security_mode=1526595585 androidboot.ucs_mode=0 kaslr_region=0x1000@0x80001000 androidboot.revision=10 androidboot.hardware=samsungexynos8895 androidboot.warranty_bit=1 androidboot.wb.hs=0000 sec_debug.bin=A androidboot.hmac_mismatch=0 androidboot.sec_atd.tty=/dev/ttySAC0 androidboot.serialno=ce091829e258a11b04 snd_soc_core.pmdown_time=1000 androidboot.cp_reserved_mem=off nohugeiomap androidboot.fmp_config=0 androidboot.em.did=010c4fc44ada androidboot.em.model=SM-G950F androidboot.em.status=0x0 androidboot.verifiedbootstate=orange bcm_setup=0xffffff80f8e00000 reserve-fimc=0xffffff80fa000000 firmware_class.path=/vendor/firmware region1=EUR region2=OPEN
~ $ sudo cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 1
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 2
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 3
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 4
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0x001
CPU revision    : 0

processor       : 5
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0x001
CPU revision    : 0

processor       : 6
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0x001
CPU revision    : 0

processor       : 7
BogoMIPS        : 52.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0x001
CPU revision    : 0

~ $ uname -a
Linux localhost 4.4.111-ge843fcf8661e #1 SMP PREEMPT Wed Sep 25 19:01:25 UTC 2019 aarch64 Android
~ $

Samsung Galaxy S8 Lineage OS Pie
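
The dump above shows two different implementer/part pairs (0x41/0xd03 on CPUs 0-3, 0x53/0x001 on CPUs 4-7), which is what marks this SoC as heterogeneous. As a rough, non-portable heuristic, a program could count the distinct pairs in `/proc/cpuinfo` text. The sketch below is an illustration only; `count_core_types` is a hypothetical name, and it assumes the arm64 Linux field names shown in the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CORE_TYPES 8

/* If the line starts with `key`, return the number after ':' (hex or
 * decimal), otherwise return -1 */
static long field_value(const char *line, size_t len, const char *key)
{
    size_t key_len = strlen(key);
    const char *colon;

    if (len < key_len || strncmp(line, key, key_len) != 0) return -1;
    colon = memchr(line, ':', len);
    if (colon == NULL) return -1;
    return strtol(colon + 1, NULL, 0);
}

/* Count distinct (implementer, part) pairs in /proc/cpuinfo-style text.
 * A result above 1 suggests a big.LITTLE-like system. */
size_t count_core_types(const char *cpuinfo)
{
    long seen[MAX_CORE_TYPES];
    size_t count = 0;
    long implementer = -1;

    assert(cpuinfo != NULL);
    while (*cpuinfo != '\0') {
        const char *eol = strchr(cpuinfo, '\n');
        size_t len = eol ? (size_t)(eol - cpuinfo) : strlen(cpuinfo);
        long value;

        if ((value = field_value(cpuinfo, len, "CPU implementer")) >= 0) {
            implementer = value;
        } else if ((value = field_value(cpuinfo, len, "CPU part")) >= 0 &&
                   implementer >= 0) {
            long id = (implementer << 16) | value;
            size_t i;

            for (i = 0; i < count && seen[i] != id; i++) { }
            if (i == count && count < MAX_CORE_TYPES) seen[count++] = id;
        }
        cpuinfo += len + (eol ? 1 : 0);
    }
    return count;
}
```

Applied to the dump above this would report 2 core types. It says nothing about cache line sizes by itself, and offline cores may be missing from the file.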

ZLangJIT commented 1 month ago

it appears to work so far, i have done many boot-up's and power-off's without problems :)

tho im not sure if this would be device specific or not (ie disabling it may have adverse effects for other arm64 platforms/devices and i do not have the capability to test on an array of arm64 hardware)

ZLangJIT commented 1 month ago

if it helps at all

...

specifically a Samsung Exynos 8895 cpu

Mr0maks commented 1 month ago

Maybe it's a big.LITTLE problem; Mono has a workaround for that:

https://www.mono-project.com/news/2016/09/12/arm64-icache/

LekKit commented 1 month ago

specifically a Samsung Exynos 8895 cpu

Indeed that helps a lot. This CPU has different cache line sizes on different cores, and yet the Android libc is not prepared to handle that. An erratum workaround (a CPU-specific fix for icache flushing) needs to be implemented on the RVJIT side.

Disabling RVJIT_NATIVE_LINKER has the side effect that icache flushes are no longer done in a way that triggers this, but in general it results in fewer JIT optimizations.
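
For reference, on AArch64 the cache line sizes a flush routine relies on come from the `CTR_EL0` register: `IminLine` (bits 3:0) and `DminLine` (bits 19:16) each hold log2 of the line size in 4-byte words. The C sketch below decodes those fields; the function names are hypothetical, and the non-AArch64 branch returns a made-up placeholder value so the example builds elsewhere. On a heterogeneous SoC the value read depends on which core happens to execute the `mrs`, unless the kernel traps the access and returns a sanitised value, which some newer kernels may do.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest instruction cache line in bytes: 4 << CTR_EL0.IminLine */
size_t ctr_icache_line_bytes(uint64_t ctr)
{
    return (size_t)4 << (ctr & 0xF);
}

/* Smallest data cache line in bytes: 4 << CTR_EL0.DminLine */
size_t ctr_dcache_line_bytes(uint64_t ctr)
{
    return (size_t)4 << ((ctr >> 16) & 0xF);
}

/* Read CTR_EL0 on the core this thread is currently running on */
uint64_t read_ctr_el0(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(__GNUC__)
    uint64_t ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    return ctr;
#else
    /* Placeholder for other hosts: encodes 64-byte I and D lines */
    return 0x84448004u;
#endif
}
```

Reading the register once and caching the result is exactly where the trouble starts: if the first read happens on a core with 128-byte lines, a later flush with that stride skips every other 64-byte line on the smaller cores.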

ZLangJIT commented 1 month ago

btw i converted all "" includes into fully relative paths due to ndk bugs (a normal build works in alpine proot (alpine aarch64 llvm) but fails on x86 (CI) ndk llvm (ndk 21 specifically))

LekKit commented 1 month ago

Uh is the -I compiler argument not working?

ZLangJIT commented 1 month ago

im using cmake

it doesnt seem to handle h files in source blocks in ndk cmake/llvm

ZLangJIT commented 1 month ago

not quite sure how to go about implementing a fix for this, especially with

It can happen that a process gets scheduled on a different CPU while executing the __clear_cache function with a certain cache line size, where it might not be valid anymore

n30f0x commented 1 month ago

btw i converted all "" includes into fully relative paths due to ndk bugs (a normal build works in alpine proot (alpine aarch64 llvm) but fails on x86 (CI) ndk llvm (ndk 21 specifically))

could you please build it from scratch? what proot method are you using exactly? termux? andronix? make sure to build for your platform directly as proot is very finicky and unreliable due to text parsing

ZLangJIT commented 1 month ago

im using alpine aarch64 proot with (proot-distro)

localhost:~/riscv-kernel/libmedia# cmake --version
cmake version 3.30.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).
localhost:~/riscv-kernel/libmedia# clang --version
Alpine clang version 18.1.8
Target: aarch64-alpine-linux-musl
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang18/aarch64-alpine-linux-musl.cfg
localhost:~/riscv-kernel/libmedia#

in github archlinux CI im using sdkmanager cmake and ndk 21 which has llvm 14 i think

specifically CMake 3.10.2.4988404 (revision: 3.10.2) and ndk 21.4.7075529

LekKit commented 1 month ago

Well RVVM should be able to work in Termux as well as natively on Android. Makefile supports building with NDK (just need to pass make CC=/path/to/ndk/clang). I am not sure what implications proot has, but it's effectively more layers of abstractions that are ultimately not needed specifically for RVVM.

Also see https://github.com/fish4terrisa-MSDSM/archriscv-term

LekKit commented 1 month ago

It is desirable to have a dedicated Android app with working graphics in the future; librvvm JNI bindings can be used from Java to create/manipulate/access virtual machines already, but full Android app is not implemented yet

archriscv-term can currently be used as a third-party Android app around RVVM; it runs not only Arch but any other distro/OS that RVVM already supports

ZLangJIT commented 1 month ago

It is desirable to have a dedicated Android app with working graphics in the future; librvvm JNI bindings can be used from Java to create/manipulate/access virtual machines already, but full Android app is not implemented yet

yea, so far i have a simple rvvm terminal application

https://github.com/ZLangJIT/riscv-kernel/releases/download/6.11.30/linux.kernel.rvvm.debug.apk (aarch64)

(in Log/Terminal - swipe from the left edge to bring up the drawer, like in termux, then cd ASSETS ; ./boot_rvvm_disk.sh)

btw android's dd does not accept -h nor -help nor --help and its bs= doesnt seem to accept the M suffix

one challenge would however be accelerated 3d graphics since android disallows KVM
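
Regarding the `dd` limitation mentioned above: when the host `dd` rejects size suffixes, a blank image like the one from `dd if=/dev/zero of=disk.img bs=1M count=100` can also be created from C with `ftruncate`. This is just a small sketch; `create_blank_image` is a hypothetical helper, and the resulting file is sparse (it reads back as zeros but may not occupy disk space until written).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) `path` and extend it to `size` zero bytes.
 * Returns 0 on success, -1 on failure. */
int create_blank_image(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For the 100 MiB image used earlier in the thread, the call would be `create_blank_image("disk.img", (off_t)100 * 1024 * 1024)`.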

LekKit commented 1 month ago

3D graphics also need work on VM side, such as virtio-gpu implementation. However software rendering is already working well everywhere, just need a way to render a raw framebuffer on android.

LekKit commented 1 month ago

As for the issue in the title... Seems that the fix proposed by the Mono project & adopted by a few others is incomplete: https://github.com/mono/mono/pull/3549/files

It promises to determine the smallest cache line size across all cores, but for that to actually work it needs to be called on a core with the smallest cacheline size at least once. In reality there is still a chance for it to fetch the higher cacheline value and begin flushing, only for it to be rescheduled in this short timeslice to another core and miserably fail as a result.

To properly figure the smallest cacheline size without caveats, some kernel assistance is needed, which is not available on those devices from 2016. Another possible fix is to hardcode 64byte cacheline size - but it will pessimize the performance on uniform 128byte cacheline cores, such as the Apple M1...
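
The "hardcode 64 bytes" option mentioned above could look roughly like the C sketch below: clean the data cache and invalidate the instruction cache over a range with a fixed stride instead of a stride read from the current core. `jit_flush_icache` and `JIT_FLUSH_STRIDE` are hypothetical names, the 64-byte value is an assumption that no core has a smaller line, and this is not necessarily what RVVM ended up doing. On hosts other than AArch64 Linux the sketch falls back to `__builtin___clear_cache`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: no core in the system has cache lines smaller than this */
#define JIT_FLUSH_STRIDE 64u

/* Make code written to [addr, addr + size) visible to instruction fetch.
 * Returns the number of stride-sized steps covered, for illustration. */
size_t jit_flush_icache(const void *addr, size_t size)
{
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(JIT_FLUSH_STRIDE - 1);
    uintptr_t end = (uintptr_t)addr + size;
    uintptr_t p;
    size_t lines = 0;

#if defined(__aarch64__) && defined(__linux__) && defined(__GNUC__)
    /* Clean data cache lines to the point of unification */
    for (p = start; p < end; p += JIT_FLUSH_STRIDE) {
        __asm__ volatile("dc cvau, %0" : : "r"(p) : "memory");
    }
    __asm__ volatile("dsb ish" : : : "memory");
    /* Invalidate the matching instruction cache lines */
    for (p = start; p < end; p += JIT_FLUSH_STRIDE) {
        __asm__ volatile("ic ivau, %0" : : "r"(p) : "memory");
        lines++;
    }
    __asm__ volatile("dsb ish\n\tisb" : : : "memory");
#else
    /* Other hosts: defer to the compiler builtin and only count steps */
    __builtin___clear_cache((char *)addr, (char *)addr + size);
    for (p = start; p < end; p += JIT_FLUSH_STRIDE) {
        lines++;
    }
#endif
    return lines;
}
```

The cost is the one already noted: on cores whose lines are uniformly 128 bytes, each line is visited twice.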

ZLangJIT commented 1 month ago

3D graphics also need work on VM side, such as virtio-gpu implementation. However software rendering is already working well everywhere, just need a way to render a raw framebuffer on android.

yup

ZLangJIT commented 1 month ago

As for the issue in the title... Seems that the fix proposed by the Mono project & adopted by a few others is incomplete: https://github.com/mono/mono/pull/3549/files

It promises to determine the smallest cache line size across all cores, but for that to actually work it needs to be called on a core with the smallest cacheline size at least once. In reality there is still a chance for it to fetch the higher cacheline value and begin flushing, only for it to be rescheduled in this short timeslice to another core and miserably fail as a result.

To properly figure the smallest cacheline size without caveats, some kernel assistance is needed, which is not available on those devices from 2016. Another possible fix is to hardcode 64byte cacheline size - but it will pessimize the performance on uniform 128byte cacheline cores, such as the Apple M1...

true, tho we could attempt to identify if we are running on an ARM64 android device and then always use a 64-byte cache line size, without disturbing other devices such as Apple M1 or Arm64 laptop/surface tablets

ZLangJIT commented 1 month ago

hmm, im not sure if the following would work on android (not sure how to verify a task/pid is not scheduled on a different core)

https://android.googlesource.com/platform/art/+/main/runtime/jit/jit_code_cache.cc

https://stackoverflow.com/questions/7467848/is-it-possible-to-set-affinity-with-sched-setaffinity-in-android

https://stackoverflow.com/questions/76322956/ocassional-invalid-argument-with-sched-setaffinity-on-the-android-device

https://android.googlesource.com/platform/external/toybox/+/7a3f53b/toys/other/taskset.c

u0_a133@dreamlte /data/data/linux.kernel/files/ASSETS $ taskset --help
usage: taskset [-ap] [mask] [PID | cmd [args...]]

Launch a new task which may only run on certain processors, or change
the processor affinity of an exisitng PID.

Mask is a hex string where each bit represents a processor the process
is allowed to run on. PID without a mask displays existing affinity.

-p      Set/get the affinity of given PID instead of a new command
-a      Set/get the affinity of all threads of the PID

u0_a133@dreamlte /data/data/linux.kernel/files/ASSETS $ taskset -p $$
pid 30855's current affinity mask: ff
u0_a133@dreamlte /data/data/linux.kernel/files/ASSETS $
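
Since the affinity mask above is `ff` (all eight cores allowed), one experiment along these lines would be to pin the calling thread to each allowed CPU in turn, read the line size there, and keep the minimum. The C sketch below is only an assumption-laden illustration for Linux: `probe_min_line_size` and `icache_line_on_this_core` are hypothetical names, `sched_setaffinity` may be refused on some Android builds (as the linked questions suggest), offline cores are not visited, and on non-AArch64 hosts the reader returns a placeholder of 64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

typedef size_t (*line_size_fn)(void);

/* Instruction cache line size as seen by the core we are running on */
size_t icache_line_on_this_core(void)
{
#if defined(__aarch64__) && defined(__GNUC__)
    unsigned long ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    return (size_t)4 << (ctr & 0xF);   /* CTR_EL0.IminLine */
#else
    return 64;                         /* placeholder for other hosts */
#endif
}

/* Visit every CPU in the current affinity mask and return the smallest
 * line size reported by `read_line_size` on any of them. */
size_t probe_min_line_size(line_size_fn read_line_size)
{
    cpu_set_t original, single;
    size_t min_size;
    int cpu;

    assert(read_line_size != NULL);
    min_size = read_line_size();       /* fallback: whatever core we are on */

    if (sched_getaffinity(0, sizeof(original), &original) != 0) {
        return min_size;
    }
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        size_t size;

        if (!CPU_ISSET(cpu, &original)) continue;
        CPU_ZERO(&single);
        CPU_SET(cpu, &single);
        /* Skip CPUs we are not allowed to pin to */
        if (sched_setaffinity(0, sizeof(single), &single) != 0) continue;
        size = read_line_size();
        if (size != 0 && size < min_size) min_size = size;
    }
    /* Put the original mask back so later scheduling is unaffected */
    sched_setaffinity(0, sizeof(original), &original);
    return min_size;
}
```

A one-time probe such as `probe_min_line_size(icache_line_on_this_core)` at startup would sidestep the race of reading the size during a flush, but only if every core type is online and pinnable at that moment.
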
ZLangJIT commented 1 month ago

for now i think it would be best to hardcode 64 when compiling for big.LITTLE devices

tho i am not sure where to do this in RVVM sources

LekKit commented 1 month ago

for now i think it would be best to hardcode 64 when compiling for big.LITTLE devices

Well unfortunately it knows nothing about that, at least not in a portable way if we assume it ever runs on an Android device with a very old kernel.

LekKit commented 1 month ago

Please re-test with latest staging (e09e7dc) - this should fix your issue, and didn't bring up any regressions on ARM64 hardware tested so far (Mac M1 and Ampere). Reopen if this is still an issue.

ZLangJIT commented 1 month ago

Please re-test with latest staging (e09e7dc) - this should fix your issue, and didn't bring up any regressions on ARM64 hardware tested so far (Mac M1 and Ampere). Reopen if this is still an issue.

alright

ZLangJIT commented 1 month ago

unfortunately i still get SIGILL

localhost:~/riscv-kernel# ./boot_rvvm_disk.sh
removed 'disk.img'
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 0.115777 seconds, 863.7MB/s
GNU gdb (GDB) 15.1
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./rvvm...
(gdb)
Starting program: /root/riscv-kernel/rvvm ../RVVM/uboot -v -k Image -m 100m -cmdline=console=ttyS0\ rootflags=discard\ rw\  -k disk.img
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
INFO: Attached MMIO device at 0x02000000, type "aclint_mswi"
INFO: Attached MMIO device at 0x02004000, type "aclint_mtimer"
INFO: Attached MMIO device at 0x0c000000, type "plic"
INFO: Attached MMIO device at 0x30000000, type "pci_bus"
INFO: Attached MMIO device at 0x10030000, type "i2c_opencores"
INFO: Attached MMIO device at 0x00101000, type "rtc_goldfish"
INFO: Attached MMIO device at 0x00100000, type "syscon"
INFO: Attached MMIO device at 0x10000000, type "ns16550a"
ERROR: No suitable windowing backends found!
[New LWP 6369]
INFO: Attached MMIO device at 0x40000000, type "rtl8169"
INFO: Generated DTB at 0x863fee10, size 4592
[New LWP 6370]
INFO: Dropping from root user to nobody
[New LWP 6371]
INFO: Hart 0x7fb7832000 started
[LWP 6371 exited]

OpenSBI v1.4
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : RVVM v0.7-b3e9533b-dirty
Platform Features         : medeleg
Platform HART Count       : 1
Platform IPI Device       : aclint-mswi
Platform Timer Device     : aclint-mtimer @ 10000000Hz
Platform Console Device   : uart8250
Platform HSM Device       : ---
Platform PMU Device       : ---
Platform Reboot Device    : syscon-reboot
Platform Shutdown Device  : syscon-poweroff
Platform Suspend Device   : ---
Platform CPPC Device      : ---
Firmware Base             : 0x80000000
Firmware Size             : 191 KB
Firmware RW Offset        : 0x20000
Firmware RW Size          : 63 KB
Firmware Heap Offset      : 0x27000
Firmware Heap Size        : 35 KB (total), 2 KB (reserved), 9 KB (used), 23 KB (free)
Firmware Scratch Size     : 4096 B (total), 328 B (used), 3768 B (free)
Runtime SBI Version       : 2.0

Domain0 Name              : root
Domain0 Boot HART         : 0
Domain0 HARTs             : 0*
Domain0 Region00          : 0x0000000000100000-0x0000000000100fff M: (I,R,W) S/U: (R,W)
Domain0 Region01          : 0x0000000010000000-0x0000000010000fff M: (I,R,W) S/U: (R,W)
Domain0 Region02          : 0x0000000002000000-0x000000000200ffff M: (I,R,W) S/U: ()
Domain0 Region03          : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
Domain0 Region04          : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region05          : 0x000000000c000000-0x000000000fffffff M: (I,R,W) S/U: (R,W)
Domain0 Region06          : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000080100000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes
Domain0 SysSuspend        : yes

Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64imafdcb
Boot HART ISA Extensions  : sstc,zicntr,zkr,zicboz,zicbom
Boot HART PMP Count       : 0
Boot HART PMP Granularity : 0 bits
Boot HART PMP Address Bits: 0
Boot HART MHPM Info       : 0 (0x00000000)
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109

Thread 3 "rvvm" received signal SIGILL, Illegal instruction.
[Switching to LWP 6370]
0x0000007fb00e1688 in ?? ()

(gdb)
#0  0x0000007fb00e1688 in ?? ()
#1  0x0000007fb783524c in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
(gdb)
=> 0x7fb00e1688:        b       0x7fb00e3cb0
   0x7fb00e168c:        ldr     x10, [x0, #80]
   0x7fb00e1690:        mov     x11, x10
   0x7fb00e1694:        lsr     x12, x11, #12
   0x7fb00e1698:        and     x13, x12, #0xff
   0x7fb00e169c:        lsl     w13, w13, #5
   0x7fb00e16a0:        add     x13, x13, x0
   0x7fb00e16a4:        ldr     x14, [x13, #536]
   0x7fb00e16a8:        eor     x14, x14, x12
   0x7fb00e16ac:        and     x12, x11, #0x7
   0x7fb00e16b0:        orr     x12, x12, x14
   0x7fb00e16b4:        cbz     x12, 0x7fb00e16c8
   0x7fb00e16b8:        ldr     x15, [x0, #264]
   0x7fb00e16bc:        add     x15, x15, #0x2
   0x7fb00e16c0:        str     x15, [x0, #264]
   0x7fb00e16c4:        ret
   0x7fb00e16c8:        ldr     x14, [x13, #528]
   0x7fb00e16cc:        add     x14, x14, x11
   0x7fb00e16d0:        ldr     x15, [x14]
   0x7fb00e16d4:        ldr     x14, [x0, #168]
(gdb)
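
A note on reading the dump above: GDB shows a well-formed `b` at the faulting PC, yet the core raised SIGILL there. One possible reading (an assumption, not something confirmed at this point in the thread) is that the core fetched an older copy of that word from its instruction cache while memory already held the patched branch. The sketch below only checks the arithmetic: it encodes and decodes an A64 unconditional `b` so the word at `0x7fb00e1688` can be compared against the expected encoding. The helper names are hypothetical and are not RVVM functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A64 "B <label>": bits [31:26] = 000101, imm26 = signed byte offset / 4 (range +-128 MiB). */
static bool a64_b_offset_valid(int64_t offset)
{
    return (offset & 3) == 0 && offset >= -(1LL << 27) && offset < (1LL << 27);
}

static uint32_t a64_encode_b(int64_t offset)
{
    assert(a64_b_offset_valid(offset));
    /* Shift as unsigned so negative offsets stay well-defined, then keep the low 26 bits. */
    return 0x14000000u | (uint32_t)(((uint64_t)offset >> 2) & 0x03FFFFFFu);
}

static bool a64_is_b(uint32_t insn)
{
    return (insn & 0xFC000000u) == 0x14000000u;
}

static int64_t a64_decode_b(uint32_t insn)
{
    int64_t imm26 = (int64_t)(insn & 0x03FFFFFFu);
    if (imm26 & 0x02000000) imm26 -= 0x04000000; /* sign-extend from 26 bits */
    return imm26 * 4;
}
```

For the dump above, the offset is `0x7fb00e3cb0 - 0x7fb00e1688 = 0x2628`, so the expected word is `0x1400098a`. An all-zero word is not a branch; on A64 it falls in the permanently undefined `udf` encoding space, which would raise SIGILL if it were fetched.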

could we add a cmake/rvvm option to disable the JIT native linker globally for builds that will be executed on devices that end up exhibiting such SIGILL flushes ?

ZLangJIT commented 1 month ago

Also migrating from cmake 3.6 / 3.10 to cmake 3.18 fixed the following

btw i converted all "" includes into fully relative paths due to ndk bugs (a normal build works in alpine proot (alpine aarch64 llvm) but fails on x86 (CI) ndk llvm (ndk 21 specifically))

could you please build it from scratch? what proot method are you using exactly? termux? andronix? make sure to build for your platform directly as proot is very finicky and unreliable due to text parsing

im using alpine aarch64 proot with (proot-distro)

localhost:~/riscv-kernel/libmedia# cmake --version
cmake version 3.30.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).
localhost:~/riscv-kernel/libmedia# clang --version
Alpine clang version 18.1.8
Target: aarch64-alpine-linux-musl
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang18/aarch64-alpine-linux-musl.cfg
localhost:~/riscv-kernel/libmedia#

in github archlinux CI im using sdkmanager cmake and ndk 21 which has llvm 14 i think

specifically CMake 3.10.2.4988404 (revision: 3.10.2) and ndk 21.4.7075529

LekKit commented 1 month ago

RVVM v0.7-b3e9533b-dirty

What commit is that, why is it dirty?

You can also try lowering both dsize and isize variables to value 32 here and see if it helps: https://github.com/LekKit/RVVM/blob/f4031a4f7860cdfd37ecf8b94d1a9d607960efb5/src/rvjit/rvjit.c#L52

ZLangJIT commented 1 month ago

RVVM v0.7-b3e9533b-dirty

What commit is that, why is it dirty?

You can also try lowering both dsize and isize variables to value 32 here and see if it helps:

https://github.com/LekKit/RVVM/blob/f4031a4f7860cdfd37ecf8b94d1a9d607960efb5/src/rvjit/rvjit.c#L52

was avoiding storing the .git i now upload it with .git as .git0 and move it back to .git in the CI

ZLangJIT commented 1 month ago

RVVM v0.7-b3e9533b-dirty

What commit is that, why is it dirty?

You can also try lowering both dsize and isize variables to value 32 here and see if it helps:

https://github.com/LekKit/RVVM/blob/f4031a4f7860cdfd37ecf8b94d1a9d607960efb5/src/rvjit/rvjit.c#L52

using 32 seems to work

could we also add an option to specify the iflush size on such SIGILL flushes (with an additional option to disable the iflush cache / native linker as a fallback if the cache size is unknown to the user or they just dont wanna deal with trying to figure out the cache size for various devices/hardware configurations)

btw does data size have to equal instruction size or can they differ ?

if so could we add options to set each?

eg

rvvm --dsize 32 --isize 32   # custom cache flush size, ARM only
rvvm --disable-native-linker  # disable native linker, ARM only
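
On the dsize/isize question: `CTR_EL0` reports the data and instruction minimum line sizes in separate fields, so the two strides can differ. A rough sketch of why the stride matters, assuming a flush that walks the range one line at a time (the function names and the callback type are illustrative only, not RVVM's `rvjit_flush_icache`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*line_op_t)(uint64_t line_addr, void* ctx);

/* Visit every stride-aligned address that overlaps [start, start + len); returns the count. */
static size_t for_each_line(uint64_t start, size_t len, size_t stride, line_op_t op, void* ctx)
{
    size_t count = 0;
    assert(stride != 0 && (stride & (stride - 1)) == 0); /* power of two */
    if (len == 0) return 0;
    for (uint64_t addr = start & ~(uint64_t)(stride - 1); addr < start + len; addr += stride) {
        if (op) op(addr, ctx);
        count++;
    }
    return count;
}

/* Example callback: counts how many line operations were issued. */
static void count_line(uint64_t line_addr, void* ctx)
{
    (void)line_addr;
    ++*(size_t*)ctx;
}

/*
 * Two independent passes. On real ARM64 hardware the first pass would be
 * "dc cvau" per line followed by "dsb ish", and the second "ic ivau" per
 * line followed by "dsb ish; isb".
 */
static void flush_range_sketch(uint64_t start, size_t len, size_t dsize, size_t isize,
                               line_op_t clean_dcache, line_op_t inval_icache, void* ctx)
{
    for_each_line(start, len, dsize, clean_dcache, ctx);
    for_each_line(start, len, isize, inval_icache, ctx);
}
```

Walking 256 bytes with a 128-byte stride issues 2 operations, while a core with 64-byte lines has 4 lines in that range, so half of them are never touched. A stride smaller than the real line size only costs extra operations.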
ZLangJIT commented 1 month ago

RVVM v0.7-b3e9533b-dirty

What commit is that, why is it dirty? You can also try lowering both dsize and isize variables to value 32 here and see if it helps: https://github.com/LekKit/RVVM/blob/f4031a4f7860cdfd37ecf8b94d1a9d607960efb5/src/rvjit/rvjit.c#L52

using 32 seems to work

could we also add an option to specify the iflush size on such SIGILL flushes (with an additional option to disable the iflush cache / native linker as a fallback if the cache size is unknown to the user or they just dont wanna deal with trying to figure out the cache size for various devices/hardware configurations)

btw does data size have to equal instruction size or can they differ ?

if so could we add options to set each?

eg

rvvm --dsize 32 --isize 32   # custom cache flush size, ARM only
rvvm --disable-native-linker  # disable native linker, ARM only

spoke too soon, 32 also fails with SIGILL

ZLangJIT commented 1 month ago

RVVM v0.7-b3e9533b-dirty

What commit is that, why is it dirty?

You can also try lowering both dsize and isize variables to value 32 here and see if it helps:

https://github.com/LekKit/RVVM/blob/f4031a4f7860cdfd37ecf8b94d1a9d607960efb5/src/rvjit/rvjit.c#L52

hmm it appears if we get added as a cmake subproject we appear to derive the head as-if from the parent repo ... (as we should otherwise get a commit of f4031a4)

ZLangJIT commented 1 month ago

ok i created a patch to disable the native linker via -rvjit_disable_native_linker https://github.com/ZLangJIT/riscv-kernel/blob/cfc74e807c1ec448b25a6d5e7790cf8c6befcdc7/rvvm.patch

LekKit commented 1 month ago

spoke too soon, 32 also fails with SIGILL

I am not sure how to fix this at all then. I have an idea that lazy linker patch flushes might be causing this, but it would mean that patchpoints somehow end up spilled between cachelines which didn't ever happen on any ARM64 CPU yet.
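
For reference, whether a patchpoint spills across a cacheline boundary is simple arithmetic. A small sketch, with a hypothetical helper name and line sizes chosen only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + len) touches more than one line of the given size. */
static bool spans_multiple_lines(uint64_t addr, size_t len, size_t line)
{
    assert(len != 0 && line != 0 && (line & (line - 1)) == 0);
    return (addr / line) != ((addr + len - 1) / line);
}
```

A single 4-byte-aligned A64 instruction can never straddle a line, but an 8-byte patch region starting 4 bytes before a line boundary does, for example `spans_multiple_lines(0x103C, 8, 64)`.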

It is possible that this patch will help, but I am not sure. Revert to a clean staging tree beforehand.

From b6381de3b34ca65e1a774307cd23c6da1e1697b4 Mon Sep 17 00:00:00 2001
From: LekKit <50500857+LekKit@users.noreply.github.com>
Date: Mon, 30 Sep 2024 21:07:48 +0300
Subject: [PATCH] rvjit: Flush icache on linker patchpoints

---
 src/rvjit/rvjit.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/rvjit/rvjit.c b/src/rvjit/rvjit.c
index 8ad74f8..e90983b 100644
--- a/src/rvjit/rvjit.c
+++ b/src/rvjit/rvjit.c
@@ -41,6 +41,8 @@ void sys_icache_invalidate(void* start, size_t len);
 #include <sys/syscall.h>
 #include <unistd.h>

+#define RVJIT_GLOBAL_ICACHE_FLUSH
+
 #elif defined(RVJIT_ARM64) && defined(GNU_EXTS)
 /*
  * Don't rely on GCC's __clear_cache implementation, as it may
@@ -263,6 +265,9 @@ rvjit_func_t rvjit_block_finalize(rvjit_block_t* block)
         vector_foreach(*linked_blocks, i) {
             uint8_t* jptr = vector_at(*linked_blocks, i);
             rvjit_linker_patch_jmp(jptr, ((size_t)dest) - ((size_t)jptr));
+#ifndef RVJIT_GLOBAL_ICACHE_FLUSH
+            rvjit_flush_icache(jptr, 8);
+#endif
         }
         vector_free(*linked_blocks);
         free(linked_blocks);
-- 
2.46.2
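
The general pattern behind the patch is "rewrite the instruction word, then flush exactly that range". A minimal standalone sketch of the pattern, using the GCC/Clang builtin `__builtin___clear_cache` as a stand-in; RVVM uses its own `rvjit_flush_icache` instead, and `patch_and_flush` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Store one little-endian 32-bit instruction word at the patchpoint, then ask
 * the compiler runtime to make the new bytes visible to instruction fetch.
 * On x86 the builtin is effectively a no-op; on ARM64 it performs the
 * cache maintenance for the given range.
 */
static void patch_and_flush(uint8_t* patchpoint, uint32_t insn)
{
    uint8_t bytes[4];
    assert(patchpoint != NULL);
    bytes[0] = (uint8_t)(insn & 0xFF);
    bytes[1] = (uint8_t)((insn >> 8) & 0xFF);
    bytes[2] = (uint8_t)((insn >> 16) & 0xFF);
    bytes[3] = (uint8_t)((insn >> 24) & 0xFF);
    memcpy(patchpoint, bytes, sizeof(bytes));
    __builtin___clear_cache((char*)patchpoint, (char*)patchpoint + sizeof(bytes));
}
```

The flush has to come after the store and has to cover the patched bytes; flushing the freshly emitted block alone does not cover patchpoints that live in previously emitted blocks.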
ZLangJIT commented 1 month ago

flushing icache on linker patchpoints works :)

tho i would like to keep the -rvjit_disable_native_linker as a fallback in case this ever fails again

or at least make it an optional config ( RVVM_NO_NATIVE_LINKER ) should the rvvm_has_args become a perf hit

LekKit commented 1 month ago

or at least make it an optional config ( RVVM_NO_NATIVE_LINKER ) should the rvvm_has_args become a perf hit

There will be a special useflag in the refactored Makefile that is currently being worked on, but really it's more of a workaround than anything else. Seems that the patchpoint flushing is the way to go forward.

LekKit commented 1 month ago

Actually I have another patch that tries to fix it differently, which I'm interested to test too... (revert the previous patch or clean the source tree)

From e0745f4f9223f3afc91284e084537d6e572f4c68 Mon Sep 17 00:00:00 2001
From: LekKit <50500857+LekKit@users.noreply.github.com>
Date: Tue, 1 Oct 2024 00:17:12 +0300
Subject: [PATCH] rvjit_arm64: Omit zeroing patched jump

---
 src/rvjit/rvjit_arm64.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/rvjit/rvjit_arm64.h b/src/rvjit/rvjit_arm64.h
index ed01f45..ea2b6bb 100644
--- a/src/rvjit/rvjit_arm64.h
+++ b/src/rvjit/rvjit_arm64.h
@@ -1343,7 +1343,6 @@ static inline void rvjit_patch_ret(void* addr)
 static inline bool rvjit_patch_jmp(void* addr, int32_t offset)
 {
     if (rvjit_a64_valid_reloc(offset)) {
-        write_uint32_le_m(addr, 0);
         rvjit_a64_b_reloc(addr, offset);
         return true;
     }
-- 
2.46.2
LekKit commented 1 month ago

I'm just trying to get an idea what that specific Exynos CPU model does differently from other CPUs we tested on

ZLangJIT commented 1 month ago

Actually I have another patch that tries to fix it differently, which I'm interested to test too... (revert the previous patch or clean the source tree)

From e0745f4f9223f3afc91284e084537d6e572f4c68 Mon Sep 17 00:00:00 2001
From: LekKit <50500857+LekKit@users.noreply.github.com>
Date: Tue, 1 Oct 2024 00:17:12 +0300
Subject: [PATCH] rvjit_arm64: Omit zeroing patched jump

---
 src/rvjit/rvjit_arm64.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/rvjit/rvjit_arm64.h b/src/rvjit/rvjit_arm64.h
index ed01f45..ea2b6bb 100644
--- a/src/rvjit/rvjit_arm64.h
+++ b/src/rvjit/rvjit_arm64.h
@@ -1343,7 +1343,6 @@ static inline void rvjit_patch_ret(void* addr)
 static inline bool rvjit_patch_jmp(void* addr, int32_t offset)
 {
     if (rvjit_a64_valid_reloc(offset)) {
-        write_uint32_le_m(addr, 0);
         rvjit_a64_b_reloc(addr, offset);
         return true;
     }
-- 
2.46.2

the above patch didnt seem to work, we still get SIGILL

LekKit commented 1 month ago

I see. Well anyways thanks for testing

ZLangJIT commented 1 month ago

on a side note, could we install a SIGSEGV/SIGBUS/SIGILL/etc/SIGTERM (and whatever win32 equiv is) handler to revert the terminal back to its original state before exiting such that a rvvm crash/^C ( win32 ^Z ) does not leave the terminal in a non-default state ?

LekKit commented 1 month ago

on a side note, could we install a SIGILL/SIGTERM handler to revert the terminal back to its original state such that a rvvm crash/^C does not leave the terminal in a non-default state ?

It could be done in src/stacktrace.c, but that thing currently only sets up signal handling if libbacktrace.so is found to print nice stacktraces on crash without a debugger.
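
As a rough idea of what such a handler could look like on POSIX systems (a sketch only, not the code in `src/stacktrace.c`; every name below is hypothetical, and the win32 side is not covered):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

static struct termios saved_termios;
static volatile sig_atomic_t have_saved_termios = 0;

/* Put the terminal back into the state captured at install time, if any was captured. */
static void term_restore(void)
{
    if (have_saved_termios) {
        tcsetattr(STDIN_FILENO, TCSANOW, &saved_termios);
    }
}

/*
 * SA_RESETHAND has already reset the disposition to the default by the time
 * we get here, so re-raising terminates the process the usual way.
 */
static void fatal_signal_handler(int sig)
{
    term_restore();
    raise(sig);
}

/* Capture the current terminal state and hook the given signals. Returns 0 on success. */
static int term_guard_install(const int* sigs, size_t count)
{
    struct sigaction sa;
    assert(sigs != NULL && count > 0);

    if (tcgetattr(STDIN_FILENO, &saved_termios) == 0) {
        have_saved_termios = 1; /* stdin is a terminal */
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fatal_signal_handler;
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < count; ++i) {
        if (sigaction(sigs[i], &sa, NULL) != 0) return -1;
    }
    return 0;
}
```

It would be installed once at startup, before the terminal is switched to raw mode, with a list such as `SIGILL, SIGSEGV, SIGBUS, SIGTERM`. Only `tcsetattr` and `raise` run inside the handler, both of which POSIX lists as async-signal-safe.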