Closed ZLangJIT closed 2 months ago
Is this the v0.6 release or v0.7-git staging build?
Please try the latest staging commit, build with make CFLAGS=-g, then do:
$ gdb --args ./rvvm [rvvm args here]
Upon crash it will drop to the GDB shell; please type bt and x/10i $pc and post the results.
Guest firmware/kernel/image would also be helpful, and some info about the host.
the host is Android 9 aarch64 termux proot-distro alpine
the build commit is d1e01a50109c6b2d217f4861fceb7fb2c13bbd7d
and the branch is staging
i boot with
rm disk.img
dd if=/dev/zero of=disk.img bs=1M count=100
lldb ../RVVM/debug_BUILD/rvvm ../RVVM/uboot -v -k Image -m 100m -cmdline="console=ttyS0 rootflags=discard rw" \
-k disk.img
with uboot from https://github.com/LekKit/RVVM/releases/download/v0.6/fw_payload.bin
with kernel from https://github.com/ZLangJIT/riscv-kernel/releases/download/6.11.26/Image.gz
with configuration https://github.com/ZLangJIT/riscv-kernel/blob/1e32c71803fd1d1c74debdc73aabaf1eb5311ae8/.linuxconfig
Hello, I can't reproduce this on RVVM v0.6 with fw_payload.bin. It runs perfectly without any problems. Please describe your environment (hardware specs), and compile and debug with gcc and gdb.
got an Illegal Instruction after 12 attempted boots + # poweroff
(as i said it either happens or it doesnt)
poweroff
~ # Stopping klogd: OK
Stopping syslogd: OK
Process 5726 stopped
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
frame #0: 0x0000007fb0ce0d1c
-> 0x7fb0ce0d1c: b 0x7fb0d0d6bc
0x7fb0ce0d20: ldr x9, [x0, #0x28]
0x7fb0ce0d24: add x10, x9, #0x8
0x7fb0ce0d28: lsr x12, x10, #12
(lldb) bt
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
* frame #0: 0x0000007fb0ce0d1c
frame #1: 0x000000300005fab0 rvvm`riscv_jit_tlb_lookup(vm=<unavailable>) at riscv_cpu.c:136:16
frame #2: 0x000000300006a9e4 rvvm`riscv64_run_interpreter [inlined] riscv_emulate_c_c0(vm=0x0000007fb1bfb020, insn=30920) at riscv_compressed.h:169:13
frame #3: 0x000000300006a0d4 rvvm`riscv64_run_interpreter [inlined] riscv_emulate_insn(vm=0x0000007fb1bfb020, insn=30920) at riscv_compressed.h:594:13
frame #4: 0x000000300006a074 rvvm`riscv64_run_interpreter [inlined] riscv_emulate(vm=0x0000007fb1bfb020, instruction=30920) at riscv_interpreter.h:83:5
frame #5: 0x0000003000069ffc rvvm`riscv64_run_interpreter(vm=0x0000007fb1bfb020) at riscv_interpreter.h:114:9
frame #6: 0x000000300005f874 rvvm`riscv_run_till_event(vm=0x0000007fb1bfb020) at riscv_cpu.c:26:9
frame #7: 0x0000003000063b84 rvvm`riscv_hart_run(vm=0x0000007fb1bfb020) at riscv_hart.c:334:9
frame #8: 0x0000003000063f38 rvvm`riscv_hart_run_wrap(ptr=0x0000007fb1bfb020) at riscv_hart.c:347:5
frame #9: 0x0000003f00062488 ld-musl-aarch64.so.1
frame #10: 0x0000003f00060838 ld-musl-aarch64.so.1
(lldb) x/10i $pc
-> 0x7fb0ce0d1c: 0x1400b268 unknown b 0x7fb0d0d6bc
0x7fb0ce0d20: 0xf9401409 unknown ldr x9, [x0, #0x28]
0x7fb0ce0d24: 0x9100212a unknown add x10, x9, #0x8
0x7fb0ce0d28: 0xd34cfd4c unknown lsr x12, x10, #12
0x7fb0ce0d2c: 0x92401d8d unknown and x13, x12, #0xff
0x7fb0ce0d30: 0x531b69ad unknown lsl w13, w13, #5
0x7fb0ce0d34: 0x8b0001ad unknown add x13, x13, x0
0x7fb0ce0d38: 0xf9410daf unknown ldr x15, [x13, #0x218]
0x7fb0ce0d3c: 0xca0c01ef unknown eor x15, x15, x12
0x7fb0ce0d40: 0x9240054c unknown and x12, x10, #0x3
(lldb)
after a few more boots (boot to shell, CTRL + C , (lldb) r , y (kill current process and restart ) i get this
Boot HART PMP Granularity : 0 bits
Boot HART PMP Address Bits: 0
Boot HART MHPM Info : 0 (0x00000000)
Boot HART MIDELEG : 0x0000000000000222
Boot HART MEDELEG : 0x000000000000b109
Process 7707 stopped
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
frame #0: 0x0000007fb0de1280
-> 0x7fb0de1280: b 0x7fb0df63b0
0x7fb0de1284: add x5, x10, #0x8
0x7fb0de1288: lsr x6, x5, #12
0x7fb0de128c: and x7, x6, #0xff
(lldb) bt
* thread #3, name = 'rvvm', stop reason = signal SIGILL: illegal opcode
* frame #0: 0x0000007fb0de1280
frame #1: 0x000000300005fab0 rvvm`riscv_jit_tlb_lookup(vm=<unavailable>) at riscv_cpu.c:136:16
frame #2: 0x000000300006c58c rvvm`riscv64_run_interpreter [inlined] riscv_emulate_c_c1(vm=0x0000007fb1bfb020, insn=29021) at riscv_compressed.h:402:17
frame #3: 0x000000300006bc48 rvvm`riscv64_run_interpreter [inlined] riscv_emulate_insn(vm=0x0000007fb1bfb020, insn=29021) at riscv_compressed.h:598:13
frame #4: 0x000000300006a074 rvvm`riscv64_run_interpreter [inlined] riscv_emulate(vm=0x0000007fb1bfb020, instruction=29021) at riscv_interpreter.h:83:5
frame #5: 0x0000003000069ffc rvvm`riscv64_run_interpreter(vm=0x0000007fb1bfb020) at riscv_interpreter.h:114:9
frame #6: 0x000000300005f874 rvvm`riscv_run_till_event(vm=0x0000007fb1bfb020) at riscv_cpu.c:26:9
frame #7: 0x0000003000063b84 rvvm`riscv_hart_run(vm=0x0000007fb1bfb020) at riscv_hart.c:334:9
frame #8: 0x0000003000063f38 rvvm`riscv_hart_run_wrap(ptr=0x0000007fb1bfb020) at riscv_hart.c:347:5
frame #9: 0x0000003f00062488 ld-musl-aarch64.so.1
frame #10: 0x0000003f00060838 ld-musl-aarch64.so.1
(lldb) x/10i $pc
-> 0x7fb0de1280: 0x1400544c unknown b 0x7fb0df63b0
0x7fb0de1284: 0x91002145 unknown add x5, x10, #0x8
0x7fb0de1288: 0xd34cfca6 unknown lsr x6, x5, #12
0x7fb0de128c: 0x92401cc7 unknown and x7, x6, #0xff
0x7fb0de1290: 0x531b68e7 unknown lsl w7, w7, #5
0x7fb0de1294: 0x8b0000e7 unknown add x7, x7, x0
0x7fb0de1298: 0xf9410ce8 unknown ldr x8, [x7, #0x218]
0x7fb0de129c: 0xca060108 unknown eor x8, x8, x6
0x7fb0de12a0: 0x924008a6 unknown and x6, x5, #0x7
0x7fb0de12a4: 0xaa0800c6 unknown orr x6, x6, x8
(lldb)
It would also be good to get the output of x/20i $pc-40, and the same for the address it is trying to jump to.
(lldb) x/20i $pc-40
0x7fb0de1258: 0xf9403c09 unknown ldr x9, [x0, #0x78]
0x7fb0de125c: 0xeb1f013f unknown cmp x9, xzr
0x7fb0de1260: 0x54000121 unknown b.ne 0x7fb0de1284
0x7fb0de1264: 0xf900280a unknown str x10, [x0, #0x50]
0x7fb0de1268: 0xf900380b unknown str x11, [x0, #0x70]
0x7fb0de126c: 0xf900540c unknown str x12, [x0, #0xa8]
0x7fb0de1270: 0xf900600e unknown str x14, [x0, #0xc0]
0x7fb0de1274: 0xf940840f unknown ldr x15, [x0, #0x108]
0x7fb0de1278: 0x9100a9ef unknown add x15, x15, #0x2a
0x7fb0de127c: 0xf900840f unknown str x15, [x0, #0x108]
-> 0x7fb0de1280: 0x1400544c unknown b 0x7fb0df63b0
0x7fb0de1284: 0x91002145 unknown add x5, x10, #0x8
0x7fb0de1288: 0xd34cfca6 unknown lsr x6, x5, #12
0x7fb0de128c: 0x92401cc7 unknown and x7, x6, #0xff
0x7fb0de1290: 0x531b68e7 unknown lsl w7, w7, #5
0x7fb0de1294: 0x8b0000e7 unknown add x7, x7, x0
0x7fb0de1298: 0xf9410ce8 unknown ldr x8, [x7, #0x218]
0x7fb0de129c: 0xca060108 unknown eor x8, x8, x6
0x7fb0de12a0: 0x924008a6 unknown and x6, x5, #0x7
0x7fb0de12a4: 0xaa0800c6 unknown orr x6, x6, x8
(lldb) x/20i 0x7fb0df63b0
0x7fb0df63b0: 0xf940280b unknown ldr x11, [x0, #0x50]
0x7fb0df63b4: 0xaa0b03ec unknown mov x12, x11
0x7fb0df63b8: 0xd34cfd8d unknown lsr x13, x12, #12
0x7fb0df63bc: 0x92401dae unknown and x14, x13, #0xff
0x7fb0df63c0: 0x531b69ce unknown lsl w14, w14, #5
0x7fb0df63c4: 0x8b0001ce unknown add x14, x14, x0
0x7fb0df63c8: 0xf9410dcf unknown ldr x15, [x14, #0x218]
0x7fb0df63cc: 0xca0d01ef unknown eor x15, x15, x13
0x7fb0df63d0: 0x9240098d unknown and x13, x12, #0x7
0x7fb0df63d4: 0xaa0f01ad unknown orr x13, x13, x15
0x7fb0df63d8: 0xb400004d unknown cbz x13, 0x7fb0df63e0
0x7fb0df63dc: 0xd65f03c0 unknown ret
0x7fb0df63e0: 0xf94109cf unknown ldr x15, [x14, #0x210]
0x7fb0df63e4: 0x8b0c01ef unknown add x15, x15, x12
0x7fb0df63e8: 0xf94001ee unknown ldr x14, [x15]
0x7fb0df63ec: 0xf940540f unknown ldr x15, [x0, #0xa8]
0x7fb0df63f0: 0xaa0f03ed unknown mov x13, x15
0x7fb0df63f4: 0xaa0b03ec unknown mov x12, x11
0x7fb0df63f8: 0xaa0e03ea unknown mov x10, x14
0x7fb0df63fc: 0xf9408409 unknown ldr x9, [x0, #0x108]
(lldb) x/20i 0x7fb0df63b0-40
0x7fb0df6388: 0x121c1dce unknown and w14, w14, #0xff0
0x7fb0df638c: 0x8b0001ce unknown add x14, x14, x0
0x7fb0df6390: 0xf9510dcd unknown ldr x13, [x14, #0x2218]
0x7fb0df6394: 0xeb0f01bf unknown cmp x13, x15
0x7fb0df6398: 0x540000a1 unknown b.ne 0x7fb0df63ac
0x7fb0df639c: 0xb940000d unknown ldr w13, [x0]
0x7fb0df63a0: 0x3400006d unknown cbz w13, 0x7fb0df63ac
0x7fb0df63a4: 0xf95109cf unknown ldr x15, [x14, #0x2210]
0x7fb0df63a8: 0xd61f01e0 unknown br x15
0x7fb0df63ac: 0xd65f03c0 unknown ret
0x7fb0df63b0: 0xf940280b unknown ldr x11, [x0, #0x50]
0x7fb0df63b4: 0xaa0b03ec unknown mov x12, x11
0x7fb0df63b8: 0xd34cfd8d unknown lsr x13, x12, #12
0x7fb0df63bc: 0x92401dae unknown and x14, x13, #0xff
0x7fb0df63c0: 0x531b69ce unknown lsl w14, w14, #5
0x7fb0df63c4: 0x8b0001ce unknown add x14, x14, x0
0x7fb0df63c8: 0xf9410dcf unknown ldr x15, [x14, #0x218]
0x7fb0df63cc: 0xca0d01ef unknown eor x15, x15, x13
0x7fb0df63d0: 0x9240098d unknown and x13, x12, #0x7
0x7fb0df63d4: 0xaa0f01ad unknown orr x13, x13, x15
(lldb)
It could be an instruction cache sync issue, but that would be very weird since the arm64 backend works on many other phones & the Mac M1. Branches specifically are frequently patched, but the icache is flushed afterwards.
Could you please comment out this line (for aarch64!) in a staging tree, then rebuild & retry? This will disable the branch-patching JIT block linker on arm64:
https://github.com/LekKit/RVVM/blob/d1e01a50109c6b2d217f4861fceb7fb2c13bbd7d/src/rvjit/rvjit.h#L72
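A side note on reading the dumps above: the word at the faulting pc decodes as a well-formed unconditional branch, so the bytes in memory look valid even though the core raised SIGILL. That would be consistent with (though it does not prove) the core executing stale icache contents rather than what a data-side read shows. A minimal sketch in C of that decode; the helper names a64_is_b and a64_b_target are made up for illustration and are not part of RVVM:

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 "B <label>": bits [31:26] are 000101, bits [25:0] hold imm26. */
static bool a64_is_b(uint32_t insn)
{
    return (insn & 0xFC000000u) == 0x14000000u;
}

/* Branch target = pc + sign_extend(imm26) * 4. */
static uint64_t a64_b_target(uint64_t pc, uint32_t insn)
{
    int64_t imm26 = (int64_t)(insn & 0x03FFFFFFu);
    if (imm26 & 0x02000000) {
        imm26 -= 0x04000000; /* sign-extend the 26-bit field */
    }
    return pc + (uint64_t)(imm26 * 4);
}
```

With the values from the second crash, a64_b_target(0x7fb0de1280, 0x1400544c) gives 0x7fb0df63b0, which matches the target lldb printed.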
hmm alright
it appears to work so far, i have done many boot-up's and power-off's without problems :)
if it helps at all
~ $ sudo cat /proc/cmdline
console=ram loglevel=4 sec_debug.level=0 sec_watchdog.sec_pet=5 androidboot.debug_level=0x4f4c androidboot.dram_info=01,12,00,4G androidboot.ap_serial=0x010C4FC44ADA sec_debug.chipidfail_cnt=0 sec_debug.lpitimeout_cnt=0 sec_debug.cache_err_cnt=0 sec_debug.codediff_cnt=1 sec_debug.pcb_offset=7343872 sec_debug.smd_offset=7344896 sec_debug.lpddr4_size=4 sec_debug.sjl=1 androidboot.prototype.param.offset=7345920 ess_setup=0x91200000 tima_log=0x200000@0xb1000000 sec_avc_log=0x40000@0x92202000 sec_tsp_log=0x40000@0x92244000 sec_debug.base=0x100000@0x92286000 auto_summary_log=0x10000@0x92388000 charging_mode=0x3030 s3cfb.bootloaderfb=0xcc000000 lcdtype=13713429 androidboot.carrierid.param.offset=7340608 androidboot.carrierid=XSA consoleblank=0 vmalloc=384m sec_debug.reset_reason=7 sec_reset.reset_reason=7 ehci_hcd.park=3 oops=panic pmic_info=43 ccic_info=1 fg_reset=0 androidboot.emmc_checksum=3 androidboot.sales.param.offset=7340572 sales_code=XSA androidboot.odin_download=1 androidboot.bootloader=G950FXXSBDTJ1 androidboot.selinux=enforcing androidboot.security_mode=1526595585 androidboot.ucs_mode=0 kaslr_region=0x1000@0x80001000 androidboot.revision=10 androidboot.hardware=samsungexynos8895 androidboot.warranty_bit=1 androidboot.wb.hs=0000 sec_debug.bin=A androidboot.hmac_mismatch=0 androidboot.sec_atd.tty=/dev/ttySAC0 androidboot.serialno=ce091829e258a11b04 snd_soc_core.pmdown_time=1000 androidboot.cp_reserved_mem=off nohugeiomap androidboot.fmp_config=0 androidboot.em.did=010c4fc44ada androidboot.em.model=SM-G950F androidboot.em.status=0x0 androidboot.verifiedbootstate=orange bcm_setup=0xffffff80f8e00000 reserve-fimc=0xffffff80fa000000 firmware_class.path=/vendor/firmware region1=EUR region2=OPEN
~ $ sudo cat /proc/cpuinfo
processor : 0
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 1
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 2
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 3
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 4
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 0
processor : 5
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 0
processor : 6
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 0
processor : 7
BogoMIPS : 52.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x53
CPU architecture: 8
CPU variant : 0x4
CPU part : 0x001
CPU revision : 0
~ $ uname -a
Linux localhost 4.4.111-ge843fcf8661e #1 SMP PREEMPT Wed Sep 25 19:01:25 UTC 2019 aarch64 Android
~ $
Samsung Galaxy S8 Lineage OS Pie
tho im not sure if this would be device specific or not (i.e. disabling it may have adverse effects for other arm64 platforms/devices, and i do not have the capability to test on an array of arm64 hardware)
specifically a Samsung Exynos 8895 cpu
Maybe it's a big.LITTLE problem; Mono has a workaround for that:
Indeed, that helps a lot. This CPU has different cache line sizes on different cores, and the Android libc is not prepared to handle that. An erratum workaround (a CPU-specific fix for icache flushing) needs to be implemented on the RVJIT side.
Disabling RVJIT_NATIVE_LINKER has the side effect that icache flushes are no longer done in a way that triggers this, but in general it results in fewer JIT optimizations.
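For reference, on AArch64 the minimum cache line sizes are reported by the CTR_EL0 register, and the value read is the one for whichever core the thread is running on at that moment. A minimal sketch of decoding it, assuming a GCC or Clang toolchain; the function names are illustrative and not taken from RVVM:

```c
#include <stdint.h>

/* CTR_EL0.IminLine is bits [3:0], CTR_EL0.DminLine is bits [19:16].
 * Each field is log2 of the line size counted in 4-byte words. */
static uint32_t ctr_icache_line_bytes(uint64_t ctr)
{
    return 4u << (ctr & 0xF);
}

static uint32_t ctr_dcache_line_bytes(uint64_t ctr)
{
    return 4u << ((ctr >> 16) & 0xF);
}

/* Read CTR_EL0 on the core we happen to be running on right now.
 * Returns 0 when not built for AArch64 with GCC/Clang. */
static uint64_t read_ctr_el0(void)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    return ctr;
#else
    return 0;
#endif
}
```

On a heterogeneous CPU like the one described above, two calls to read_ctr_el0() made on different cores may decode to different line sizes, so a value cached from one call is not necessarily valid for the core that later runs the flush loop.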
btw i converted all "" includes into fully relative paths due to ndk bugs (a normal build works in alpine proot (alpine aarch64 llvm) but fails on x86 (CI) ndk llvm (ndk 21 specifically))
Uh, is the -I compiler argument not working?
im using cmake
it doesnt seem to handle h files in source blocks in ndk cmake/llvm
not quite sure how to go about implementing a fix for this, especially with
It can happen that a process gets scheduled on a different CPU while executing the __clear_cache function with a certain cache line size, where it might not be valid anymore
could you please build it from scratch? what proot method are you using exactly? termux? andronix? make sure to build for your platform directly as proot is very finicky and unreliable due to text parsing
im using alpine aarch64 proot with (proot-distro)
localhost:~/riscv-kernel/libmedia# cmake --version
cmake version 3.30.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
localhost:~/riscv-kernel/libmedia# clang --version
Alpine clang version 18.1.8
Target: aarch64-alpine-linux-musl
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang18/aarch64-alpine-linux-musl.cfg
localhost:~/riscv-kernel/libmedia#
in github archlinux CI im using sdkmanager cmake and ndk 21 which has llvm 14 i think
specifically CMake 3.10.2.4988404 (revision: 3.10.2) and ndk 21.4.7075529
Well, RVVM should be able to work in Termux as well as natively on Android. The Makefile supports building with the NDK (just pass make CC=/path/to/ndk/clang). I am not sure what implications proot has, but it's effectively more layers of abstraction that are ultimately not needed for RVVM specifically.
Also see https://github.com/fish4terrisa-MSDSM/archriscv-term
It is desirable to have a dedicated Android app with working graphics in the future; librvvm JNI bindings can be used from Java to create/manipulate/access virtual machines already, but full Android app is not implemented yet
archriscv-term can currently be used as a third-party Android app around RVVM; it runs not only Arch but any other distro/OS that RVVM already supports
yea, so far i have a simple rvvm terminal application
https://github.com/ZLangJIT/riscv-kernel/releases/download/6.11.30/linux.kernel.rvvm.debug.apk (aarch64)
(in Log/Terminal - swipe from the left edge to bring up the drawer, like in termux, then cd ASSETS ; ./boot_rvvm_disk.sh)
btw android's dd does not accept -h nor -help nor --help, and its bs= doesnt seem to accept the M suffix
one challenge would however be accelerated 3d graphics since android disallows KVM
3D graphics also need work on VM side, such as virtio-gpu implementation. However software rendering is already working well everywhere, just need a way to render a raw framebuffer on android.
As for the issue in the title... Seems that the fix proposed by the Mono project & adopted by a few others is incomplete: https://github.com/mono/mono/pull/3549/files
It promises to determine the smallest cache line size across all cores, but for that to actually work it needs to be called on a core with the smallest cacheline size at least once. In reality there is still a chance for it to fetch the higher cacheline value and begin flushing, only for it to be rescheduled in this short timeslice to another core and miserably fail as a result.
To properly figure out the smallest cacheline size without caveats, some kernel assistance is needed, which is not available on those devices from 2016. Another possible fix is to hardcode a 64-byte cacheline size - but it will pessimize performance on uniform 128-byte cacheline cores, such as the Apple M1...
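To make the failure mode concrete: a flush loop steps through the range by whatever line size it read, so a stride taken from a 128-byte-line core skips every second line if the maintenance actually runs on a 64-byte-line core. A small simulation in C (no real cache maintenance is performed; lines_touched and lines_in_range are illustrative names only):

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many lines of size real_line inside [start, start+len) get
 * visited by a loop that steps by stride bytes, the way a typical
 * per-line cache maintenance loop does. Both sizes must be powers of two. */
static size_t lines_touched(uintptr_t start, size_t len, size_t stride, size_t real_line)
{
    size_t touched = 0;
    uintptr_t end = start + len;
    uintptr_t last_line = (uintptr_t)-1;
    for (uintptr_t addr = start & ~(uintptr_t)(stride - 1); addr < end; addr += stride) {
        /* The operation only affects the real line containing addr. */
        uintptr_t line = addr & ~(uintptr_t)(real_line - 1);
        if (line + real_line > start && line != last_line) {
            touched++;
            last_line = line;
        }
    }
    return touched;
}

/* Number of real_line-sized lines that overlap [start, start+len), len > 0. */
static size_t lines_in_range(uintptr_t start, size_t len, size_t real_line)
{
    uintptr_t first = start & ~(uintptr_t)(real_line - 1);
    uintptr_t last = (start + len - 1) & ~(uintptr_t)(real_line - 1);
    return (size_t)((last - first) / real_line) + 1;
}
```

For a 4096-byte range, stepping by 128 on a core whose lines are really 64 bytes touches 32 of the 64 lines, while stepping by 64 on a 128-byte-line core still covers all 32 lines, only with redundant operations. That asymmetry is why the smaller stride is the safe choice and the larger one is not.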
yup
true, tho we could attempt to identify if we are running on an ARM64 android device and then always use a 64-byte cache line size, without disturbing other devices such as Apple M1 or Arm64 laptops/Surface tablets
hmm, im not sure if the following would work on android (not sure how to verify a task/pid is not scheduled on a different core)
https://android.googlesource.com/platform/art/+/main/runtime/jit/jit_code_cache.cc
https://android.googlesource.com/platform/external/toybox/+/7a3f53b/toys/other/taskset.c
u0_a133@dreamlte /data/data/linux.kernel/files/ASSETS $ taskset --help
usage: taskset [-ap] [mask] [PID | cmd [args...]]
Launch a new task which may only run on certain processors, or change
the processor affinity of an exisitng PID.
Mask is a hex string where each bit represents a processor the process
is allowed to run on. PID without a mask displays existing affinity.
-p Set/get the affinity of given PID instead of a new command
-a Set/get the affinity of all threads of the PID
u0_a133@dreamlte /data/data/linux.kernel/files/ASSETS $ taskset -p $$
pid 30855's current affinity mask: ff
u0_a133@dreamlte /data/data/linux.kernel/files/ASSETS $
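The mask printed by taskset is a bitmap with one bit per CPU. Going by the /proc/cpuinfo dump earlier in the thread, CPUs 0-3 report implementer 0x41 and CPUs 4-7 report implementer 0x53, so the two clusters would correspond to masks 0f and f0. A small C sketch of building such a mask; cpu_range_mask is a made-up helper, and whether pinning to one cluster actually avoids the SIGILL has not been tested here:

```c
#include <stdint.h>

/* Build an affinity bitmask covering CPUs first..last inclusive
 * (bit N set means the task may run on CPU N). Supports up to 64 CPUs. */
static uint64_t cpu_range_mask(unsigned first, unsigned last)
{
    uint64_t mask = 0;
    for (unsigned cpu = first; cpu <= last && cpu < 64; cpu++) {
        mask |= (uint64_t)1 << cpu;
    }
    return mask;
}
```

cpu_range_mask(0, 3) is 0x0f, so per the usage text above something like taskset 0f followed by the rvvm command line would restrict the process to the first four cores.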
for now i think it would be best to hardcode 64 when compiling for big.LITTLE devices
tho i am not sure where to do this in RVVM sources
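One way such a compile-time choice could look, as a rough sketch only. The function name and the "0 means query the hardware" convention are assumptions for illustration and do not reflect where or how RVVM actually picks its flush stride:

```c
#include <stddef.h>

/* Hypothetical helper, not part of RVVM: pick the stride used when
 * flushing JIT code. Returns 0 to mean "ask the hardware at runtime". */
static size_t jit_flush_stride_hint(void)
{
#if defined(__ANDROID__) && defined(__aarch64__)
    return 64; /* assumed smallest line size on big.LITTLE Android phones */
#else
    return 0;
#endif
}
```

A caveat with this approach: __ANDROID__ is only predefined when the compiler targets Android (e.g. an NDK build). A build made inside the Alpine proot targets aarch64-alpine-linux-musl, as the clang --version output above shows, so the macro would not be set there even though the code runs on the same phone.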
Well unfortunately it knows nothing about that, at least not in a portable way if we assume it ever runs on an Android device with a very old kernel.
Please re-test with latest staging (e09e7dc) - this should fix your issue, and didn't bring up any regressions on ARM64 hardware tested so far (Mac M1 and Ampere). Reopen if this is still an issue.
alright
unfortunately i still get SIGILL
localhost:~/riscv-kernel# ./boot_rvvm_disk.sh
removed 'disk.img'
100+0 records in
100+0 records out
104857600 bytes (100.0MB) copied, 0.115777 seconds, 863.7MB/s
GNU gdb (GDB) 15.1
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./rvvm...
(gdb)
Starting program: /root/riscv-kernel/rvvm ../RVVM/uboot -v -k Image -m 100m -cmdline=console=ttyS0\ rootflags=discard\ rw\ -k disk.img
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
INFO: Attached MMIO device at 0x02000000, type "aclint_mswi"
INFO: Attached MMIO device at 0x02004000, type "aclint_mtimer"
INFO: Attached MMIO device at 0x0c000000, type "plic"
INFO: Attached MMIO device at 0x30000000, type "pci_bus"
INFO: Attached MMIO device at 0x10030000, type "i2c_opencores"
INFO: Attached MMIO device at 0x00101000, type "rtc_goldfish"
INFO: Attached MMIO device at 0x00100000, type "syscon"
INFO: Attached MMIO device at 0x10000000, type "ns16550a"
ERROR: No suitable windowing backends found!
[New LWP 6369]
INFO: Attached MMIO device at 0x40000000, type "rtl8169"
INFO: Generated DTB at 0x863fee10, size 4592
[New LWP 6370]
INFO: Dropping from root user to nobody
[New LWP 6371]
INFO: Hart 0x7fb7832000 started
[LWP 6371 exited]
OpenSBI v1.4
____ _____ ____ _____
/ __ \ / ____| _ \_ _|
| | | |_ __ ___ _ __ | (___ | |_) || |
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
| |__| | |_) | __/ | | |____) | |_) || |_
\____/| .__/ \___|_| |_|_____/|____/_____|
| |
|_|
Platform Name : RVVM v0.7-b3e9533b-dirty
Platform Features : medeleg
Platform HART Count : 1
Platform IPI Device : aclint-mswi
Platform Timer Device : aclint-mtimer @ 10000000Hz
Platform Console Device : uart8250
Platform HSM Device : ---
Platform PMU Device : ---
Platform Reboot Device : syscon-reboot
Platform Shutdown Device : syscon-poweroff
Platform Suspend Device : ---
Platform CPPC Device : ---
Firmware Base : 0x80000000
Firmware Size : 191 KB
Firmware RW Offset : 0x20000
Firmware RW Size : 63 KB
Firmware Heap Offset : 0x27000
Firmware Heap Size : 35 KB (total), 2 KB (reserved), 9 KB (used), 23 KB (free)
Firmware Scratch Size : 4096 B (total), 328 B (used), 3768 B (free)
Runtime SBI Version : 2.0
Domain0 Name : root
Domain0 Boot HART : 0
Domain0 HARTs : 0*
Domain0 Region00 : 0x0000000000100000-0x0000000000100fff M: (I,R,W) S/U: (R,W)
Domain0 Region01 : 0x0000000010000000-0x0000000010000fff M: (I,R,W) S/U: (R,W)
Domain0 Region02 : 0x0000000002000000-0x000000000200ffff M: (I,R,W) S/U: ()
Domain0 Region03 : 0x0000000080020000-0x000000008002ffff M: (R,W) S/U: ()
Domain0 Region04 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region05 : 0x000000000c000000-0x000000000fffffff M: (I,R,W) S/U: (R,W)
Domain0 Region06 : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address : 0x0000000080200000
Domain0 Next Arg1 : 0x0000000080100000
Domain0 Next Mode : S-mode
Domain0 SysReset : yes
Domain0 SysSuspend : yes
Boot HART ID : 0
Boot HART Domain : root
Boot HART Priv Version : v1.12
Boot HART Base ISA : rv64imafdcb
Boot HART ISA Extensions : sstc,zicntr,zkr,zicboz,zicbom
Boot HART PMP Count : 0
Boot HART PMP Granularity : 0 bits
Boot HART PMP Address Bits: 0
Boot HART MHPM Info : 0 (0x00000000)
Boot HART MIDELEG : 0x0000000000000222
Boot HART MEDELEG : 0x000000000000b109
Thread 3 "rvvm" received signal SIGILL, Illegal instruction.
[Switching to LWP 6370]
0x0000007fb00e1688 in ?? ()
(gdb)
#0 0x0000007fb00e1688 in ?? ()
#1 0x0000007fb783524c in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
(gdb)
=> 0x7fb00e1688: b 0x7fb00e3cb0
0x7fb00e168c: ldr x10, [x0, #80]
0x7fb00e1690: mov x11, x10
0x7fb00e1694: lsr x12, x11, #12
0x7fb00e1698: and x13, x12, #0xff
0x7fb00e169c: lsl w13, w13, #5
0x7fb00e16a0: add x13, x13, x0
0x7fb00e16a4: ldr x14, [x13, #536]
0x7fb00e16a8: eor x14, x14, x12
0x7fb00e16ac: and x12, x11, #0x7
0x7fb00e16b0: orr x12, x12, x14
0x7fb00e16b4: cbz x12, 0x7fb00e16c8
0x7fb00e16b8: ldr x15, [x0, #264]
0x7fb00e16bc: add x15, x15, #0x2
0x7fb00e16c0: str x15, [x0, #264]
0x7fb00e16c4: ret
0x7fb00e16c8: ldr x14, [x13, #528]
0x7fb00e16cc: add x14, x14, x11
0x7fb00e16d0: ldr x15, [x14]
0x7fb00e16d4: ldr x14, [x0, #168]
(gdb)
could we add a cmake/rvvm option to disable the JIT native linker globally for builds that will be executed on devices that end up exhibiting such SIGILL flushes?
Also migrating from cmake 3.6 / 3.10 to cmake 3.18 fixed the following
btw i converted all "" includes into fully relative paths due to ndk bugs (a normal build works in alpine proot (alpine aarch64 llvm) but fails on x86 (CI) ndk llvm (ndk 21 specifically))
could you please build it from scratch? what proot method are you using exactly? termux? andronix? make sure to build for your platform directly as proot is very finicky and unreliable due to text parsing
im using alpine aarch64 proot with (proot-distro)
localhost:~/riscv-kernel/libmedia# cmake --version
cmake version 3.30.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
localhost:~/riscv-kernel/libmedia# clang --version
Alpine clang version 18.1.8
Target: aarch64-alpine-linux-musl
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang18/aarch64-alpine-linux-musl.cfg
localhost:~/riscv-kernel/libmedia#
in github archlinux CI im using sdkmanager cmake and ndk 21 which has llvm 14 i think
specifically CMake 3.10.2.4988404 (revision: 3.10.2) and ndk 21.4.7075529
RVVM v0.7-b3e9533b-dirty
What commit is that, why is it dirty?
You can also try lowering both dsize and isize variables to value 32 here and see if it helps: https://github.com/LekKit/RVVM/blob/f4031a4f7860cdfd37ecf8b94d1a9d607960efb5/src/rvjit/rvjit.c#L52
was avoiding storing the .git i now upload it with .git as .git0 and move it back to .git in the CI
using 32 seems to work
could we also add an option to specify the iflush size on such SIGILL flushes (with an additional option to disable the iflush cache / native linker as a fallback if the cache size is unknown to the user or they just dont wanna deal with trying to figure out the cache size for various devices/hardware configurations)
btw does data size have to equal instruction size or can they differ ?
if so could we add options to set each?
eg
rvvm --dsize 32 --isize 32 # custom cache flush size, ARM only
rvvm --disable-native-linker # disable native linker, ARM only
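On whether the data and instruction sizes can differ: architecturally they can. AArch64 reports the two minimum line sizes separately in CTR_EL0 (IminLine in bits [3:0], DminLine in bits [19:16], each the log2 of the line size in 4-byte words). Below is a minimal sketch of decoding them. The helper names are made up for illustration and this is not RVVM's code; whether the value read on one core matches every other core of a heterogeneous SoC is a separate question this does not answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names, not taken from RVVM. */

/* IminLine: CTR_EL0 bits [3:0], log2 of the icache line size in 4-byte words. */
static inline size_t ctr_icache_line(uint64_t ctr)
{
    return (size_t)4 << (ctr & 0xF);
}

/* DminLine: CTR_EL0 bits [19:16], same encoding for the dcache. */
static inline size_t ctr_dcache_line(uint64_t ctr)
{
    return (size_t)4 << ((ctr >> 16) & 0xF);
}

/* Read CTR_EL0 on an AArch64 host; returns 0 on any other host. */
static inline uint64_t read_ctr_el0(void)
{
#if defined(__aarch64__) && defined(__GNUC__)
    uint64_t ctr;
    __asm__ volatile ("mrs %0, ctr_el0" : "=r" (ctr));
    return ctr;
#else
    return 0;
#endif
}
```

A flush loop would then step by ctr_dcache_line() when cleaning the data cache and by ctr_icache_line() when invalidating the instruction cache, so the two strides do not need to be equal.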
spoke too soon, 32 also fails with SIGILL
hmm it appears if we get added as a cmake subproject we appear to derive the head as-if from the parent repo ... (as we should otherwise get a commit of f4031a4)
ok i created a patch to disable the native linker via -rvjit_disable_native_linker
https://github.com/ZLangJIT/riscv-kernel/blob/cfc74e807c1ec448b25a6d5e7790cf8c6befcdc7/rvvm.patch
spoke too soon, 32 also fails with SIGILL
I am not sure how to fix this at all then. I have an idea that lazy linker patch flushes might be causing this, but it would mean that patchpoints somehow end up spilled between cachelines which didn't ever happen on any ARM64 CPU yet.
It is possible that this patch will help, but I am not sure. Revert to a clean staging tree beforehand.
From b6381de3b34ca65e1a774307cd23c6da1e1697b4 Mon Sep 17 00:00:00 2001
From: LekKit <50500857+LekKit@users.noreply.github.com>
Date: Mon, 30 Sep 2024 21:07:48 +0300
Subject: [PATCH] rvjit: Flush icache on linker patchpoints
---
src/rvjit/rvjit.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/src/rvjit/rvjit.c b/src/rvjit/rvjit.c
index 8ad74f8..e90983b 100644
--- a/src/rvjit/rvjit.c
+++ b/src/rvjit/rvjit.c
@@ -41,6 +41,8 @@ void sys_icache_invalidate(void* start, size_t len);
#include <sys/syscall.h>
#include <unistd.h>
+#define RVJIT_GLOBAL_ICACHE_FLUSH
+
#elif defined(RVJIT_ARM64) && defined(GNU_EXTS)
/*
* Don't rely on GCC's __clear_cache implementation, as it may
@@ -263,6 +265,9 @@ rvjit_func_t rvjit_block_finalize(rvjit_block_t* block)
vector_foreach(*linked_blocks, i) {
uint8_t* jptr = vector_at(*linked_blocks, i);
rvjit_linker_patch_jmp(jptr, ((size_t)dest) - ((size_t)jptr));
+#ifndef RVJIT_GLOBAL_ICACHE_FLUSH
+ rvjit_flush_icache(jptr, 8);
+#endif
}
vector_free(*linked_blocks);
free(linked_blocks);
--
2.46.2
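For readers following along, the general idea behind the patch above can be sketched as follows: after a JIT rewrites an instruction word that may already have been fetched, the affected range is cleaned from the data cache and invalidated in the instruction cache before it is executed again. This sketch assumes a GCC/Clang host and uses the compiler builtin __builtin___clear_cache for brevity; RVVM deliberately uses its own rvjit_flush_icache() instead, and patch_insn_and_flush is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper; RVVM itself calls rvjit_flush_icache(jptr, 8). */
static void patch_insn_and_flush(void* addr, uint32_t insn)
{
    /* Store the new instruction word in host byte order
     * (little-endian on typical AArch64 hosts). */
    memcpy(addr, &insn, sizeof(insn));
#if defined(__GNUC__)
    /* Make the store visible to instruction fetch for exactly the patched range. */
    __builtin___clear_cache((char*)addr, (char*)addr + sizeof(insn));
#endif
}
```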
flushing icache on linker patchpoints works :)
tho i would like to keep the -rvjit_disable_native_linker as a fallback in case this ever fails again
or at least make it an optional config (RVVM_NO_NATIVE_LINKER) should the rvvm_has_args become a perf hit
There will be a special useflag in the refactored Makefile that is currently being worked on, but really it's more of a workaround than anything else. Seems that the patchpoint flushing is the way to go forward.
Actually I have another patch that tries to fix it differently, which I'm interested to test too... (revert the previous patch or clean the source tree)
From e0745f4f9223f3afc91284e084537d6e572f4c68 Mon Sep 17 00:00:00 2001
From: LekKit <50500857+LekKit@users.noreply.github.com>
Date: Tue, 1 Oct 2024 00:17:12 +0300
Subject: [PATCH] rvjit_arm64: Omit zeroing patched jump
---
src/rvjit/rvjit_arm64.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/rvjit/rvjit_arm64.h b/src/rvjit/rvjit_arm64.h
index ed01f45..ea2b6bb 100644
--- a/src/rvjit/rvjit_arm64.h
+++ b/src/rvjit/rvjit_arm64.h
@@ -1343,7 +1343,6 @@ static inline void rvjit_patch_ret(void* addr)
static inline bool rvjit_patch_jmp(void* addr, int32_t offset)
{
if (rvjit_a64_valid_reloc(offset)) {
- write_uint32_le_m(addr, 0);
rvjit_a64_b_reloc(addr, offset);
return true;
}
--
2.46.2
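Some background on what this patch touches, as a sketch rather than RVVM's actual implementation: an AArch64 unconditional branch is a single 32-bit word with opcode 0b000101 in bits [31:26] and a signed word offset in imm26, while an all-zero word decodes as UDF #0, which always raises an undefined-instruction exception. One possible reading (an assumption, not something confirmed in this thread) is that a core fetching the intermediate zero word would see exactly that. The function names below are hypothetical; RVVM's own helpers are rvjit_a64_valid_reloc() and rvjit_a64_b_reloc().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, not RVVM's helpers. */

/* B reaches +/-128 MiB and the target must be 4-byte aligned. */
static inline bool a64_b_offset_ok(int64_t offset)
{
    return (offset & 3) == 0
        && offset >= -((int64_t)1 << 27)
        && offset <   ((int64_t)1 << 27);
}

/* B <label>: opcode 0b000101 in bits [31:26], imm26 = byte offset / 4. */
static inline uint32_t a64_encode_b(int64_t offset)
{
    return 0x14000000u | (uint32_t)(((uint64_t)offset >> 2) & 0x03FFFFFFu);
}
```

As a worked example, the faulting instruction in the GDB dump above is `b 0x7fb00e3cb0` at 0x7fb00e1688, a byte offset of 0x2628, so a64_encode_b(0x2628) gives 0x1400098a.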
I'm just trying to get an idea what that specific Exynos CPU model does differently from other CPUs we tested on
the above patch didnt seem to work, we still get SIGILL
I see. Well anyways thanks for testing
on a side note, could we install a SIGSEGV/SIGBUS/SIGILL/etc/SIGTERM (and whatever win32 equiv is) handler to revert the terminal back to its original state before exiting such that a rvvm crash/^C ( win32 ^Z ) does not leave the terminal in a non-default state ?
It could be done in src/stacktrace.c, but that thing currently only sets up signal handling if libbacktrace.so is found to print nice stacktraces on crash without a debugger.
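A rough POSIX-only sketch of the terminal restore idea discussed here: save the termios state at startup, and on a fatal signal put it back, then re-raise the signal with its default action. All names are hypothetical and this is not what src/stacktrace.c does today; a win32 equivalent is not covered.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical names; not part of RVVM. */
static struct termios saved_term;
static volatile sig_atomic_t saved_term_valid = 0;

static void crash_restore_term(int sig)
{
    /* tcsetattr(), signal() and raise() are async-signal-safe per POSIX. */
    if (saved_term_valid) tcsetattr(STDIN_FILENO, TCSANOW, &saved_term);
    /* Hand the signal back to the default action so the process still
     * terminates (or dumps core) the way it would have without the handler. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Call once at startup, before the terminal is switched to raw mode.
 * Returns 0 on success, -1 if a handler could not be installed. */
static int term_guard_install(void)
{
    static const int sigs[] = { SIGILL, SIGSEGV, SIGBUS, SIGTERM, SIGINT };
    struct sigaction sa;
    size_t i;

    /* tcgetattr() fails harmlessly when stdin is not a TTY. */
    saved_term_valid = (tcgetattr(STDIN_FILENO, &saved_term) == 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = crash_restore_term;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof(sigs) / sizeof(sigs[0]); ++i) {
        if (sigaction(sigs[i], &sa, NULL) != 0) return -1;
    }
    return 0;
}
```

This only restores the terminal and lets the process die; recovering from a SIGILL raised inside JIT code and continuing (or auto rebooting the guest) would need a different mechanism.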
what could i do upon the JIT attempting to execute an Illegal Instruction (happens sometimes, never happens with -nojit)
as such seems to prevent the console from being restored (eg ^C functionality) upon Illegal Instruction being encountered, as well as prevent the rvvm from, say auto rebooting if such is encountered during an attempted kernel boot
examples:
these are about 50/50, you either get them or you dont