Closed JZJisawesome closed 4 months ago
Tracing it more, the error we encounter is here in fs/binfmt_elf.c
:
1235 »···/* Calling set_brk effectively mmaps the pages that we need
1236 »··· * for the bss and break sections. We must do this before
1237 »··· * mapping in the interpreter, to make sure it doesn't wind
1238 »··· * up getting placed where the bss needs to go.
1239 »··· */
1240 »···retval = set_brk(elf_bss, elf_brk, bss_prot);
1241 »···if (retval)
1242 »···»···goto out_free_dentry;
1243 »···if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
1244 »···»···retval = -EFAULT; /* Nobody gets to see this, but.. */
1245 »···»···goto out_free_dentry;
1246 »···}
That secondary failure occurs and -EFAULT gets returned (I'm 99% certain).
Edit: Looks like this is resulting from the padzero() function call, as a result of clear_user failing for some reason...
So clear_user
ends up calling the arch-specific __clear_user
, in our case the RISC-V variant.
This sets the SUM bit in the sstatus
csr and then does.. something.
Eventually the first "fixup" instruction is encountered and this causes some sort of pagefault? Not sure what is going on
But somewhere here something goes wrong.
Edit: Likely the issue is to do with some problem with "exception tables," see: https://www.kernel.org/doc/Documentation/x86/exception-tables.txt
We're probably not accurate enough with our exceptions...
Posibility: We don't implement stval, it is just always zero.
Does the kernel depend on us to tell it the faulting virtual address?
After the fault, after saving the registers and reading several CSRs, we end up at arch/riscv/mm/fault.c in the handle_page_fault function.
Interesting:
(gdb) print *regs
$4 = {epc = 3223355260, ra = 3222249252, sp = 3229785728, gp = 3226760904, tp = 3229745152, t0 = 2486984, t1 = 2490368, t2 = 3229659908, s0 = 3230194688, s1 = 3238116608, a0 = 2486981,
a1 = 3387, a2 = 2642408133, a3 = 2490368, a4 = 4096, a5 = 709, a6 = 3229660416, a7 = 2, s2 = 0, s3 = 0, s4 = 66644, s5 = 2486981, s6 = 2373708, s7 = 2642411520, s8 = 2486981,
s9 = 2503300, s10 = 0, s11 = 0, t3 = 2, t4 = 3229785648, t5 = 2, t6 = 262144, status = 262432, badaddr = 0, cause = 15, orig_a0 = 3222247388}
The "badaddr" is 0 which is the value from stval, not the actual address in virtual memory of where the store faulted too (which was 0x0002somthing)
Maybe it's as simple as that! I sure as hell hope so!
99% confirmed: it appears the function calls other functions and they continue to run with this incorrect 0 addr.
I'll attempt to implement stvec now, and hopefully this will resolve the issue.
Yep, another kernel bug! This fixed it! Now IRVE prints:
IRVE> RVDEBUGADDR: "[ 7.058000] Warning: unable to open an initial console.\r\n"
IRVE> RVDEBUGADDR: "[ 7.142000] Freeing unused kernel image (initmem) memory: 2396K\r\n"
IRVE> RVDEBUGADDR: "[ 7.144000] Kernel memory protection not selected by kernel config.\r\n"
IRVE> RVDEBUGADDR: "[ 7.147000] Run /init as init process\r\n"
IRVE> RVDEBUGADDR: "[ 7.149000] with arguments:\r\n"
IRVE> RVDEBUGADDR: "[ 7.150000] /init\r\n"
IRVE> RVDEBUGADDR: "[ 7.151000] with environment:\r\n"
IRVE> RVDEBUGADDR: "[ 7.152000] HOME=/\r\n"
IRVE> RVDEBUGADDR: "[ 7.153000] TERM=linux\r\n"
IRVE> RVDEBUGADDR: "[ 10.245000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000\r\n"
IRVE> RVDEBUGADDR: "[ 10.249000] CPU: 0 PID: 1 Comm: init Not tainted 6.6.0+ #15\r\n"
IRVE> RVDEBUGADDR: "[ 10.252000] Call Trace:\r\n"
IRVE> RVDEBUGADDR: "[ 10.254000] [<c000337c>] walk_stackframe+0x0/0xc0\r\n"
IRVE> RVDEBUGADDR: "[ 10.257000] [<c020e260>] dump_stack_lvl+0x28/0x48\r\n"
IRVE> RVDEBUGADDR: "[ 10.260000] [<c0208494>] panic+0x118/0x2e0\r\n"
IRVE> RVDEBUGADDR: "[ 10.263000] [<c000ba88>] do_exit+0x724/0x728\r\n"
IRVE> RVDEBUGADDR: "[ 10.266000] [<c000bc40>] do_group_exit+0x24/0x7c\r\n"
IRVE> RVDEBUGADDR: "[ 10.269000] [<c000bcb4>] __riscv_sys_exit_group+0x1c/0x20\r\n"
IRVE> RVDEBUGADDR: "[ 10.272000] [<c020ede8>] do_trap_ecall_u+0x58/0x114\r\n"
IRVE> RVDEBUGADDR: "[ 10.275000] [<c0001e64>] _save_context+0x104/0x108\r\n"
IRVE> RVDEBUGADDR: "[ 10.278000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 ]---\r\n"
Fixed in the latest commit on my branch. Issue will be closed when my branch is next merged to staging.
Don't count your chickens before they egg I guess.
Sadly when the kernel tries to execute init, the
kernel_execve
functions encounters error -14. This may meanEFAULT
?Looking into it, the first time
sret
is executed is just before this occurs. Assuming no silent corruption, the issue is almost certainly with u-mode or the s-mode u-mode transition.Dump of info from discord: