Fix broken U-mode programs

JZJisawesome commented 5 months ago

Don't count your chickens before they egg I guess.

Sadly when the kernel tries to execute init, the kernel_execve functions encounters error -14. This may mean EFAULT?

Looking into it, the first time sret is executed is just before this occurs. Assuming no silent corruption, the issue is almost certainly with u-mode or the s-mode u-mode transition.

Dump of info from discord:

9:49 PM
]
JZJ
:
Progress with setting up an initrd with busybox:

IRVE> RVDEBUGADDR: "[    7.635000] Run /init as init process\r\n"
IRVE> RVDEBUGADDR: "[    7.637000]   with arguments:\r\n"
IRVE> RVDEBUGADDR: "[    7.638000]     /init\r\n"
IRVE> RVDEBUGADDR: "[    7.639000]   with environment:\r\n"
IRVE> RVDEBUGADDR: "[    7.640000]     HOME=/\r\n"
IRVE> RVDEBUGADDR: "[    7.641000]     TERM=linux\r\n"
IRVE> RVDEBUGADDR: "[    7.645000] Failed to execute /init (error -2)\r\n"
IRVE> RVDEBUGADDR: "[    7.647000] Run /sbin/init as init process\r\n"
IRVE> RVDEBUGADDR: "[    7.649000]   with arguments:\r\n"
IRVE> RVDEBUGADDR: "[    7.651000]     /sbin/init\r\n"
IRVE> RVDEBUGADDR: "[    7.653000]   with environment:\r\n"
IRVE> RVDEBUGADDR: "[    7.655000]     HOME=/\r\n"
IRVE> RVDEBUGADDR: "[    7.656000]     TERM=linux\r\n"
[...]
IRVE> RVDEBUGADDR: "[    7.675000] Run /bin/init as init process\r\n"
IRVE> RVDEBUGADDR: "[    7.677000]   with arguments:\r\n"
IRVE> RVDEBUGADDR: "[    7.679000]     /bin/init\r\n"
IRVE> RVDEBUGADDR: "[    7.681000]   with environment:\r\n"
IRVE> RVDEBUGADDR: "[    7.683000]     HOME=/\r\n"
IRVE> RVDEBUGADDR: "[    7.685000]     TERM=linux\r\n"
IRVE> RVDEBUGADDR: "[    7.689000] Run /bin/sh as init process\r\n"
IRVE> RVDEBUGADDR: "[    7.691000]   with arguments:\r\n"
IRVE> RVDEBUGADDR: "[    7.693000]     /bin/sh\r\n"
IRVE> RVDEBUGADDR: "[    7.695000]   with environment:\r\n"
IRVE> RVDEBUGADDR: "[    7.697000]     HOME=/\r\n"
IRVE> RVDEBUGADDR: "[    7.699000]     TERM=linux\r\n"
IRVE> RVDEBUGADDR: "[    7.703000] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.\r\n"

[
9:50 PM
]
JZJ
:
So it detects the initrd filesystem but init itself isn't playing nicely...
[
9:55 PM
]
JZJ
:
Ah, forgot to enable elf binary support in the kernel...
[
10:15 PM
]
JZJ
:
Shoot. Qemu does this:

[    0.248184] Run /init as init process
[    0.248746]   with arguments:
[    0.249050]     /init
[    0.249493]   with environment:
[    0.249791]     HOME=/
[    0.250022]     TERM=linux
[    0.805355] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

[
10:15 PM
]
JZJ
:
But IRVE still does the other thing
[
10:15 PM
]
JZJ
:
Looks like there may be issues with user mode...
[
10:24 PM
]
JZJ
:
Init is executed in main.c, which calls kernel_execve which is in in fs/exec.c, and it goes from there...
[
10:25 PM
]
JZJ
:
Now to figure out why it doesn't work
[
10:25 PM
]
JZJ
:
The error is Failed to execute /init (error -14)
[
10:26 PM
]
JZJ
:
Looking it up, it appears 14 is the EFAULT value, so likely this is the error execve is encountering
[
10:27 PM
]
JZJ
:
https://man7.org/linux/man-pages/man2/execve.2.html
[
10:27 PM
]
JZJ
:
Something weird with addresses 
[
10:28 PM
]
JZJ
:
Either a user mode paging issue or some problem with user mode exceptions...
[
10:28 PM
]
JZJ
:
Shoot shoot shoot lol
[
10:49 PM
]
JZJ
:
Note to self: resume tomorrow at exec_binprm
[
10:55 PM
]
JZJ
:
The first time SRET is used is when trying to start init:

82595288>       Mnemonic: SRET
IRVE> RVDEBUGADDR: "[   10.978000] Failed to execute /init (error -14)\r\n"

Assuming it's not a silent corruption issue, the issue is almost certainly with U-mode or the S-mode U-mode transition
[
10:58 PM
]
JZJ
:
Backtrace where the sret occurs:

ret_from_exception () at arch/riscv/kernel/entry.S:167
167             sret
(gdb) bt
#0  ret_from_exception () at arch/riscv/kernel/entry.S:167
#1  0xc00f9f24 in clear_user (n=3387, to=0x25f2c5) at ./arch/riscv/include/asm/uaccess.h:317
#2  padzero (elf_bss=2486981) at fs/binfmt_elf.c:144
#3  load_elf_binary (bprm=0xc088dc00) at fs/binfmt_elf.c:1243
#4  0xc00b7210 in search_binary_handler (bprm=0xc088dc00) at fs/exec.c:1739
#5  exec_binprm (bprm=0xc088dc00) at fs/exec.c:1781
#6  bprm_execve (flags=<optimized out>, filename=<optimized out>, fd=<optimized out>, bprm=0xc088dc00) at fs/exec.c:1856
#7  bprm_execve (bprm=0xc088dc00, fd=<optimized out>, filename=<optimized out>, flags=<optimized out>) at fs/exec.c:1812
#8  0xc00b8224 in kernel_execve (kernel_filename=<optimized out>, argv=argv@entry=0xc04cf000 <argv_init>, envp=envp@entry=0xc04cf088 <envp_init>) at fs/exec.c:2024
#9  0xc020806c in run_init_process (init_filename=<optimized out>) at init/main.c:1355
#10 0xc020f48c in kernel_init (unused=<optimized out>) at init/main.c:1465
#11 0xc0001f24 in ret_from_fork () at arch/riscv/kernel/entry.S:264
Backtrace stopped: frame did not save the PC

[
11:04 PM
]
JZJ
:
Problem: exec_binprm fails and causes bprm_execve to treat it like a segfault

JZJisawesome commented 5 months ago

Tracing it more, the error we encounter is here in fs/binfmt_elf.c:

1235 »···/* Calling set_brk effectively mmaps the pages that we need
1236 »··· * for the bss and break sections.  We must do this before
1237 »··· * mapping in the interpreter, to make sure it doesn't wind
1238 »··· * up getting placed where the bss needs to go.
1239 »··· */
1240 »···retval = set_brk(elf_bss, elf_brk, bss_prot);
1241 »···if (retval)
1242 »···»···goto out_free_dentry;
1243 »···if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
1244 »···»···retval = -EFAULT; /* Nobody gets to see this, but.. */
1245 »···»···goto out_free_dentry;
1246 »···}

That secondary failure occurs and -EFAULT gets returned (I'm 99% certain).

Edit: Looks like this is resulting from the padzero() function call, as a result of clear_user failing for some reason...

JZJisawesome commented 5 months ago

So clear_user ends up calling the arch-specific __clear_user, in our case the RISC-V variant.

This sets the SUM bit in the sstatus csr and then does.. something.

Eventually the first "fixup" instruction is encountered and this causes some sort of pagefault? Not sure what is going on

But somewhere here something goes wrong.

Edit: Likely the issue is to do with some problem with "exception tables," see: https://www.kernel.org/doc/Documentation/x86/exception-tables.txt

We're probably not accurate enough with our exceptions...