[Question] Ubuntu installation on LiteX SoC?

OrkunAliOzkan commented 1 year ago

Hello,

My goal is to replicate the progress made in hosting Buildroot/Debian/Fedora on an FPGA, but with Ubuntu instead. Is there any support planned for Ubuntu RISC-V ports instead of Buildroot Linux? What would need to be accomplished to allow these to run properly?

I initially attempted to switch root into an Ubuntu environment stored on an SD card, using BusyBox:

exec switch_root /mnt/ubuntu /bin/systemd

However, no console output was produced. The furthest the initialisation reached was the completion of the random number generator setup:

random: fast/crng init done

I reviewed the documentation on booting Fedora on a LiteX SoC and am confident that LiteX's kernel fork is capable of supporting Ubuntu. I am attempting to make use of it now and will update if I make any progress.

All the best, Orkun

Notes:

To get the Ubuntu environment from the Ubuntu image:

qemu-img convert image.img image.raw
losetup -f -P --show image.raw 
mount -o loop /dev/loop##p1 mnt
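The three commands above can be wrapped so that the loop device name printed by losetup is captured instead of typed by hand. This is only a sketch: the helper names `part_dev` and `mount_image_part` are hypothetical, and it assumes the root filesystem is partition 1 of the image, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: build the partition node name for a loop device,
# e.g. /dev/loop0 and 1 give /dev/loop0p1.
part_dev() {
    printf '%sp%s\n' "$1" "$2"
}

# Hypothetical helper: convert the image to raw, attach it with partition
# scanning (-P), and mount partition 1 (assumed to hold the root filesystem).
# Prints the loop device so it can be detached later.
mount_image_part() {
    img=$1 raw=$2 mnt=$3
    qemu-img convert -O raw "$img" "$raw" || return 1
    loopdev=$(sudo losetup -f -P --show "$raw") || return 1
    sudo mount "$(part_dev "$loopdev" 1)" "$mnt" || return 1
    echo "$loopdev"
}
```

A possible use would be `loopdev=$(mount_image_part image.img image.raw mnt)`, followed later by `sudo umount mnt` and `sudo losetup -d "$loopdev"` to clean up.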

OrkunAliOzkan commented 1 year ago

I have built a kernel image and an initrd, which I side-load into the SoC's memory from the SD card and which the bootloader then invokes. Please read the following guide to see the steps I took (yes, I know the tutorial is for Fedora, but both use systemd as the init process, so it should be similar enough; maybe this is naivety).

The kernel panic which I encounter is a bit beyond me, though I did research some of the key errors, I would like to ask for some support.

Please find attached below the console log, the SoC device tree, and the kernel configuration file which I am using.

gsomlo commented 1 year ago

On Mon, Jun 26, 2023 at 07:46:27AM -0700, Orkun Ozkan wrote:

The kernel panic which I encounter is a bit beyond me, though I did research some of the key errors, I would like to ask for some support.

I kept getting similar errors, and traced them back to having asked opensbi to load the DTB at a physical memory address that slightly overlapped with the end of the kernel. I was following the fw_jump.md instructions in opensbi, which used to be wrong. Since then I've asked them to fix those instructions:

https://github.com/riscv-software-src/opensbi/commit/ee016a7bb098578a5d0d4bde01259fe3cd57b02f

Hopefully if you rebuild opensbi by telling it your kernel is a bit larger and to place its own DTB copy at an appropriately higher memory address, your problem will be solved.

I'm not 100% sure we're running into the same problem, but if so, then following the linked instructions should help.
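As a rough illustration of the overlap issue described above, the following sketch computes a DTB address that lies beyond the end of the kernel. The helper name `fdt_addr_after`, the example addresses, and the margin are all assumptions for illustration; the on-disk Image size can be smaller than the kernel's in-memory footprint, so a generous margin is advisable.

```shell
#!/bin/sh
# Hypothetical helper: given the kernel load address, the kernel size,
# a safety margin, and an alignment (all in bytes, hex accepted),
# print the first aligned address past the end of the kernel.
fdt_addr_after() {
    load=$1 size=$2 margin=$3 align=$4
    end=$(( load + size + margin ))
    # Round up to the next multiple of the alignment.
    printf '0x%x\n' $(( (end + align - 1) / align * align ))
}

# Assumed usage (addresses are examples, not LiteX defaults):
#   fdt=$(fdt_addr_after 0x40000000 0x1400000 0x200000 0x200000)
#   make PLATFORM=generic CROSS_COMPILE=riscv64-linux-gnu- \
#        FW_JUMP_ADDR=0x40000000 FW_JUMP_FDT_ADDR="$fdt"
```

The FW_JUMP_ADDR and FW_JUMP_FDT_ADDR build variables are the ones documented for the fw_jump firmware; the platform name and toolchain prefix in the comment are assumptions and should be checked against the linked instructions.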

Best, --Gabriel

OrkunAliOzkan commented 1 year ago

Hi Gabriel! Great to hear from you, thank you so much for pointing this out!

Thanks to your suggestions, I have been able to reach the user space initialisation stage. I have now encountered a new kernel panic, where the init process is not what was expected (?) and fails to execute:

[  355.371946] Starting init: /bin/sh exists but couldn't execute it (error -8)
[  355.389742] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.

This could be due to the initramfs not being properly made. I did as you did in your Fedora boot documentation and used the initramfs generated in /boot during the kernel generation process.

Two notable issues I saw in the console log during generation that I just wanted to briefly ask about are:

1)

[  310.343618] request_module: modprobe binfmt-464c cannot be processed, kmod busy with 50 threads for more than 5 seconds now

I did some research on this and it seems to be a common message received when one uses a 64-bit userland with a 32-bit kernel (or vice versa). However, I have built the kernel to be 64-bit (CONFIG_64BIT=y in .config), and I know the userspace is 64-bit since the cloud image is made for 64-bit RISC-V, so I am not sure this is the issue. Have you possibly encountered this before?

2)

[  243.135660] INFO: task kworker/u2:0:46 blocked for more than 120 seconds.
[  243.142482]       Not tainted 6.4.0-rc6 #5
[  243.145538] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.154232] task:kworker/u2:0    state:D stack:0     pid:46    ppid:10     flags:0x00000000
[  243.162338] Call Trace:
[  243.164230] [<ffffffff80b3cb54>] __schedule+0x332/0x992
[  243.170526] [<ffffffff80b3d200>] schedule+0x4c/0xdc
[  243.175278] [<ffffffff80044512>] async_synchronize_cookie_domain+0xfa/0x136
[  243.182550] [<ffffffff80002f72>] wait_for_initramfs+0x4a/0x72
[  243.188028] [<ffffffff800334f6>] call_usermodehelper_exec_async+0x12e/0x208
[  243.194998] [<ffffffff8000404e>] ret_from_fork+0xe/0x20

After some added research it may be that the INFO message happens because IO is under heavy load. I am under the impression that I could resolve this by introducing a swap partition on the SD Card (I'll get back to you on this tomorrow since I just thought this now). I am unsure about the call trace however.
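Regarding the first issue: error -8 is ENOEXEC, and the "464c" in binfmt-464c appears to correspond to the "LF" bytes of the ELF magic, which suggests the kernel found no loader willing to accept the binary. One way to rule out a class or architecture mismatch is to inspect the ELF header of /bin/sh inside the unpacked initramfs. The sketch below uses hypothetical helper names and assumes a little-endian ELF file; a 64-bit RISC-V binary should report class 64 and machine 243 (EM_RISCV).

```shell
#!/bin/sh
# Hypothetical helper: print 32 or 64 depending on the EI_CLASS byte
# (offset 4), or "not-elf" if the ELF magic is missing.
elf_class() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not-elf"; return 1; }
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Hypothetical helper: print e_machine (2 bytes at offset 18),
# assuming little-endian byte order.
elf_machine() {
    set -- $(od -An -tu1 -j18 -N2 "$1")
    echo $(( $1 + $2 * 256 ))
}
```

If both values look right, it may also be worth checking that CONFIG_BINFMT_ELF is enabled in the kernel configuration and that the interpreter named in the binary exists in the initramfs.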

Sadly I have to leave it for today, it is late, but I'll provide an update tomorrow

Please find attached the new console log output if you would like to have a read.

Thanks again

gsomlo commented 1 year ago

On Mon, Jun 26, 2023 at 10:41:14AM -0700, Orkun Ozkan wrote:

This could be due to the initramfs not being properly made. I did as you did in your Fedora boot documentation and used the initramfs generated in /boot during the kernel generation process.

I built a kernel based on fedora's own .config, on a Fedora riscv VM -- specifically to avoid having to figure out all the details on how it actually goes about creating the initramfs (I think it invokes dracut at some point, which IIRC is a fedora-specific thing).

How it all works on Ubuntu is a bit even further outside my area of expertise, but I'd first try to load the newly built kernel on the same VM on which I've built it, and make sure it works there before trying to troubleshoot LiteX specific problems.

After some added research it may be that the INFO message happens because IO is under heavy load. I am under the impression that I could resolve this by introducing a swap partition on the SD Card (I'll get back to you on this tomorrow since I just thought this now). I am unsure about the call trace however.

Fedora (by way of systemd) absolutely insists on creating a zram based swap partition, so there's no way to boot it without one... Swapping to the sdcard sounds like it might be a bit slow, I think it's technically possible but I'd avoid it unless absolutely necessary... :)

Good luck, --Gabriel

OrkunAliOzkan commented 1 year ago

Resolved this by:

1) Installing the custom kernel on RISC-V Ubuntu (QEMU)

2) Extracting the modified Ubuntu environment from the .raw file

qemu-img convert image.img image.raw
sudo losetup -f -P --show image.raw 
sudo mount -o loop /dev/whateverp1 mnt

3) Making my own initrd. Previously I attempted to install BusyBox and then run switch_root as PID 1. An issue I faced was that BusyBox ash could not be PID 1, so invoking exec to force a process to be PID 1 would not behave as desired. To avoid this, I made my init process a script which ran BusyBox ash as PID 1, mounted the SD card containing the Ubuntu environment, and then invoked switch_root as PID 1. The outcome was that Ubuntu was successfully switch-rooted into and systemd was initialised, resulting in Ubuntu running!

A word of advice for the init script: make sure that the virtual mount points /dev, /proc, /tmp and /sys exist while mounting the SD card containing the / partition, and that they are cleaned up appropriately before switching root (unmount all of them other than /dev, and move /dev to the / partition containing Ubuntu).
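The init script described above might look roughly like the following. This is a sketch, not the actual script used: the function names, the `DRY_RUN` switch, and the example device name are hypothetical, while the /mnt/ubuntu mount point and /bin/systemd target are taken from the earlier switch_root attempt. It would need to run as PID 1 from the initramfs /init (without forking) for switch_root to work.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical function: $1 is the block device holding the Ubuntu root
# filesystem (assumption: something like /dev/mmcblk0p2).
boot_ubuntu() {
    rootdev=$1
    newroot=/mnt/ubuntu

    # Create and mount the virtual filesystems needed to find the SD card.
    run mkdir -p /proc /sys /dev /tmp "$newroot"
    run mount -t proc proc /proc
    run mount -t sysfs sysfs /sys
    run mount -t devtmpfs devtmpfs /dev
    run mount -t tmpfs tmpfs /tmp

    # Mount the Ubuntu root filesystem.
    run mount "$rootdev" "$newroot"

    # Unmount everything except /dev, which is moved into the new root.
    run umount /tmp
    run umount /sys
    run umount /proc
    run mount -o move /dev "$newroot/dev"

    # Hand over to systemd in the new root; exec keeps PID 1.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "exec switch_root $newroot /bin/systemd"
    else
        exec switch_root "$newroot" /bin/systemd
    fi
}
```

Setting `DRY_RUN=1` before calling `boot_ubuntu` prints the command sequence without touching any mounts, which can help when checking the ordering on a development machine.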

Reference project: https://www.contrib.andrew.cmu.edu/~somlo/BTCP/self_hosting_fedora.html

enjoy-digital / litex

[Question] Ubuntu installation on LiteX SoC? #1714