firesim / FireMarshal

Software workload management tool for RISC-V based SoC research. This is the default workload management tool for Chipyard and FireSim.
https://docs.fires.im/en/latest/Advanced-Usage/Workloads/index.html

Does br-base for Chipyard prototype wait for mouse input after `random: crng init done`? #208

Closed michael-etzkorn closed 3 years ago

michael-etzkorn commented 3 years ago

I'm partly cross-posting my issue from https://github.com/ucb-bar/chipyard/issues/957. While I'm still figuring out how to put files into the hfs part of the partition, I wanted to check a claim I came across: "At boot, the kernel waits for mouse movements to initialize the random number generator." Could that be why the UART output hangs after `random: crng init done` is printed?

michael-etzkorn commented 3 years ago

I'm not sure that's what is causing the stall. I'm also wondering whether I need to remove the images from the images folder to recompile the kernel after changing the config (I removed the mouse and am hoping that works).

NathanTP commented 3 years ago

The random part is normal; I doubt that's your issue. I would first make sure your image boots in QEMU and Spike (using the `marshal launch` command). If that doesn't work, there's something wrong with Marshal. Of course, cleaning and rebuilding is always worth a try; you never know. You may even run `make mrproper` in boards/firechip/linux to be extra sure.

If it does work, then it's probably something to do with how you're using the VCU118. I notice from the output logs on your other post that it hangs after /init, which means Linux itself probably booted OK. You can see what /init does in wlutil/initramfs/nodisk/init. I've never seen the messages about mmc, so it could be a Linux configuration issue (maybe it doesn't have the right drivers for your board?). One trick I've used for this kind of problem is to call `/bin/busybox sh` at the very beginning of the init script and poke around manually to see what's going on.
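A minimal sketch of the init trick described above, for reference. It works on a throwaway stand-in file rather than the real wlutil/initramfs/nodisk/init; the stand-in's contents and shebang are assumptions made for illustration, and only the inserted `/bin/busybox sh` line comes from this thread.

```shell
#!/bin/sh
# Hypothetical stand-in for wlutil/initramfs/nodisk/init; the real script differs.
workdir=$(mktemp -d)
init="$workdir/init"
printf '%s\n' '#!/bin/sh' 'echo "Running FireMarshal nodisk init"' > "$init"

# Insert a debug shell directly after the shebang so it runs before anything else.
awk 'NR == 1 { print; print "/bin/busybox sh"; next } { print }' "$init" > "$init.new"
mv "$init.new" "$init"

# Show the patched script.
cat "$init"
```

Exiting the debug shell lets the rest of the script continue, which makes it possible to step through the boot by hand.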

michael-etzkorn commented 3 years ago

Vivado is generating a bitstream and eating a large amount of RAM, so I'll have to try QEMU after it finishes. I'll also look at the Linux config and see what changes the VCU118 needs (I removed the mouse, but I don't think that will help with the SD card). I can't rule out the SD cards being faulty either. I'm not sure where we got them from, but the board seems to read them fine at first. I'll report back once I've tried QEMU and `/bin/busybox sh`.

michael-etzkorn commented 3 years ago

I can't seem to get QEMU to run the no-disk version. When running `./marshal launch br-base.json` I get:

/mnt/Vivado_part/chipyard/chipyard/software/firemarshal/images/br-base-bin: No such file or directory

Maybe I need to do something with the j flag for the launch command. I also wanted to try removing the SPI_MEM and SPI_MMC options from the kernel configuration to see if that would fix my problem.

michael-etzkorn commented 3 years ago

It's interesting that `echo "Running FireMarshal nodisk init"` never runs. There's nothing in the second part of the partition. ~https://mtekk.us/archives/guides/fix-linux-boot-halting-on-run-init-as-init-process/ suggests adding CONFIG_DEVTMPFS_MOUNT=y~ This is already in the config. My marshal-config.yaml sets board-dir : 'boards/prototype'. ~I'm assuming I would add this config to the config file under boards/prototype/~
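One way to confirm that an option such as CONFIG_DEVTMPFS_MOUNT is set is to look for the exact `OPTION=y` line in the config file. The sketch below runs against a temporary, made-up config fragment; the helper name `has_kconfig` and the fragment contents are hypothetical, and the location of the real config under boards/prototype/ is not assumed here.

```shell
#!/bin/sh
# has_kconfig OPTION FILE: succeed if FILE contains the exact line OPTION=y.
has_kconfig() {
    grep -qx "$1=y" "$2"
}

# Hypothetical config fragment standing in for the real kernel config.
cfg=$(mktemp)
printf '%s\n' \
    'CONFIG_DEVTMPFS=y' \
    'CONFIG_DEVTMPFS_MOUNT=y' \
    '# CONFIG_MOUSE_PS2 is not set' > "$cfg"

if has_kconfig CONFIG_DEVTMPFS_MOUNT "$cfg"; then
    echo "CONFIG_DEVTMPFS_MOUNT is enabled"
else
    echo "CONFIG_DEVTMPFS_MOUNT is missing"
fi
```

A config fragment only expresses intent. The final .config produced by the kernel build is the file to check if there is any doubt about whether the option took effect.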

michael-etzkorn commented 3 years ago

QEMU launches OK, but the output doesn't look like the board's. This is the disk version:


MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0    : 0x0000000080000000-0x000000008001ffff (A)
PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] Forcing kernel command line to: console=hvc0 earlycon=sbi
[    0.000000] Linux version 5.7.0-rc3-58539-g5f5fd87b36e2 (_eda@CAD-FPGA-Dal-1) (gcc version 9.2.0 (GCC), GNU ld (GNU Binutils) 2.32) #3 SMP Thu Aug 26 10:10:47 EDT 2021
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] initrd not found or empty - disabling initrd
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000047fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080200000-0x000000047fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000047fffffff]
[    0.000000] software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
[    0.000000] SBI specification v0.2 detected
[    0.000000] SBI implementation ID=0x1 Version=0x8
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] SBI v0.2 HSM extension detected
[    0.000000] elf_hwcap is 0x112d
[    0.000000] percpu: Embedded 17 pages/cpu s31976 r8192 d29464 u69632
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 4136455
[    0.000000] Kernel command line: console=hvc0 earlycon=sbi
[    0.000000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
[    0.000000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 16437756K/16775168K available (6544K kernel code, 4152K rwdata, 4096K rodata, 1115K init, 318K bss, 337412K reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]       fixmap : 0xffffffcefee00000 - 0xffffffceff000000   (2048 kB)
[    0.000000]       pci io : 0xffffffceff000000 - 0xffffffcf00000000   (  16 MB)
[    0.000000]      vmemmap : 0xffffffcf00000000 - 0xffffffcfffffffff   (4095 MB)
[    0.000000]      vmalloc : 0xffffffd000000000 - 0xffffffdfffffffff   (65535 MB)
[    0.000000]       lowmem : 0xffffffe000000000 - 0xffffffe3ffe00000   (16382 MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[    0.000000] rcu:     RCU debug extended QS entry/exit.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
[    0.000000] plic: mapped 53 interrupts with 4 handlers for 8 contexts.
[    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[    0.000138] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns
[    0.003107] Console: colour dummy device 80x25
[    0.003605] printk: console [hvc0] enabled
[    0.003605] printk: console [hvc0] enabled
[    0.004135] printk: bootconsole [sbi0] disabled
[    0.004135] printk: bootconsole [sbi0] disabled
[    0.007389] Calibrating delay loop (skipped), value calculated using timer frequency.. 20.00 BogoMIPS (lpj=40000)
[    0.007931] pid_max: default: 32768 minimum: 301
[    0.009435] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.009907] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.033769] rcu: Hierarchical SRCU implementation.
[    0.037689] smp: Bringing up secondary CPUs ...
[    0.045381] smp: Brought up 1 node, 4 CPUs
[    0.057141] devtmpfs: initialized
[    0.062284] random: get_random_u32 called from bucket_table_alloc.isra.0+0x4a/0xbe with crng_init=0
[    0.064578] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.065837] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    0.071642] NET: Registered protocol family 16
[    0.115646] vgaarb: loaded
[    0.116995] SCSI subsystem initialized
[    0.119509] usbcore: registered new interface driver usbfs
[    0.120113] usbcore: registered new interface driver hub
[    0.120688] usbcore: registered new device driver usb
[    0.129600] clocksource: Switched to clocksource riscv_clocksource
[    0.149229] NET: Registered protocol family 2
[    0.153677] tcp_listen_portaddr_hash hash table entries: 8192 (order: 6, 327680 bytes, linear)
[    0.154514] TCP established hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.155992] TCP bind hash table entries: 65536 (order: 9, 2097152 bytes, linear)
[    0.159504] TCP: Hash tables configured (established 131072 bind 65536)
[    0.160885] UDP hash table entries: 8192 (order: 7, 786432 bytes, linear)
[    0.162496] UDP-Lite hash table entries: 8192 (order: 7, 786432 bytes, linear)
[    0.164915] NET: Registered protocol family 1
[    0.168331] RPC: Registered named UNIX socket transport module.
[    0.168628] RPC: Registered udp transport module.
[    0.168831] RPC: Registered tcp transport module.
[    0.169020] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.169556] PCI: CLS 0 bytes, default 64
[    0.219028] workingset: timestamp_bits=62 max_order=22 bucket_order=0
[    0.232674] NFS: Registering the id_resolver key type
[    0.233788] Key type id_resolver registered
[    0.234007] Key type id_legacy registered
[    0.234320] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.235463] 9p: Installing v9fs 9p2000 file system support
[    0.236990] NET: Registered protocol family 38
[    0.237578] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    0.238158] io scheduler mq-deadline registered
[    0.238456] io scheduler kyber registered
[    0.249166] pci-host-generic 30000000.pci: host bridge /soc/pci@30000000 ranges:
[    0.250229] pci-host-generic 30000000.pci:       IO 0x0003000000..0x000300ffff -> 0x0000000000
[    0.250936] pci-host-generic 30000000.pci:      MEM 0x0040000000..0x007fffffff -> 0x0040000000
[    0.252961] pci-host-generic 30000000.pci: ECAM at [mem 0x30000000-0x3fffffff] for [bus 00-ff]
[    0.254243] pci-host-generic 30000000.pci: PCI host bridge to bus 0000:00
[    0.254698] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.254981] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.255361] pci_bus 0000:00: root bus resource [mem 0x40000000-0x7fffffff]
[    0.256473] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
[    0.377287] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.384840] 10000000.uart: ttyS0 at MMIO 0x10000000 (irq = 2, base_baud = 230400) is a 16550A
[    0.391480] [drm] radeon kernel modesetting enabled.
[    0.392133] random: fast init done
[    0.393244] random: crng init done
[    0.413519] loop: module loaded
[    0.421159] virtio_blk virtio2: [vda] 554072 512-byte logical blocks (284 MB/271 MiB)
[    0.421880] vda: detected capacity change from 0 to 283684864
[    0.440704] libphy: Fixed MDIO Bus: probed
[    0.445846] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    0.446195] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.446681] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.446969] ehci-pci: EHCI PCI platform driver
[    0.447331] ehci-platform: EHCI generic platform driver
[    0.447661] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    0.447967] ohci-pci: OHCI PCI platform driver
[    0.448326] ohci-platform: OHCI generic platform driver
[    0.449938] usbcore: registered new interface driver uas
[    0.450415] usbcore: registered new interface driver usb-storage
[    0.451454] mousedev: PS/2 mouse device common for all mice
[    0.454185] goldfish_rtc 101000.rtc: registered as rtc0
[    0.455150] goldfish_rtc 101000.rtc: setting system clock to 2021-08-26T14:54:51 UTC (1629989691)
[    0.457674] syscon-poweroff poweroff: pm_power_off already claimed 000000002d5f7738 sbi_power_off
[    0.458248] syscon-poweroff: probe of poweroff failed with error -16
[    0.459424] usbcore: registered new interface driver usbhid
[    0.459673] usbhid: USB HID core driver
[    0.461717] NET: Registered protocol family 10
[    0.467901] Segment Routing with IPv6
[    0.468449] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.470965] NET: Registered protocol family 17
[    0.472396] 9pnet: Installing 9P2000 support
[    0.472877] Key type dns_resolver registered
[    0.501507] Freeing unused kernel memory: 1112K
[    0.510767] Run /init as init process
Mounting /dev/vda as root device
[    0.724196] EXT4-fs (vda): mounted filesystem without journal. Opts: (null)
Loaded platform drivers, booting from disk:
[    1.002868] EXT4-fs (vda): re-mounted. Opts: (null)
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting mdev... OK
Initializing random number generator... done.
Starting network: OK
launching firesim workload run/command
firesim workload run/command done

The mounting filesystem part makes me think I need to put something in the second partition.

Also, I can't log in. I tried root with no password and root with firesim. UPDATE: after digging around in the config, I'm guessing the password is fpga.

NathanTP commented 3 years ago

You need to pass `-d` when launching to use the no-disk version; otherwise marshal tries to find the disk-based version. You should use `marshal -d launch br-base.json`. This is also why you aren't seeing the modified init: as the name suggests, the one I linked is only for no-disk builds. Also, marshal doesn't expect users to change that file, so it doesn't track changes to it very well. You should first delete the binary (e.g. br-base-bin or your-workloadname-bin) in images/ to make sure it gets rebuilt with your changes to init.
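The steps in the paragraph above can be sketched as a small script. It only prints the commands by default (`DRY_RUN=1`), so nothing is deleted or built unless that is changed. The `run` and `rebuild_nodisk` helpers and the `WORKLOAD` variable are hypothetical names, and the explicit `build` step is an assumption added for illustration; the `-d launch` invocation and the binary name in images/ are taken from this thread.

```shell
#!/bin/sh
# Sketch only: prints the commands by default. Set DRY_RUN=0 and run from the
# FireMarshal root directory to actually execute them.
DRY_RUN=${DRY_RUN:-1}
WORKLOAD=${WORKLOAD:-br-base}   # workload name without the .json suffix

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_nodisk() {
    # Remove the cached binary so changes to the init script are picked up.
    run rm -f "images/${WORKLOAD}-bin"
    # -d selects the no-disk build, as described in the comment above.
    run ./marshal -d build "${WORKLOAD}.json"
    run ./marshal -d launch "${WORKLOAD}.json"
}

rebuild_nodisk
```

Keeping the dry run as the default makes it easy to check the exact file that would be removed before touching the images/ directory.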

The password should be 'firesim'. I recommend first going through the quick start in the FireMarshal docs on a clean repo to make sure everything works; that should tell you everything you need. The only reason I can think of for the password changing is that you modified br-base somehow. I don't recommend changing br-base directly. Instead, you can create a workload for your work and modify that while inheriting from br-base. See the marshal docs for more info, especially the Linux section.

As for the VCU118 issue, I imagine it has to do with missing drivers, or perhaps Linux trying to mount things it shouldn't. I wouldn't expect the Linux boot to look the same, since it's a different platform. Going through the boot process manually using the init trick I mentioned should help narrow it down. FireMarshal is really only tested on firechip on FireSim or QEMU/Spike. Once you get this all working, I'd be interested in integrating it into marshal more properly.

michael-etzkorn commented 3 years ago

I did see that under boards/prototype/distros/base-workloads/br-base/buildroot-config the config sets the password to fpga. I'm not quite sure how to do the init trick you're mentioning. wlutil/initramfs/nodisk/init has those echo commands (which don't show up), and I tried making the first line of that script `exec /bin/busybox sh`. The boot still hangs, although now I also don't see `crng init done`. I also removed the mmc and SPI configs, which removed the SD card warnings, but it still hangs (and now doesn't show `crng init done`), so that probably didn't do anything good.

When I launch the nodisk version in QEMU with `exec /bin/busybox sh` at the top of init, I get this output:

sh: can't access tty; job control turned off

I'm not sure what to do from the shell in QEMU. When I try running `sudo fdisk -l`, I get a segmentation fault. I'm going to remove the `exec /bin/busybox sh` line and see if QEMU also hangs.

NathanTP commented 3 years ago

Ah, I didn't notice you were using the prototype board. I'm less familiar with that one. @abejgonzalez might have some additional insights here.

abejgonzalez commented 3 years ago

Yea. I'm commenting on the other thread mainly. I'm not sure what the issue is exactly.

michael-etzkorn commented 3 years ago

The issue has something to do with CVA6. I'll try to set up verbose kernel statements to help diagnose the problem when I get the chance.