OpenXiangShan / XiangShan-doc

Documentation for XiangShan
https://xiangshan-doc.readthedocs.io
Creative Commons Attribution 4.0 International

Linux image fails to boot on the XiangShan core in simulation #64

Closed: LeslieZhu closed this issue 4 months ago

LeslieZhu commented 9 months ago

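Following the documentation at:

https://xiangshan-doc.readthedocs.io/zh_CN/latest/tools/linux-kernel-for-xs/#4-debian

I went through it step by step and built a Debian image. The image boots successfully on NEMU alone, but fails once I run it on the XiangShan core.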

My bootargs are essentially:

bootargs = "root=/dev/mmcblk0p1 rootfstype=ext4 ro rootwait earlycon";

At run time, it fails with an error saying the Linux root device cannot be found:

....
[    0.000000] Kernel command line: root=/dev/mmcblk0p1 rootfstype=ext4 ro rootwait earlycon
....
[    0.440000] Key type dns_resolver registered
now = 51298s
[    0.450000] Waiting for root device /dev/mmcblk0p1...
[    0.470000] sdhost-nemu 40002000.mmc: no support for card's volts
[    0.470000] mmc0: error -22 whilst initialising SDIO card
[    0.500000] mmcblk0: mmc0:0001  4.00 GiB
[    0.540000] mmc0: Card stuck in wrong state! mmcblk0 card_busy_detect status: 0x0
[    0.630000] VFS: Cannot open root device "mmcblk0p1" or unknown-block(179,1): error -6
[    0.630000] Please append a correct "root=" boot option; here are the available partitions:
[    0.630000] b300         4194304 mmcblk0
[    0.630000]  driver: mmcblk
[    0.630000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(179,1)
[    0.630000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.18.0-14486-g655055af981b #11

I looked through various references but still could not solve it. The partition table of the image file is:

label: gpt
label-id: 63C4D56A-31D3-9447-AD8E-64BBD721E3F0
device: debian.img
unit: sectors
first-lba: 2048
last-lba: 33554398
sector-size: 512

debian.img1 : start=        2048, size=    25165824, type=4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709, uuid=B97A0997-E707-744D-BCD3-11933AC4D278
debian.img2 : start=    25167872, size=     8386527, type=0657FD6D-A4AB-43C4-84E5-0933C84B4F4F, uuid=E2C381D4-F885-FB47-8A8B-C1FB4D2CBBF1
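The sector numbers in this dump can be converted into byte ranges and compared with the capacity the kernel reports in the panic message. A minimal sketch, assuming 512-byte sectors (as `sector-size: 512` states) and that the `4194304` in the kernel's `b300 4194304 mmcblk0` line counts 1 KiB blocks, which agrees with the `4.00 GiB` the kernel prints for the card. It shows that the GPT layout describes roughly 16 GiB of disk, while the simulated card presents 4 GiB; whether this contributes to the missing `mmcblk0p1` is not established in the thread.

```python
import re

SECTOR_SIZE = 512            # from "sector-size: 512" in the dump
CARD_BYTES = 4194304 * 1024  # kernel line "b300 4194304 mmcblk0", in 1 KiB blocks
GIB = 1024 ** 3

# Partition lines copied from the sfdisk dump of debian.img (uuid fields omitted)
DUMP = """\
debian.img1 : start=        2048, size=    25165824, type=4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709
debian.img2 : start=    25167872, size=     8386527, type=0657FD6D-A4AB-43C4-84E5-0933C84B4F4F
"""

LINE = re.compile(r"^(\S+)\s*:\s*start=\s*(\d+),\s*size=\s*(\d+)")


def partitions(dump):
    """Return (name, first_byte, end_byte) per partition line; end_byte is exclusive."""
    result = []
    for line in dump.splitlines():
        m = LINE.match(line)
        if m:
            start = int(m.group(2)) * SECTOR_SIZE
            size = int(m.group(3)) * SECTOR_SIZE
            result.append((m.group(1), start, start + size))
    return result


for name, start, end in partitions(DUMP):
    print(f"{name}: {start / GIB:.3f}-{end / GIB:.3f} GiB, "
          f"inside the {CARD_BYTES / GIB:.0f} GiB card: {end <= CARD_BYTES}")
```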

Full log file: debian13-v3.log

The same Debian image runs fine on NEMU alone but fails to boot on the XiangShan core. What could be the cause?

poemonsense commented 9 months ago

We'll look into this. Could you tell us which version of XiangShan you are using?

poemonsense commented 9 months ago

linux-4.18-debian.tar.gz

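Could you try running the simulation with the bbl.bin from the archive I uploaded as the memory image, together with your existing sdcard image? I suspect the Linux driver behaves differently in the NEMU and XiangShan environments.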

LeslieZhu commented 9 months ago

Thanks for the reply, @poemonsense. Here is my version information:

XiangShan$ git branch
* (HEAD detached at cb6e5d3cb)
  master

NEMU$ git branch
* (HEAD detached at 3f9a7e1a)
  master

riscv-pk$ git branch
* noop

riscv-linux$ git branch
* nanshan

I also have a few small local changes in the XiangShan repository:

XiangShan$ git diff

diff --git a/.mill-version b/.mill-version
index af88ba824..159b5c60d 100644
--- a/.mill-version
+++ b/.mill-version
@@ -1 +1 @@
-0.11.1
+0.11.0-30-e5dea9
diff --git a/build.sc b/build.sc
index 15494d537..00c16dcf3 100644
--- a/build.sc
+++ b/build.sc
@@ -158,7 +158,7 @@ trait CommonXiangShan extends XSModule with SbtModule { m =>

   override def millSourcePath = os.pwd

-  override def forkArgs = Seq("-Xmx64G", "-Xss256m")
+  override def forkArgs = Seq("-Xmx20G", "-Xss256m")

   override def ivyDeps = super.ivyDeps() ++ Seq(ivys.chiseltest)

diff --git a/difftest b/difftest
--- a/difftest
+++ b/difftest
@@ -1 +1 @@
-Subproject commit 6107002bb441bdd6fd1ac18a497fe89f257a987b
+Subproject commit 6107002bb441bdd6fd1ac18a497fe89f257a987b-dirty

poemonsense commented 9 months ago

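Could you try updating to the latest XiangShan and see whether it runs? The master branch has received some bug fixes recently.

That said, I suspect the Linux kernel driver more. Try the bbl.bin I uploaded; it is an older kernel we have been using locally all along. NEMU's latest sdcard may already differ from XiangShan's (we will confirm this).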

LeslieZhu commented 9 months ago

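> linux-4.18-debian.tar.gz
>
> Could you try running the simulation with the bbl.bin from the archive I uploaded as the memory image, together with your existing sdcard image? I suspect the Linux driver behaves differently in the NEMU and XiangShan environments.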

OK, I'll test with your files.

LeslieZhu commented 9 months ago

On top of my own sdcard image, booting with this linux-4.18 bbl.bin still fails, with the following error:

[    0.000000] Kernel command line: root=/dev/mmcblk0 rootfstype=ext4 ro rootwait earlycon
....
[    0.470000] Waiting for root device /dev/mmcblk0...
[    0.490000] sdhost-nemu 40002000.mmc: no support for card's volts
[    0.490000] mmc0: error -22 whilst initialising SDIO card
[    0.520000] mmc0: new MMC card at address 0001
[    0.520000] mmcblk0: mmc0:0001  4.00 GiB
[    0.560000] mmc0: Card stuck in wrong state! mmcblk0 card_busy_detect status: 0x0
[    0.650000] List of all partitions:
[    0.650000] b300         4194304 mmcblk0
[    0.650000]  driver: mmcblk
[    0.650000] No filesystem could mount root, tried:
[    0.650000]  ext4
[    0.650000]
[    0.650000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(179,0)

@poemonsense You mentioned you suspect a Linux kernel driver issue. How should I fix it? Do I need to install this driver inside the sdcard image?

poemonsense commented 9 months ago

> @poemonsense You mentioned you suspect a Linux kernel driver issue. How should I fix it? Do I need to install this driver inside the sdcard image?

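Judging from the log after replacing bbl.bin, the SD card contents can now be read. However, the default sdcard we used before had no partitions, just a single large one.

Could you first try removing the two partitions from your sdcard? It looks like the kernel we compiled earlier was built with the option root=/dev/mmcblk0.

If that works, you can then try replacing the driver in your Linux with this version: https://github.com/OpenXiangShan/NEMU/blob/ac6af7db1a3cdf9208c81e03bdf1aade8dff7640/resource/sdcard/nemu.c

NEMU commit https://github.com/OpenXiangShan/NEMU/commit/3bcf376f330b2446ac72ef4654f234d68dc6b8b5 updated the sdcard driver to support writes (the device model in NEMU was updated at the same time), but the sdcard model in XiangShan's hardware simulation does not support writes yet. The mismatch between the two may be what is causing the problem.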
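Removing the partitions means giving the card an image whose first byte is the start of the ext4 filesystem, so that `root=/dev/mmcblk0` can mount the whole device. A minimal sketch of producing such an image from an existing partitioned one, assuming the start and size values are taken from the `sfdisk` dump; the `extract_partition` helper and the file names are hypothetical. It does the same job as `dd if=debian.img of=rootfs.img bs=512 skip=2048 count=25165824`.

```python
SECTOR_SIZE = 512  # matches "sector-size: 512" in the sfdisk dump


def extract_partition(src_path, dst_path, start_sector, num_sectors, chunk=1 << 20):
    """Copy one partition out of a disk image into a standalone, unpartitioned image.

    Reads num_sectors sectors beginning at start_sector of src_path and writes them
    to dst_path, so byte 0 of dst_path is the first byte of the partition.
    """
    remaining = num_sectors * SECTOR_SIZE
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(start_sector * SECTOR_SIZE)
        while remaining > 0:
            # Copy in bounded chunks so multi-GiB images are not loaded into memory
            buf = src.read(min(chunk, remaining))
            if not buf:
                raise EOFError("source image ends before the partition does")
            dst.write(buf)
            remaining -= len(buf)
```

For example, `extract_partition("debian.img", "rootfs.img", 2048, 25165824)` copies partition 1 of the original image. The result is exactly as large as the partition (12 GiB here), which exceeds the 4.00 GiB card the kernel reports, so the filesystem may also need shrinking (for example with `resize2fs`) before it fits.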

LeslieZhu commented 9 months ago

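> Could you first try removing the two partitions from your sdcard? It looks like the kernel we compiled earlier was built with the option root=/dev/mmcblk0.
>
> If that works, you can then try replacing the driver in your Linux with this version: https://github.com/OpenXiangShan/NEMU/blob/ac6af7db1a3cdf9208c81e03bdf1aade8dff7640/resource/sdcard/nemu.c
>
> NEMU commit OpenXiangShan/NEMU@3bcf376 updated the sdcard driver to support writes (the device model in NEMU was updated at the same time), but the sdcard model in XiangShan's hardware simulation does not support writes yet. The mismatch between the two may be what is causing the problem.

OK, I'll try the no-partition approach first.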

LeslieZhu commented 9 months ago

label: gpt
label-id: 6AACB574-2879-8045-9432-1B71FD430E6A
device: debianV2.img
unit: sectors
first-lba: 2048
last-lba: 33554398
sector-size: 512

debianV2.img1 : start=        2048, size=    33552351, type=4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709, uuid=FA682E13-4418-494E-9A45-8A0FB46C343A
......
[    0.000000] Kernel command line: root=/dev/mmcblk0 rootfstype=ext4 ro rootwait earlycon
......
[    0.470000] Waiting for root device /dev/mmcblk0...
[    0.490000] sdhost-nemu 40002000.mmc: no support for card's volts
[    0.490000] mmc0: error -22 whilst initialising SDIO card
[    0.520000] mmc0: new MMC card at address 0001
[    0.520000] mmcblk0: mmc0:0001  4.00 GiB
[    0.560000] mmc0: Card stuck in wrong state! mmcblk0 card_busy_detect status: 0x0
[    0.650000] List of all partitions:
[    0.650000] b300         4194304 mmcblk0
[    0.650000]  driver: mmcblk
[    0.650000] No filesystem could mount root, tried:
[    0.650000]  ext4
[    0.650000]
[    0.650000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(179,0)
[    0.650000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.18.0-00048-g5370c2bddcda-dirty #148

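It looks like the root cause is not the partitioning. Next, I'll replace the Linux sdhost driver with the one in NEMU and see whether that fixes the problem. I'll post an update when I have results. @poemonsense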

poemonsense commented 9 months ago

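Does this partition actually contain the Debian image, and is the filesystem intact? @wakafa1, can we dig up a log from a run that works correctly on our side?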
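Whether an image or partition actually starts with an ext filesystem can be checked from on-disk signatures: a GPT header carries the ASCII signature `EFI PART` at LBA 1, and an ext2/3/4 superblock starts 1024 bytes into the filesystem with the magic number `0xEF53` at offset `0x38`. A minimal sketch that reports both, assuming 512-byte sectors; the `describe` helper is hypothetical. The magic number only shows that an ext filesystem begins there; a real integrity check needs a tool such as `e2fsck -n`.

```python
import struct

SECTOR_SIZE = 512
EXT_MAGIC = 0xEF53           # s_magic value for ext2/3/4
SUPERBLOCK_OFFSET = 1024     # the superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 0x38          # offset of s_magic inside the superblock
GPT_SIGNATURE = b"EFI PART"  # GPT header signature, stored at LBA 1


def describe(path, start_sector=0):
    """Report whether a GPT header and/or an ext superblock sit at start_sector.

    The GPT check is only meaningful for start_sector=0 (the whole disk).
    """
    base = start_sector * SECTOR_SIZE
    with open(path, "rb") as f:
        f.seek(base + SECTOR_SIZE)
        has_gpt = f.read(8) == GPT_SIGNATURE
        f.seek(base + SUPERBLOCK_OFFSET + MAGIC_OFFSET)
        raw = f.read(2)
    # s_magic is stored little-endian
    has_ext = len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC
    return {"gpt_header": has_gpt, "ext_superblock": has_ext}
```

With `root=/dev/mmcblk0` and no partitions recognized, the kernel mounts from the start of the whole card, so `describe("debianV2.img")` would need to report `ext_superblock: True`; passing `start_sector=2048` instead inspects the first partition.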

wakafa1 commented 9 months ago

For reference, here is a log from a recent successful run on XiangShan:

[    0.000000] Kernel command line: root=/dev/mmcblk0 rootfstype=ext4 ro rootwait earlycon
······
[    0.190000] sdhost-nemu 40002000.mmc: unable to initialise DMA channel. Falling back to PIO
[    0.230000] sdhost-nemu 40002000.mmc: loaded - DMA disabled
······
[    0.230000] Waiting for root device /dev/mmcblk0...
[    0.260000] sdhost-nemu 40002000.mmc: no support for card's volts
[    0.260000] mmc0: error -22 whilst initialising SDIO card
[    0.290000] mmc0: new MMC card at address 0001
[    0.290000] mmcblk0: mmc0:0001  4.00 GiB
[    0.320000] mmc0: Card stuck in wrong state! mmcblk0 card_busy_detect status: 0x0
[    0.450000] EXT4-fs (mmcblk0): mounted filesystem with ordered data mode. Opts: (null)
[    0.450000] VFS: Mounted root (ext4 filesystem) readonly on device 179:0.
[    0.450000] Freeing unused kernel memory: 116K
[    0.450000] This architecture does not have kernel memory protection.
[    3.480000] systemd[1]: System time before build time, advancing clock.
[    3.660000] systemd[1]: systemd 246.6-4 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    3.660000] systemd[1]: Detected architecture riscv64.
Welcome to Debian GNU/Linux bullseye/sid!