OpenXiangShan / GEM5

Difftest failed when running a checkpoint with gem5.opt #184

Open rmxpf opened 3 hours ago

rmxpf commented 3 hours ago

Environment:

GEM5: xs-dev f231aa596b60bb096a0892fbfc4d68dc713d000f
NEMU: master cf24515c85f5be898687959ab299ea276dbd7c56
DRAMsim3: master 29817593b3389f1337235d63cac515024ab8fd6e
LibCheckpointAlpha: main c5c2fef74133fb2b8ef8642633f60e0996493f29
Toolchain: 15.0.0 (/nfs/home/hebo/TOOL/gnu-riscv64-toolchain/riscv-toolchain-gcc15-240613-noFFnoSeg2)
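The commit hashes in the environment list can be collected from the checkouts directly, which avoids copy errors when reporting them. A minimal sketch, assuming each component is a git checkout; the function name commit_of and the directory names in the usage line are hypothetical:

```shell
# Sketch: print the commit a checkout is at, or "unknown" if the directory
# is not a git checkout (or git is unavailable).
commit_of() {
    git -C "$1" rev-parse HEAD 2>/dev/null || echo "unknown"
}
```

Example use, with placeholder directory names: for d in GEM5 NEMU; do printf '%s: %s\n' "$d" "$(commit_of "$d")"; done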


Problem: Difftest fails when gem5.opt runs a checkpoint.
Goal: run my checkpoint (gcb) with the latest GEM5.

Symptoms

When I run a checkpoint with gem5.opt, Difftest fails. Part of the log is shown below:

**** REAL SIMULATION ****
build/RISCV/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting simulation...
build/RISCV/cpu/base.cc:1412: warn: Start memcpy to NEMU from 0x7fd92f9b8000, size=8589934592
build/RISCV/cpu/base.cc:1415: warn: Start regcpy to NEMU
build/RISCV/cpu/base.cc:1440: panic: Difftest failed!
Memory Usage: 17138044 KBytes
Program aborted at tick 61938
Aborted (core dumped)

It looks as if difftest fails at startup. I suspected that my riscv64-nemu-interpreter-so or gcpt.bin was at fault, so I tried the prebuilt files from GEM5/releases in the following combinations:

riscv64-nemu-interpreter-4332a525-so + my own gcpt.bin: same symptom

nemu-gcbv-ref.so + gcb-restorer.bin: same symptom

riscv64-nemu-interpreter-so + gcpt.bin: same symptom

nemu-gcbv-ref.so + gcb-restorer.bin: same symptom

riscv64-nemu-interpreter-231008.so + gcpt-restorer-231016.bin: a different symptom, shown below, but it still did not run:

build/RISCV/mem/physical.cc:707: warn: Overriding Gcpt restorer
build/RISCV/mem/physical.cc:708: warn: gCptRestorerPath: /nfs/home/haokangda/hebo/TOOL/GEM5/GEM5-reuslt/test/gcpt-restorer-231016.bin
build/RISCV/mem/physical.cc:723: warn: Gcpt restorer file size 4352 is larger than limit 1792, is partially loaded
build/RISCV/mem/physical.cc:731: warn: gcpt restore size: 1792
build/RISCV/sim/system.cc:561: info: Restored from Xiangshan RISC-V Checkpoint
**** REAL SIMULATION ****
build/RISCV/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting simulation...
build/RISCV/cpu/base.cc:1412: warn: Start memcpy to NEMU from 0x7fec8e83e000, size=8589934592
build/RISCV/cpu/base.cc:1415: warn: Start regcpy to NEMU
address (0x0000000010000000) is out of bound {???} [0x0000000000000000, 0x0000000000000000] at pc = 0x0000000010000000
  $0: 0x0000000000000000   ra: 0x0000000000000000   sp: 0x0000000000000000   gp: 0x0000000000000000 
  tp: 0x0000000000000000   t0: 0x0000000000000000   t1: 0x0000000000000000   t2: 0x0000000000000000 
  s0: 0x0000000000080000   s1: 0x0000000000000000   a0: 0x0000000000000000   a1: 0x0000000000000000 
  a2: 0x0000000000000000   a3: 0x0000000000000000   a4: 0x0000000000000000   a5: 0x0000000000000000 
  a6: 0x0000000000000000   a7: 0x0000000000000000   s2: 0x0000000000000000   s3: 0x0000000000000000 
  s4: 0x0000000000000000   s5: 0x0000000000000000   s6: 0x0000000000000000   s7: 0x0000000000000000 
  s8: 0x0000000000000000   s9: 0x0000000000000000  s10: 0x0000000000000000  s11: 0x0000000000000000 
  t3: 0x0000000000000000   t4: 0x0000000000000000   t5: 0x0000000000000000   t6: 0x0000000000000000 
 ft0: 0x0000000000000000  ft1: 0x0000000000000000  ft2: 0x0000000000000000  ft3: 0x0000000000000000 
 ft4: 0x0000000000000000  ft5: 0x0000000000000000  ft6: 0x0000000000000000  ft7: 0x0000000000000000 
 fs0: 0x0000000000000000  fs1: 0x0000000000000000  fa0: 0x0000000000000000  fa1: 0x0000000000000000 
 fa2: 0x0000000000000000  fa3: 0x0000000000000000  fa4: 0x0000000000000000  fa5: 0x0000000000000000 
 fa6: 0x0000000000000000  fa7: 0x0000000000000000  fs2: 0x0000000000000000  fs3: 0x0000000000000000 
 fs4: 0x0000000000000000  fs5: 0x0000000000000000  fs6: 0x0000000000000000  fs7: 0x0000000000000000 
 fs8: 0x0000000000000000  fs9: 0x0000000000000000 fs10: 0x0000000000000000 fs11: 0x0000000000000000 
 ft8: 0x0000000000000000  ft9: 0x0000000000000000 ft10: 0x0000000000000000 ft11: 0x0000000000000000 
pc: 0x0000000010000000 mstatus: 0x0000000a00000000 mcause: 0x0000000000000000 mepc: 0x0000000000000000
                       sstatus: 0x0000000200000000 scause: 0x0000000000000000 sepc: 0x0000000000000000
satp: 0x0000000000000000
mip: 0x0000000000000000 mie: 0x0000000000000000 mscratch: 0x0000000000000000 sscratch: 0x0000000000000000
mideleg: 0x0000000000000000 medeleg: 0x0000000000000000
mtval: 0x0000000000000000 stval: 0x0000000000000000 mtvec: 0x0000000000000000 stvec: 0x0000000000000000
privilege mode:2147483648  pmp: below
[src/cpu/cpu-exec.c,62,monitor_statistic] host time spent = 0 us
[src/cpu/cpu-exec.c,64,monitor_statistic] total guest instructions = 1
[src/cpu/cpu-exec.c,66,monitor_statistic] Finish running in less than 1 us and can not calculate the simulation frequency
gem5.opt: src/device/io/map.c:21: check_bound: Assertion `map != ((void *)0) && addr <= map->high && addr >= map->low' failed.
Program aborted at tick 61938
Aborted (core dumped)
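The logs in this report warn that the restorer file is larger than the 1792-byte limit and is only partially loaded (4352 bytes for gcpt-restorer-231016.bin, 1048600 bytes for the locally built gcpt.bin). Whether that truncation is related to the failure is not established here. A minimal shell sketch for checking a restorer's size before launching; the function name check_restorer_size is hypothetical, and the default limit is copied from the warning text, not read from the gem5 sources:

```shell
# Sketch: report whether a GCPT restorer binary fits within a size limit.
# The default of 1792 bytes is an assumption taken from the log warning.
check_restorer_size() {
    file=$1
    limit=${2:-1792}
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    # wc -c prints the byte count; tr strips padding that some wc variants add
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -gt "$limit" ]; then
        echo "truncated: $size bytes exceeds limit $limit"
        return 1
    fi
    echo "ok: $size bytes fits within limit $limit"
}
```

Called as check_restorer_size path/to/restorer.bin, it returns non-zero when the file would be cut off, so it can gate a launch script.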

At this point I began to suspect the checkpoint itself, so I ran the same checkpoint file with a GEM5 that I set up a long time ago, and it runs.

How GEM5 was built

GEM5 build

git clone https://github.com/OpenXiangShan/GEM5.git
cd GEM5
git checkout xs-dev
git pull

# build DRAMsim3 inside the GEM5 tree
cd ext/dramsim3
git clone https://github.com/umd-memsys/DRAMsim3.git DRAMsim3
cd DRAMsim3 && mkdir build
cd build
cmake ..
make

# back to the GEM5 root
cd ../../../..
scons build/RISCV/gem5.opt --gold-linker -j8

NEMU build

# build gcpt.bin
git clone https://github.com/OpenXiangShan/NEMU.git
cd NEMU/resource
git clone https://github.com/OpenXiangShan/LibCheckpointAlpha.git gcpt_restore
cd gcpt_restore
make
# build riscv64-nemu-interpreter-so (back in the NEMU root)
cd ../..
make riscv64-gem5-ref_defconfig
make -j16

Full command

/nfs/home/haokangda/hebo/TOOL/GEM5/GEM5/build/RISCV/gem5.opt /nfs/home/haokangda/hebo/TOOL/GEM5/GEM5/configs/example/fs.py \
--xiangshan-system --cpu-type=DerivO3CPU --mem-size=8GB --caches \
--cacheline_size=64 --l1i_size=64kB --l1i_assoc=8 --l1d_size=64kB \
--l1d_assoc=8 --l1d-hwp-type=XSCompositePrefetcher --short-stride-thres=0 \
--l2cache --l2_size=1MB --l2_assoc=8 --l3cache --l3_size=16MB --l3_assoc=16 \
--l1-to-l2-pf-hint --l2-hwp-type=WorkerPrefetcher --l2-to-l3-pf-hint \
--l3-hwp-type=WorkerPrefetcher --mem-type=DRAMsim3 \
--dramsim3-ini=/nfs/home/haokangda/hebo/TOOL/GEM5/GEM5/ext/dramsim3/xiangshan_configs/xiangshan_DDR4_8Gb_x8_3200_2ch.ini \
--bp-type=DecoupledBPUWithFTB --enable-loop-predictor \
--difftest-ref-so /nfs/home/haokangda/hebo/TOOL/GEM5/NEMU/build/riscv64-nemu-interpreter-so \
--enable-difftest \
--generic-rv-cpt=/nfs/home/haokangda/hebo/BOSC/Simpoint_Checkpoint/auto_checkpoint/archive/archive/969cfff4cce9ca1ca51d45cbf8254e8d/checkpoint-0-0-0/astar_biglakes/57/_57_0.016584_.gz \
--gcpt-restorer=/nfs/home/haokangda/hebo/TOOL/GEM5/NEMU/resource/gcpt_restore/build/gcpt.bin \
--warmup-insts-no-switch=20000000 --maxinsts=40000000

Full log

Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
WARNING: Output directory ext/dramsim3/DRAMsim3/ not exists! Using current directory for output!
build/RISCV/arch/riscv/bare_metal/fs_workload.cc:60: info: No bootload provided, because using XS GCPT, reset to 0x80000000
build/RISCV/cpu/base.cc:214: warn: cpu_id set to 0
Using /nfs/home/haokangda/hebo/TOOL/GEM5/NEMU/build/riscv64-nemu-interpreter-so for difftest
build/RISCV/cpu/base.cc:228: warn: Difftest is enabled with ref so: /nfs/home/haokangda/hebo/TOOL/GEM5/NEMU/build/riscv64-nemu-interpreter-so.
build/RISCV/cpu/o3/cpu.cc:233: warn: Setting isa ptr of cpu to 0x5600b7e3cc60
build/RISCV/base/statistics.hh:281: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
0: system.remote_gdb: listening for remote gdb on port 7000
build/RISCV/mem/physical.cc:507: warn: Unserializing physical memory from file /nfs/home/haokangda/hebo/BOSC/Simpoint_Checkpoint/auto_checkpoint/archive/archive/969cfff4cce9ca1ca51d45cbf8254e8d/checkpoint-0-0-0/astar_biglakes/57/_57_0.016584_.gz
build/RISCV/mem/physical.cc:707: warn: Overriding Gcpt restorer
build/RISCV/mem/physical.cc:708: warn: gCptRestorerPath: /nfs/home/haokangda/hebo/TOOL/GEM5/NEMU/resource/gcpt_restore/build/gcpt.bin
build/RISCV/mem/physical.cc:723: warn: Gcpt restorer file size 1048600 is larger than limit 1792, is partially loaded
build/RISCV/mem/physical.cc:731: warn: gcpt restore size: 1792
build/RISCV/sim/system.cc:561: info: Restored from Xiangshan RISC-V Checkpoint
gem5 Simulator System.  https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version [DEVELOP-FOR-22.1]
gem5 compiled Oct 12 2024 10:30:19
gem5 started Oct 12 2024 16:55:45
gem5 executing on node042.bosccluster.com, pid 1358518
command line: /nfs/home/haokangda/hebo/TOOL/GEM5/GEM5/build/RISCV/gem5.opt /nfs/home/haokangda/hebo/TOOL/GEM5/GEM5/configs/example/fs.py --xiangshan-system --cpu-type=DerivO3CPU --mem-size=8GB --caches --cacheline_size=64 --l1i_size=64kB --l1i_assoc=8 --l1d_size=64kB --l1d_assoc=8 --l1d-hwp-type=XSCompositePrefetcher --short-stride-thres=0 --l2cache --l2_size=1MB --l2_assoc=8 --l3cache --l3_size=16MB --l3_assoc=16 --l1-to-l2-pf-hint --l2-hwp-type=WorkerPrefetcher --l2-to-l3-pf-hint --l3-hwp-type=WorkerPrefetcher --mem-type=DRAMsim3 --dramsim3-ini=/nfs/home/haokangda/hebo/TOOL/GEM5/GEM5/ext/dramsim3/xiangshan_configs/xiangshan_DDR4_8Gb_x8_3200_2ch.ini --bp-type=DecoupledBPUWithFTB --enable-loop-predictor --difftest-ref-so /nfs/home/haokangda/hebo/TOOL/GEM5/NEMU/build/riscv64-nemu-interpreter-so --enable-difftest --generic-rv-cpt=/nfs/home/haokangda/hebo/BOSC/Simpoint_Checkpoint/auto_checkpoint/archive/archive/969cfff4cce9ca1ca51d45cbf8254e8d/checkpoint-0-0-0/astar_biglakes/57/_57_0.016584_.gz --gcpt-restorer=/nfs/home/haokangda/hebo/TOOL/GEM5/NEMU/resource/gcpt_restore/build/gcpt.bin --warmup-insts-no-switch=20000000 --maxinsts=40000000

[<m5.params.AddrRange object at 0x7fdb2fb9ee60>]
['basic']
db_switches: []
Attach 1 decoders to thread with addr: <orphan System>.cpu.decoder
Create threads for test sys cpu (RiscvO3CPU)
Add dtb for L2 prefetcher
Finish memory system configuration
No cpu_class provided
Registering probe listeners for BaseO3CPU system.cpu
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher
system.cpu.dcache.prefetcher addTLB system.cpu.mmu.dtb
system.cpu.dcache.prefetcher addHintDownStream system.l2_caches.prefetcher
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.berti
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.bop_large
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.bop_learned
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.bop_small
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.cmc
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.ipcp
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.opt
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.spp
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.sstride
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.xsstream
Registering probe listeners for Prefetcher system.l2_caches.prefetcher
system.l2_caches.prefetcher addTLB system.cpu.mmu.dtb
system.l2_caches.prefetcher addHintDownStream system.l3.prefetcher
Registering probe listeners for Prefetcher system.l3.prefetcher
**** REAL SIMULATION ****
build/RISCV/sim/simulate.cc:194: info: Entering event queue @ 0.  Starting simulation...
build/RISCV/cpu/base.cc:1412: warn: Start memcpy to NEMU from 0x7fd92f9b8000, size=8589934592
build/RISCV/cpu/base.cc:1415: warn: Start regcpy to NEMU
build/RISCV/cpu/base.cc:1440: panic: Difftest failed!
Memory Usage: 17138044 KBytes
Program aborted at tick 61938
Aborted (core dumped)
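In a log this long, the line that matters is the first fatal message. A minimal shell sketch that prints it with its line number; the function name first_failure is hypothetical, and the patterns are simply the fatal messages that appear in the logs of this report:

```shell
# Sketch: print the first line of a log that looks like a fatal error,
# prefixed with its line number. Prints nothing if no pattern matches.
first_failure() {
    grep -n -E 'panic:|Assertion .* failed|out of bound' "$1" | head -n 1
}
```

Called as first_failure log.txt, it shows where to start reading; the lines just before the match (here the memcpy and regcpy to NEMU) are usually the relevant context.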

Additional information

This is the GEM5 I set up a long time ago running the same checkpoint file; it runs successfully.

Command:

/nfs/home/hebo/TOOL/GEM5/GEM5-internal/build/RISCV/gem5.opt /nfs/home/hebo/TOOL/GEM5/GEM5-internal/configs/example/fs.py \
--xiangshan-system --cpu-type=DerivO3CPU --mem-size=8GB --caches \
--cacheline_size=64 --l1i_size=64kB --l1i_assoc=8 --l1d_size=64kB \
--l1d_assoc=8 --l1d-hwp-type=XSCompositePrefetcher --short-stride-thres=0 \
--l2cache --l2_size=1MB --l2_assoc=8 --l3cache --l3_size=16MB --l3_assoc=16 \
--l1-to-l2-pf-hint --l2-hwp-type=WorkerPrefetcher --l2-to-l3-pf-hint \
--l3-hwp-type=WorkerPrefetcher --mem-type=DRAMsim3 \
--dramsim3-ini=/nfs/home/hebo/TOOL/GEM5/GEM5-internal/ext/dramsim3/xiangshan_configs/xiangshan_DDR4_8Gb_x8_3200_2ch.ini \
--bp-type=DecoupledBPUWithFTB --enable-loop-predictor \
--difftest-ref-so /nfs/home/hebo/TOOL/GEM5/NEMU-test/732e4ccd/NEMU/build/riscv64-nemu-interpreter-so \
--enable-difftest \
--generic-rv-cpt=/nfs/home/haokangda/hebo/BOSC/Simpoint_Checkpoint/auto_checkpoint/archive/archive/969cfff4cce9ca1ca51d45cbf8254e8d/checkpoint-0-0-0/astar_biglakes/57/_57_0.016584_.gz \
--gcpt-restorer=/nfs/home/hebo/TOOL/GEM5/NEMU-test/732e4ccd/NEMU/resource/gcpt_restore/build/gcpt.bin \
--warmup-insts-no-switch=20000000 --maxinsts=40000000

Log file path: /nfs/home/hebo/TOOL/GEM5/test/log.txt

shinezyy commented 3 hours ago

Is this related to RV-H, @jueshiwenli?

jueshiwenli commented 3 hours ago

Can you show me the NEMU config?

shinezyy commented 3 hours ago

He says he is using riscv64-gem5-ref_defconfig at NEMU commit cf24515c85f5be898687959ab299ea276dbd7c56.

jueshiwenli commented 3 hours ago

The config needs to be updated to enable RVH. I will update it in NEMU later.
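To see whether a given NEMU build has the option enabled, the generated Kconfig output can be inspected. A minimal shell sketch, assuming NEMU writes a Kconfig-style .config in its source root and that the symbol is named CONFIG_RVH; both are assumptions to verify against NEMU's Kconfig files, and the function name config_enabled is hypothetical:

```shell
# Sketch: succeed if a Kconfig-style file sets CONFIG_<name>=y.
# A line such as "# CONFIG_RVH is not set" does not match.
config_enabled() {
    grep -q "^CONFIG_$2=y\$" "$1"
}
```

Example use, with an assumed path: config_enabled NEMU/.config RVH && echo "RVH enabled" || echo "RVH not enabled"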