OpenXiangShan / XiangShan-doc

Documentation for XiangShan
https://xiangshan-doc.readthedocs.io
Creative Commons Attribution 4.0 International

Checkpointing produces an empty folder; after adding --dont-skip-boot, checkpointing crashes #45

Closed DimanChauncey closed 1 year ago

DimanChauncey commented 1 year ago

My workload is simple: hello world, an accumulate/subtract loop, and a small convolution. I use an interval of 100,000, and I am on the tracing branch. Profiling command:

    ./build/riscv64-nemu-interpreter /root/riscv-pk/build/bbl.bin -D /home/zxk/spec_cpt -w workloadName2 -C profiling -b --simpoint-profile --cpt-interval 100000 -r ./resource/gcpt_restore/build/gcpt.bin

This generated bbv.gz, but the file is only 20 B.

Clustering command:

    ./resource/simpoint/simpoint_repo/bin/simpoint -loadFVFile /home/zxk/spec_cpt/profiling/workloadName2/simpoint_bbv.gz -saveSimpoints $CLUSTER/simpoints0 -saveSimpointWeights $CLUSTER/weights0 -inputVectorsGzipped -maxK 30 -numInitSeeds 2 -iters 1000 -seedkm 123456 -seedproj 654321

Only one cluster was produced.

Checkpointing command:

    ./build/riscv64-nemu-interpreter $RISCV_PK_HOME/build/bbl.bin \
        -D /home/zxk/spec_cpt -w workloadName2 -C take_cpt \
        -b -S /home/zxk/spec_cpt/cluster--cpt-interval 100000 \
        -r ./resource/gcpt_restore/build/gcpt.bin

No files were generated.

After I added --dont-skip-boot in the profiling stage, the bbv file was much larger and clustering produced 7 clusters, but the checkpointing stage crashed with the following output:

[src/checkpoint/serializer.cpp:209,init] Simpoint 0: @ 18, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 1: @ 5, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 2: @ 3, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 3: @ 9, weight: 0.121951
[src/checkpoint/serializer.cpp:209,init] Simpoint 4: @ 0, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 5: @ 27, weight: 0.024390
[src/checkpoint/serializer.cpp:209,init] Simpoint 6: @ 17, weight: 0.024390
[src/checkpoint/serializer.cpp:209,init] Simpoint 7: @ 12, weight: 0.121951
[src/checkpoint/serializer.cpp:209,init] Simpoint 8: @ 38, weight: 0.195122
[src/checkpoint/serializer.cpp:209,init] Simpoint 9: @ 20, weight: 0.097561
[src/checkpoint/serializer.cpp:209,init] Simpoint 10: @ 26, weight: 0.024390
[src/checkpoint/serializer.cpp:209,init] Simpoint 11: @ 36, weight: 0.097561
[src/checkpoint/serializer.cpp:209,init] Simpoint 12: @ 21, weight: 0.097561
[src/memory/paddr.c:81,init_mem] mmap memory to anonymous file
[src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'clint' at [0x0000000038000000, 0x000000003800ffff]
[src/isa/riscv64/init.c:70,init_isa] NEMU will start from pc 0x80000000
[src/monitor/image_loader.c:56,load_img] Loading Gcpt restorer form cmdline: /root/NEMU/resource/gcpt_restore/build/gcpt.bin

[src/monitor/image_loader.c:83,load_img] Warning: size is larger than img_size(upper limit), please check if code is missing. size:1100 img_size:f00
[src/monitor/image_loader.c:88,load_img] Fread from file because less than 512MB

[src/monitor/image_loader.c:120,load_img] Read 3840 bytes from file /root/NEMU/resource/gcpt_restore/build/gcpt.bin to 0x80000000
[src/monitor/image_loader.c:56,load_img] Loading image (bbl/bare metal app) from cmdline: ./resource/gcpt_restore/build/gcpt.bin

[src/monitor/image_loader.c:88,load_img] Fread from file because less than 512MB

[src/monitor/image_loader.c:120,load_img] Read 4352 bytes from file ./resource/gcpt_restore/build/gcpt.bin to 0x800a0000
[src/device/io/port-io.c:15,add_pio_map] Add port-io map 'uartlite' at [0x00000000000003f8, 0x0000000000000404]
[src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'uartlite' at [0x0000000040600000, 0x000000004060000c]
[src/device/io/port-io.c:15,add_pio_map] Add port-io map 'rtc' at [0x0000000000000048, 0x000000000000004f]
[src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'rtc' at [0x00000000a1000048, 0x00000000a100004f]
[src/device/io/port-io.c:15,add_pio_map] Add port-io map 'screen' at [0x0000000000000100, 0x0000000000000107]
[src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'screen' at [0x0000000040001000, 0x0000000040001007]
[src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'vmem' at [0x0000000050000000, 0x00000000500752ff]
[src/device/io/port-io.c:15,add_pio_map] Add port-io map 'keyboard' at [0x0000000000000060, 0x0000000000000063]
[src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'keyboard' at [0x00000000a1000060, 0x00000000a1000063]
[src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'sdhci' at [0x0000000040002000, 0x000000004000207f]
[src/device/sdcard.c:121,init_sdcard] Can not find sdcard image: 
[src/monitor/monitor.c:37,welcome] Debug: OFF
[src/monitor/monitor.c:42,welcome] Build time: 17:31:30, Jun 24 2023
Welcome to riscv64-NEMU!
For help, type "help"
(nemu) c
Invalid inst 0x0000: pc = 0x0000000080140000
  $0: 0x0000000000000000   ra: 0x0000000000000000   sp: 0x0000000000000000   gp: 0x0000000000000000 
  tp: 0x0000000000000000   t0: 0x0000000000000000   t1: 0x0000000000000000   t2: 0x000000000000beef 
  s0: 0x0000000080000f00   s1: 0x0000000000000000   a0: 0x0000000000000000   a1: 0x0000000000000000 
  a2: 0x0000000000000000   a3: 0x0000000000000000   a4: 0x0000000000000000   a5: 0x0000000000000000 
  a6: 0x0000000000000000   a7: 0x0000000000000000   s2: 0x0000000000000000   s3: 0x0000000000000000 
  s4: 0x0000000000000000   s5: 0x0000000000000000   s6: 0x0000000000000000   s7: 0x0000000000000000 
  s8: 0x0000000000000000   s9: 0x0000000000000000  s10: 0x0000000000000000  s11: 0x0000000000000000 
  t3: 0x0000000000000000   t4: 0x0000000000000000   t5: 0x0000000000000000   t6: 0x0000000000000000 
 ft0: 0x0000000000000000  ft1: 0x0000000000000000  ft2: 0x0000000000000000  ft3: 0x0000000000000000 
 ft4: 0x0000000000000000  ft5: 0x0000000000000000  ft6: 0x0000000000000000  ft7: 0x0000000000000000 
 fs0: 0x0000000000000000  fs1: 0x0000000000000000  fa0: 0x0000000000000000  fa1: 0x0000000000000000 
 fa2: 0x0000000000000000  fa3: 0x0000000000000000  fa4: 0x0000000000000000  fa5: 0x0000000000000000 
 fa6: 0x0000000000000000  fa7: 0x0000000000000000  fs2: 0x0000000000000000  fs3: 0x0000000000000000 
 fs4: 0x0000000000000000  fs5: 0x0000000000000000  fs6: 0x0000000000000000  fs7: 0x0000000000000000 
 fs8: 0x0000000000000000  fs9: 0x0000000000000000 fs10: 0x0000000000000000 fs11: 0x0000000000000000 
 ft8: 0x0000000000000000  ft9: 0x0000000000000000 ft10: 0x0000000000000000 ft11: 0x0000000000000000 
pc: 0x0000000080000000 mstatus: 0x0000000a00000000 mcause: 0x0000000000000000 mepc: 0x0000000000000000
                       sstatus: 0x0000000200000000 scause: 0x0000000000000000 sepc: 0x0000000000000000
satp: 0x0000000000000000
mip: 0x0000000000000000 mie: 0x0000000000000000 mscratch: 0x0000000000000000 sscratch: 0x0000000000000000
mideleg: 0x0000000000000000 medeleg: 0x0000000000000000
mtval: 0x0000000000000000 stval: 0x0000000000000000 mtvec: 0x0000000000000000 stvec: 0x0000000000000000
privilege mode:3  pmp: below
 0: cfg:0x00 addr:0x0000000000000000| 1: cfg:0x00 addr:0x0000000000000000
 2: cfg:0x00 addr:0x0000000000000000| 3: cfg:0x00 addr:0x0000000000000000
 4: cfg:0x00 addr:0x0000000000000000| 5: cfg:0x00 addr:0x0000000000000000
 6: cfg:0x00 addr:0x0000000000000000| 7: cfg:0x00 addr:0x0000000000000000
 8: cfg:0x00 addr:0x0000000000000000| 9: cfg:0x00 addr:0x0000000000000000
10: cfg:0x00 addr:0x0000000000000000|11: cfg:0x00 addr:0x0000000000000000
12: cfg:0x00 addr:0x0000000000000000|13: cfg:0x00 addr:0x0000000000000000
14: cfg:0x00 addr:0x0000000000000000|15: cfg:0x00 addr:0x0000000000000000
pmp csr rw: enable, pmp check: disable
[src/cpu/cpu-exec.c:76,monitor_statistic] host time spent = 0 us
[src/cpu/cpu-exec.c:78,monitor_statistic] total guest instructions = 18
[src/cpu/cpu-exec.c:80,monitor_statistic] Finish running in less than 1 us and can not calculate the simulation frequency
riscv64-nemu-interpreter: src/isa/riscv64/instr/rvc/decode.h:132: decode_C_ADDI4SPN: Assertion `0' failed.
Aborted (core dumped)
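
As a side check on the log above, the Simpoint lines can be parsed and their weights summed. The following is a minimal Python sketch, not part of NEMU or SimPoint; the regular expression and the helper name `parse_simpoints` are illustrative assumptions. It relies on the usual SimPoint convention that each weight is the fraction of profiled intervals that fall into that cluster, so dividing by the smallest weight gives an estimate of how many intervals each cluster covers.

```python
import re

# Simpoint lines copied from the crash log above.
LOG = """\
[src/checkpoint/serializer.cpp:209,init] Simpoint 0: @ 18, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 1: @ 5, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 2: @ 3, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 3: @ 9, weight: 0.121951
[src/checkpoint/serializer.cpp:209,init] Simpoint 4: @ 0, weight: 0.048780
[src/checkpoint/serializer.cpp:209,init] Simpoint 5: @ 27, weight: 0.024390
[src/checkpoint/serializer.cpp:209,init] Simpoint 6: @ 17, weight: 0.024390
[src/checkpoint/serializer.cpp:209,init] Simpoint 7: @ 12, weight: 0.121951
[src/checkpoint/serializer.cpp:209,init] Simpoint 8: @ 38, weight: 0.195122
[src/checkpoint/serializer.cpp:209,init] Simpoint 9: @ 20, weight: 0.097561
[src/checkpoint/serializer.cpp:209,init] Simpoint 10: @ 26, weight: 0.024390
[src/checkpoint/serializer.cpp:209,init] Simpoint 11: @ 36, weight: 0.097561
[src/checkpoint/serializer.cpp:209,init] Simpoint 12: @ 21, weight: 0.097561
"""

# Hypothetical pattern matching the "Simpoint N: @ I, weight: W" lines.
PATTERN = re.compile(r"Simpoint (\d+): @ (\d+), weight: ([0-9.]+)")


def parse_simpoints(log):
    """Return a list of (cluster id, interval index, weight) tuples."""
    return [(int(i), int(at), float(w)) for i, at, w in PATTERN.findall(log)]


points = parse_simpoints(LOG)
total_weight = sum(w for _, _, w in points)

# Assumption: the smallest weight corresponds to a cluster holding one interval.
smallest = min(w for _, _, w in points)
interval_counts = [round(w / smallest) for _, _, w in points]

print(len(points), round(total_weight, 4), sum(interval_counts))
```

Under these assumptions the log lists 13 simpoints whose weights sum to about 1, and the estimated interval counts add up to 41, which at an interval of 100000 would correspond to roughly 4.1 million profiled instructions.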
shinezyy commented 1 year ago
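  1. Could you share the log for the run that "generated bbv.gz, but the file is only 20 B"?
  2. Roughly how many instructions (order of magnitude) do you expect the "hello world + accumulate/subtract + simple convolution" task to execute?
  3. --dont-skip-boot includes BBL and the Linux boot in the profile. Support for this is very limited and it may cause problems. However, the error reported in the log does not look like one of the problems we expected. Could you rerun profiling and take cpt using https://github.com/OpenXiangShan/ready-to-run/blob/master/linux-0xa0000.bin together with --dont-skip-boot, and provide the logs?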
DimanChauncey commented 1 year ago

Thanks for the reply!

  1. The log is as follows:
    
    [src/monitor/monitor.c:155,parse_args] Doing Simpoint Profiling
    [src/checkpoint/path_manager.cpp:42,init] Cpt id: -1
    [src/checkpoint/path_manager.cpp:67,setOutputDir] Created /home/zxk/spec_cpt/profiling/workloadName2/

    [src/checkpoint/simpoint.cpp:81,init] Doing simpoint profiling with interval 100000
    [src/memory/paddr.c:81,init_mem] mmap memory to anonymous file
    [src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'clint' at [0x0000000038000000, 0x000000003800ffff]
    [src/isa/riscv64/init.c:70,init_isa] NEMU will start from pc 0x80000000
    [src/monitor/monitor.c:309,init_monitor] You are providing a gcpt restorer when doing simpoing profiling, If you didn't link the program correctly, this will corrupt your memory/program.
    [src/monitor/image_loader.c:56,load_img] Loading Gcpt restorer form cmdline: ./resource/gcpt_restore/build/gcpt.bin
    [src/monitor/image_loader.c:83,load_img] Warning: size is larger than img_size(upper limit), please check if code is missing. size:1100 img_size:f00
    [src/monitor/image_loader.c:88,load_img] Fread from file because less than 512MB
    [src/monitor/image_loader.c:120,load_img] Read 3840 bytes from file ./resource/gcpt_restore/build/gcpt.bin to 0x80000000
    [src/monitor/image_loader.c:56,load_img] Loading image (bbl/bare metal app) from cmdline: /root/riscv-pk/build/bbl.bin
    [src/monitor/image_loader.c:88,load_img] Fread from file because less than 512MB
    [src/monitor/image_loader.c:120,load_img] Read 2629284 bytes from file /root/riscv-pk/build/bbl.bin to 0x800a0000
    [src/device/io/port-io.c:15,add_pio_map] Add port-io map 'uartlite' at [0x00000000000003f8, 0x0000000000000404]
    [src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'uartlite' at [0x0000000040600000, 0x000000004060000c]
    [src/device/io/port-io.c:15,add_pio_map] Add port-io map 'rtc' at [0x0000000000000048, 0x000000000000004f]
    [src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'rtc' at [0x00000000a1000048, 0x00000000a100004f]
    [src/device/io/port-io.c:15,add_pio_map] Add port-io map 'screen' at [0x0000000000000100, 0x0000000000000107]
    [src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'screen' at [0x0000000040001000, 0x0000000040001007]
    [src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'vmem' at [0x0000000050000000, 0x00000000500752ff]
    [src/device/io/port-io.c:15,add_pio_map] Add port-io map 'keyboard' at [0x0000000000000060, 0x0000000000000063]
    [src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'keyboard' at [0x00000000a1000060, 0x00000000a1000063]
    [src/device/io/mmio.c:18,add_mmio_map] Add mmio map 'sdhci' at [0x0000000040002000, 0x000000004000207f]
    [src/device/sdcard.c:121,init_sdcard] Can not find sdcard image:
    [src/monitor/monitor.c:37,welcome] Debug: OFF
    [src/monitor/monitor.c:42,welcome] Build time: 17:31:30, Jun 24 2023
    Welcome to riscv64-NEMU!
    For help, type "help"
    bbl loader
    freq-mhz = 500
    CLINT: set frequency to 500 MHz
    [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
    [ 0.000000] Linux version 4.18.0-14486-g655055af981b-dirty (root@localhost.localdomain) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04)) #11 Sun Jun 18 22:11:41 CST 2023
    [ 0.000000] bootconsole [early0] enabled
    [ 0.000000] Initial ramdisk at: 0x(ptrval) (137216 bytes)
    [ 0.000000] Zone ranges:
    [ 0.000000] DMA32 empty
    [ 0.000000] Normal [mem 0x0000000080200000-0x0000000081ffffff]
    [ 0.000000] Movable zone start for each node
    [ 0.000000] Early memory node ranges
    [ 0.000000] node 0: [mem 0x0000000080200000-0x0000000081ffffff]
    [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x0000000081ffffff]
    [ 0.000000] Cannot allocate SWIOTLB buffer
    [ 0.000000] elf_hwcap is 0x112d
    [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 7575
    [ 0.000000] Kernel command line: root=/dev/mmcblk0 rootfstype=ext4 ro rootwait earlycon
    [ 0.000000] Dentry cache hash table entries: 4096 (order: 3, 32768 bytes)
    [ 0.000000] Inode-cache hash table entries: 2048 (order: 2, 16384 bytes)
    [ 0.000000] Sorting __ex_table...
    [ 0.000000] Memory: 29020K/30720K available (679K kernel code, 78K rwdata, 102K rodata, 196K init, 98K bss, 1700K reserved, 0K cma-reserved)
    [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
    [ 0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
    [ 0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
    [ 0.000000] console [hvc0] enabled
    [ 0.000000] console [hvc0] enabled
    [ 0.000000] bootconsole [early0] disabled
    [ 0.000000] bootconsole [early0] disabled
    [ 0.000000] Calibrating delay loop (skipped), value calculated using timer frequency.. 2.00 BogoMIPS (lpj=10000)
    [ 0.000000] pid_max: default: 4096 minimum: 301
    [ 0.000000] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
    [ 0.000000] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
    [ 0.020000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    [ 0.020000] futex hash table entries: 16 (order: -4, 384 bytes)
    [ 0.040000] clocksource: Switched to clocksource riscv_clocksource
    [ 0.040000] Unpacking initramfs...
    [ 0.140000] workingset: timestamp_bits=62 max_order=13 bucket_order=0
    [ 0.170000] random: get_random_bytes called from 0xffffffff80034a30 with crng_init=0
    [ 0.210000] Freeing unused kernel memory: 196K
    [ 0.210000] This architecture does not have kernel memory protection.
    [src/isa/riscv64/system/priv.c:168,disable_time_intr] Disabled machine time interruption
    [src/profiling/profiling_control.c:19,reset_inst_counters] Start profiling, resetting inst count from 4148296 to 1, (n_remain_total will not be cleared)
    Hello, ISC-V World! enter computing 49995000 -50002000
    [src/cpu/cpu-exec.c:435,cpu_exec] nemu: HIT GOOD TRAP at pc = 0x000000000001015a
    [src/cpu/cpu-exec.c:439,cpu_exec] trap code:0
    [src/cpu/cpu-exec.c:76,monitor_statistic] host time spent = 84255 us
    [src/cpu/cpu-exec.c:78,monitor_statistic] total guest instructions = 21878
    [src/cpu/cpu-exec.c:79,monitor_statistic] simulation frequency = 259664 instr/s
    PPM correct: 0, PPM mispred: 0 MPKI: 0.000000
    [src/profiling/betapoint_profiling.cpp:338,onExit] numLoad: 0, numStore: 0
    [src/profiling/betapoint_profiling.cpp:339,onExit] Footprint: 0 cacheblocks, 0 KiB
    [src/profiling/betapoint_profiling.cpp:213,dumpStride] Dump stride histogram
    [src/profiling/betapoint_profiling.cpp:251,dumpStride] global stride total: 0
    [src/profiling/betapoint_profiling.cpp:251,dumpStride] global stride total: 0
    [src/profiling/betapoint_profiling.cpp:288,dumpStride] local stride total: 0, local pc count: 0
    [src/profiling/betapoint_profiling.cpp:288,dumpStride] local stride total: 0, local pc count: 0
    [src/profiling/betapoint_profiling.cpp:351,dumpDistinctStrideInc] Dump new distinct strides : 0
    [src/profiling/betapoint_profiling.cpp:363,dumpFootPrintInc] Dump footprint increments: 0
    [src/profiling/betapoint_profiling.cpp:312,calcReuseMatrix] Dump Reuse matrix
    [src/profiling/betapoint_profiling.cpp:502,onExit] Dump critical path size: 0
    [src/profiling/betapoint_profiling.cpp:509,onExit] Dump ppm miss count: 1
    [src/utils/state.c:11,is_exit_status_bad] NEMU exit with good state: 2, halt ret: 0


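2. The instruction count should be on the order of hundreds of thousands, but with the same parameters and a smaller interval, checkpointing still produces nothing.
3. The bin you provided hangs forever. In that case, how do I complete profiling correctly and terminate the program?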
shinezyy commented 1 year ago
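  1. The log looks fine.
  2. When the instruction count is on the order of hundreds of thousands, the interval probably needs to be set to a few thousand.
  3. For a program that hangs indefinitely, you can set the maximum number of instructions NEMU executes via a command-line option.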
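
The interval advice can be checked with a little arithmetic. The following is a minimal Python sketch, not NEMU code. It assumes the profiler emits one basic-block vector per complete interval, which is an assumption about NEMU's behaviour rather than something confirmed in this thread; the helper name `complete_intervals` is illustrative. The instruction count is the "total guest instructions" value from the profiling log posted above.

```python
import gzip


def complete_intervals(instructions, interval):
    """Number of full profiling intervals that fit in the instruction count."""
    return instructions // interval


# "total guest instructions = 21878" from the profiling log in this thread.
PROFILED_INSTRUCTIONS = 21878

for interval in (100_000, 5_000, 2_000):
    # With interval 100000 no interval completes; smaller intervals yield several.
    print(interval, complete_intervals(PROFILED_INSTRUCTIONS, interval))

# A gzip stream with an empty payload: 10-byte header, 2-byte empty deflate
# block, 8-byte trailer.
empty_gz = gzip.compress(b"")
print(len(empty_gz))
```

Under that assumption, an interval of 100000 yields zero complete intervals for this run, while intervals of 5000 and 2000 yield 4 and 10. The last line shows that gzip-compressing empty data gives 20 bytes, which matches the reported bbv.gz size and is consistent with a file that contains no vectors.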