Open xieby1 opened 1 week ago
Recently I encountered a serious problem with NEMU. Running the same workload twice on the same NEMU build results in different instruction counts, which causes some workloads to finish before the last checkpoint starts.
[src/checkpoint/serializer.cpp:328,instrsCouldTakeCpt] First cpt @ 2427740000000, now: 2384600000000
[src/checkpoint/serializer.cpp:328,instrsCouldTakeCpt] First cpt @ 2427740000000, now: 2384660000000
[src/checkpoint/serializer.cpp:328,instrsCouldTakeCpt] First cpt @ 2427740000000, now: 2384680000000
[src/checkpoint/serializer.cpp:328,instrsCouldTakeCpt] First cpt @ 2427740000000, now: 2384700000000
[src/checkpoint/serializer.cpp:328,instrsCouldTakeCpt] First cpt @ 2427740000000, now: 2384720000000
[src/checkpoint/serializer.cpp:328,instrsCouldTakeCpt] First cpt @ 2427740000000, now: 2384740000000
[/build/source/src/isa/riscv64/include/../instr/special.h:38,execute] nemu_trap case 0
[src/cpu/cpu-exec.c:734,cpu_exec] nemu: HIT GOOD TRAP at pc = 0x0000000000010522
[src/cpu/cpu-exec.c:740,cpu_exec] trap code:0
[src/cpu/cpu-exec.c:94,monitor_statistic] host time spent = 25768154605 us
[src/cpu/cpu-exec.c:96,monitor_statistic] total guest instructions = 2384825716646
[src/cpu/cpu-exec.c:97,monitor_statistic] vst count = 407850, vst unit count = 407850, vst unit optimized count = 0
[src/cpu/cpu-exec.c:100,monitor_statistic] simulation frequency = 92549340 instr/s
[src/utils/state.c:30,is_exit_status_bad] NEMU exit with good state: 2, halt ret: 0
This is VERY SEVERE. If the instruction count is not deterministic, the interval provided by SimPoint is INVALID.
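To make the gap concrete, here is a minimal sketch that parses two lines copied from the log above and computes how far the run ended before the first checkpoint. The function name `first_cpt_shortfall` is made up for illustration and is not part of NEMU.

```python
import re

# Two lines copied verbatim from the log above.
LOG = """\
[src/checkpoint/serializer.cpp:328,instrsCouldTakeCpt] First cpt @ 2427740000000, now: 2384740000000
[src/cpu/cpu-exec.c:96,monitor_statistic] total guest instructions = 2384825716646
"""

def first_cpt_shortfall(log: str) -> int:
    """Return how many instructions before the first checkpoint the run ended.

    Hypothetical helper: it only relies on the two message formats shown above.
    """
    first_cpt = int(re.search(r"First cpt @ (\d+)", log).group(1))
    total = int(re.search(r"total guest instructions = (\d+)", log).group(1))
    return first_cpt - total

print(first_cpt_shortfall(LOG))  # prints 42914283354
```

In this run the workload hit the good trap roughly 4.3e10 instructions before the first checkpoint position, so no checkpoint could be taken at that position.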
The checkpoints generated with NEMU are not deterministic. In other words, if the checkpoints are generated twice, the md5sums of the first and second sets differ.
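The md5sum comparison can be scripted roughly as follows. This is only a sketch: it assumes each run wrote its checkpoints into its own directory with the same relative file names, and the directory names `run1` and `run2` in the usage note are placeholders, not paths NEMU produces.

```python
import hashlib
from pathlib import Path

def md5_of(path: Path) -> str:
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def list_files(root: Path) -> dict:
    """Map each file's path relative to root onto its full path."""
    return {p.relative_to(root).as_posix(): p
            for p in root.rglob("*") if p.is_file()}

def differing_checkpoints(run_a: Path, run_b: Path) -> list:
    """Return relative paths that differ in content or exist in only one run."""
    files_a, files_b = list_files(run_a), list_files(run_b)
    differing = []
    for name in sorted(files_a.keys() | files_b.keys()):
        a, b = files_a.get(name), files_b.get(name)
        if a is None or b is None or md5_of(a) != md5_of(b):
            differing.append(name)
    return differing
```

Calling `differing_checkpoints(Path("run1"), Path("run2"))` would return an empty list if checkpoint generation were deterministic; any returned name is a checkpoint that differs between the two runs or is missing from one of them.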