Open DCliuzhe opened 1 month ago
I opened difftest and tested it, and found that GEM5 made an error when executing the vfredusum
instruction. GEM5 split it into two micro_ops for execution. The result was completely determined by the second micro_op, and the wrong result was obtained. I would like to ask why it was split into two micro_ops and the execution was wrong.
heap start = 8000a000
build/RISCV/cpu/base.cc:970: warn: Inst [sn:18228] pc: 0x800001b8, msg: [sn:18228 pc:0x800001b8] vfredusum_vs_micro v24, v25,vtmp0, res: 0000000000000000_0000000000000000
build/RISCV/cpu/base.cc:972: warn: May be diff at v24
Ref value: ffffffffffffffff_ffffffff41000000
GEM5 value: 0000000000000000_0000000000000000
build/RISCV/cpu/base.hh:699: warn: In CPU0: NEMU PC: 0x800001b8, GEM5 PC: 0x800001b8, inst: vfredusum_vs_micro v24, v25,vtmp0
$0: 0x0000000000000000 ra: 0x0000000080000268 sp: 0x0000000080009fa0 gp: 0x0000000000000000
tp: 0x0000000000000000 t0: 0x0000000000000000 t1: 0x000000008000a000 t2: 0x0000000000000000
s0: 0x0000000000000000 s1: 0x0000000000000010 a0: 0x000000008000c000 a1: 0x0000000000000040
a2: 0x000000008000a000 a3: 0x000000008000b000 a4: 0x0000000000000000 a5: 0x0000000000000004
a6: 0x0000000000000000 a7: 0x0000000000000000 s2: 0x000000008000c000 s3: 0x0000000000000010
s4: 0x0000000000000010 s5: 0x0000000000000000 s6: 0x0000000000000000 s7: 0x0000000000000000
s8: 0x0000000000000000 s9: 0x0000000000000000 s10: 0x0000000000000000 s11: 0x0000000000000000
t3: 0x000000008000b000 t4: 0x0000000000000000 t5: 0x000000008000c040 t6: 0x0000000000000040
ft0: 0xffffffff00000000 ft1: 0xffffffff00000000 ft2: 0xffffffff00000000 ft3: 0xffffffff00000000
ft4: 0xffffffff00000000 ft5: 0xffffffff00000000 ft6: 0xffffffff00000000 ft7: 0xffffffff00000000
fs0: 0xffffffff00000000 fs1: 0xffffffff00000000 fa0: 0xffffffff00000000 fa1: 0xffffffff00000000
fa2: 0xffffffff00000000 fa3: 0xffffffff00000000 fa4: 0xffffffff00000000 fa5: 0xffffffff00000000
fa6: 0xffffffff00000000 fa7: 0xffffffff00000000 fs2: 0xffffffff00000000 fs3: 0xffffffff00000000
fs4: 0xffffffff00000000 fs5: 0xffffffff00000000 fs6: 0xffffffff00000000 fs7: 0xffffffff00000000
fs8: 0xffffffff00000000 fs9: 0xffffffff00000000 fs10: 0xffffffff00000000 fs11: 0xffffffff00000000
ft8: 0xffffffff00000000 ft9: 0xffffffff00000000 ft10: 0xffffffff00000000 ft11: 0xffffffff00000000
pc: 0x00000000800001bc mstatus: 0x8000000a00006600 mcause: 0x0000000000000000 mepc: 0x0000000000000000
sstatus: 0x8000000200006600 scause: 0x0000000000000000 sepc: 0x0000000000000000
satp: 0x0000000000000000
mip: 0x0000000000000000 mie: 0x0000000000000000 mscratch: 0x0000000000000000 sscratch: 0x0000000000000000
mideleg: 0x0000000000000000 medeleg: 0x0000000000000000
mtval: 0x0000000000000000 stval: 0x0000000000000000 mtvec: 0x0000000000000000 stvec: 0x0000000000000000
fcsr: 0x0000000000000000
privilege mode:3
pmp: 16 entries active, details:
0: cfg:0x00 addr:0x0000000000000000| 1: cfg:0x00 addr:0x0000000000000000
2: cfg:0x00 addr:0x0000000000000000| 3: cfg:0x00 addr:0x0000000000000000
4: cfg:0x00 addr:0x0000000000000000| 5: cfg:0x00 addr:0x0000000000000000
6: cfg:0x00 addr:0x0000000000000000| 7: cfg:0x00 addr:0x0000000000000000
8: cfg:0x00 addr:0x0000000000000000| 9: cfg:0x00 addr:0x0000000000000000
10: cfg:0x00 addr:0x0000000000000000|11: cfg:0x00 addr:0x0000000000000000
12: cfg:0x00 addr:0x0000000000000000|13: cfg:0x00 addr:0x0000000000000000
14: cfg:0x00 addr:0x0000000000000000|15: cfg:0x00 addr:0x0000000000000000
v0 : 0x0000000000000000_0000000000000000 v1 : 0x0000000000000000_0000000000000000
v2 : 0x0000000000000000_0000000000000000 v3 : 0x0000000000000000_0000000000000000
v4 : 0x0000000000000000_0000000000000000 v5 : 0x0000000000000000_0000000000000000
v6 : 0x0000000000000000_0000000000000000 v7 : 0x0000000000000000_0000000000000000
v8 : 0x0000000000000000_0000000000000000 v9 : 0x0000000000000000_0000000000000000
v10: 0x0000000000000000_0000000000000000 v11: 0x0000000000000000_0000000000000000
v12: 0x0000000000000000_0000000000000000 v13: 0x0000000000000000_0000000000000000
v14: 0x0000000000000000_0000000000000000 v15: 0x0000000000000000_0000000000000000
v16: 0x0000000000000000_0000000000000000 v17: 0x0000000000000000_0000000000000000
v18: 0x0000000000000000_0000000000000000 v19: 0x0000000000000000_0000000000000000
v20: 0x0000000000000000_0000000000000000 v21: 0x0000000000000000_0000000000000000
v22: 0x0000000000000000_0000000000000000 v23: 0x0000000000000000_0000000000000000
v24: 0xffffffffffffffff_ffffffff41000000 v25: 0x0000000000000000_0000000000000000
v26: 0x4000000040000000_4000000040000000 v27: 0x0000000000000000_0000000000000000
v28: 0x0000000000000000_0000000000000000 v29: 0x0000000000000000_0000000000000000
v30: 0x0000000000000000_0000000000000000 v31: 0x0000000000000000_0000000000000000
vtype: 0x00000000000000d0 vstart: 0x0000000000000000 vxsat: 0x0000000000000000
vxrm: 0x0000000000000000 vl: 0x0000000000000004 vcsr: 0x0000000000000000
build/RISCV/cpu/base.cc:1334: warn: gem5-rRegsDisplay :
$0 : 0 ra : 80000268 sp : 80009fa0 gp : 0
tp : 0 t0 : 0 t1 : 8000a000 t2 : 0
s0 : 0 s1 : 10 a0 : 8000c000 a1 : 40
a2 : 8000a000 a3 : 8000b000 a4 : 0 a5 : 4
a6 : 0 a7 : 0 s2 : 8000c000 s3 : 10
s4 : 10 s5 : 0 s6 : 0 s7 : 0
s8 : 0 s9 : 0 s10 : 0 s11 : 0
t3 : 8000b000 t4 : 0 t5 : 8000c040 t6 : 40
build/RISCV/cpu/base.cc:1347: warn: gem5-fRegsDisplay :
ft0 : ffffffff00000000 ft1 : ffffffff00000000 ft2 : ffffffff00000000 ft3 : ffffffff00000000
ft4 : ffffffff00000000 ft5 : ffffffff00000000 ft6 : ffffffff00000000 ft7 : ffffffff00000000
fs0 : ffffffff00000000 fs1 : ffffffff00000000 fa0 : ffffffff00000000 fa1 : ffffffff00000000
fa2 : ffffffff00000000 fa3 : ffffffff00000000 fa4 : ffffffff00000000 fa5 : ffffffff00000000
fa6 : ffffffff00000000 fa7 : ffffffff00000000 fs2 : ffffffff00000000 fs3 : ffffffff00000000
fs4 : ffffffff00000000 fs5 : ffffffff00000000 fs6 : ffffffff00000000 fs7 : ffffffff00000000
fs8 : ffffffff00000000 fs9 : ffffffff00000000 fs10 : ffffffff00000000 fs11 : ffffffff00000000
ft8 : ffffffff00000000 ft9 : ffffffff00000000 ft10 : ffffffff00000000 ft11 : ffffffff00000000
build/RISCV/cpu/base.cc:1398: warn: gem5-CsrDisplay :
pc : 800001b8 mstatus : a00000000 mcause : 0 mepc : 0
sstatus : 200000000 scause : 0 sepc : 0
satp : 0
mip : 0 mie : 0 mscratch: 0 sscratch: 0
mideleg : 0 medeleg : 0
mtval : 0 stval : 0 mtvec : 0 stvec : 0
privilege mode : 3
build/RISCV/cpu/base.cc:1423: warn: gem5-VectorDisplay :
v00 : 0000000000000000_0000000000000000 v01 : 0000000000000000_0000000000000000
v02 : 0000000000000000_0000000000000000 v03 : 0000000000000000_0000000000000000
v04 : 0000000000000000_0000000000000000 v05 : 0000000000000000_0000000000000000
v06 : 0000000000000000_0000000000000000 v07 : 0000000000000000_0000000000000000
v08 : 0000000000000000_0000000000000000 v09 : 0000000000000000_0000000000000000
v10 : 0000000000000000_0000000000000000 v11 : 0000000000000000_0000000000000000
v12 : 0000000000000000_0000000000000000 v13 : 0000000000000000_0000000000000000
v14 : 0000000000000000_0000000000000000 v15 : 0000000000000000_0000000000000000
v16 : 0000000000000000_0000000000000000 v17 : 0000000000000000_0000000000000000
v18 : 0000000000000000_0000000000000000 v19 : 0000000000000000_0000000000000000
v20 : 0000000000000000_0000000000000000 v21 : 0000000000000000_0000000000000000
v22 : 0000000000000000_0000000000000000 v23 : 0000000000000000_0000000000000000
v24 : 0000000000000000_0000000000000000 v25 : 0000000000000000_0000000000000000
v26 : 4000000040000000_4000000040000000 v27 : 0000000000000000_0000000000000000
v28 : 0000000000000000_0000000000000000 v29 : 0000000000000000_0000000000000000
v30 : 0000000000000000_0000000000000000 v31 : 0000000000000000_0000000000000000
vtype : d0 vstart : 0 vxsat : 0
vxrm : 0 vl : 4 vcsr : 0
build/RISCV/cpu/base.hh:702: warn: start dump last 20 committed msg
build/RISCV/cpu/base.hh:705: warn: V [sn:18198 pc:0x8000017c] c_li t4, 0, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18199 pc:0x8000017e] fsw fa3, 0(a0), paddr: 0x8000c000
build/RISCV/cpu/base.hh:705: warn: V [sn:18200 pc:0x80000182] bge zero, a1, 92
build/RISCV/cpu/base.hh:705: warn: V [sn:18212 pc:0x80000186] flw fa5, 0(a0), res: 0xffffffff00000000, paddr: 0x8000c000
build/RISCV/cpu/base.hh:705: warn: V [sn:18213 pc:0x8000018a] addiw a6, t4, 0, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18214 pc:0x8000018e] c_li a4, 0, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18215 pc:0x80000190] addw a2, a7, a4, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18216 pc:0x80000194] addw a3, a6, a4, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18217 pc:0x80000198] c_slli a2, 2, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18218 pc:0x8000019a] c_slli a3, 2, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18219 pc:0x8000019c] subw a5, a1, a4, res: 0x40
build/RISCV/cpu/base.hh:705: warn: V [sn:18220 pc:0x800001a0] c_add a2, t1, res: 0x8000a000
build/RISCV/cpu/base.hh:705: warn: V [sn:18221 pc:0x800001a2] c_add a3, t3, res: 0x8000b000
build/RISCV/cpu/base.hh:705: warn: V [sn:18222 pc:0x800001a4] vsetvli a5, a5, e32, m1, ta, ma, res: 0x4
build/RISCV/cpu/base.hh:705: warn: V [sn:18223 pc:0x800001a8] vle32_v_micro v24, 0(a2), zero, res: 3f8000003f800000_3f8000003f800000, paddr: 0x8000a000
build/RISCV/cpu/base.hh:705: warn: V [sn:18224 pc:0x800001ac] vle32_v_micro v26, 0(a3), zero, res: 4000000040000000_4000000040000000, paddr: 0x8000b000
build/RISCV/cpu/base.hh:705: warn: V [sn:18225 pc:0x800001b0] vmv_v_i_micro v25, v0, 0, res: 0000000000000000_0000000000000000
build/RISCV/cpu/base.hh:705: warn: V [sn:18226 pc:0x800001b4] vfmul_vv_micro v24, v24, v26, res: 4000000040000000_4000000040000000
build/RISCV/cpu/base.hh:705: warn: V [sn:18227 pc:0x800001b8] vfredusum_vs_micro v24, v24, res: 0000000000000000_0000000041200000
build/RISCV/cpu/base.hh:705: warn: V [sn:18228 pc:0x800001b8] vfredusum_vs_micro v24, v25,vtmp0, res: 0000000000000000_0000000000000000
build/RISCV/cpu/base.cc:1304: panic: Difftest failed!
It seem like vectorReduceFloatFormat has bug, because of add vectorOldVDElim May you avoid using vector reduce inst type?
Thanks for your reply, I will try not to use vectorReduce next time. Please mention it in your commit logs after you fix the bug.
I have implemented a GEMM kerenl using RVV and complie it into a bare metal using AM. Before simulation, I deleted the function calls that were not aligned with the RTL and depended on the vector destination register fake data in the issue_queue.cc file. However, the output matrix elements in GEM5 simulation results are all 0, which are expected to 128. I also simulated it on the original GEM5, and the result was correct. My source code are as followed:
Through staged debugging, I determined that the problem occurred in the step of vector reduction. How can I fix it?