circt / arc-tests

A collection of tests and benchmarks for the Arc simulation backend of CIRCT

Running diffvcd on Trace Files: No Common Signals #7

Open nn020701 opened 1 month ago

nn020701 commented 1 month ago

I am using the conda environment mentioned in the Chipyard documentation. When I run make -C rocket run, it finishes normally.

build/small-v1.6/rocket-main ../benchmarks/dhrystone.riscv 
loading segment at 60000000 (virtual address 60000000)
loading segment at 80000000 (virtual address 80000000)
entry 80000000
loaded 20888 program bytes
Microseconds for one run through Dhrystone: 799
Dhrystones per Second:                      1250
mcycle = 399986
minstret = 192528
Benchmark run successful!
----------------------------------------
412458 cycles total
vtor: 25664.4 Hz
arcs: 260008 Hz

However, when I run make -C rocket run-trace

build/small-v1.6/rocket-main ../benchmarks/dhrystone.riscv --trace build/small-v1.6/rocket.vcd
loading segment at 60000000 (virtual address 60000000)
loading segment at 80000000 (virtual address 80000000)
entry 80000000
loaded 20888 program bytes
Microseconds for one run through Dhrystone: 799
Dhrystones per Second:                      1250
mcycle = 399986
minstret = 192528
Benchmark run successful!
----------------------------------------
412458 cycles total
vtor: 25786.2 Hz
arcs: 273288 Hz

and use diffvcd with the command ./diffvcd.py rocket-{vtor,arcs}.vcd --top1 TOP.RocketSystem. --top2 RocketSystem.internal. -i icache.readEnable -i icache.writeEnable, it returns "no common signals between input files." Is this expected behavior, or did something go wrong during my execution?

OS: Ubuntu 22.04

maerhart commented 1 month ago

Hi @nn020701, thanks for reporting this issue!

To trace all the internal wires, ports, memories, registers, etc., you need to additionally set the TRACE make variable like this: make -C rocket run-trace TRACE=1 CONFIG=small-master. I'd also recommend deleting the build directory before running this command to make sure the necessary parts are recompiled (we should change the Makefile so that this happens automatically in the future).

Please let me know if this still doesn't lead to the desired result.
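As a rough illustration of why the two traces can end up with no signals in common, here is a minimal Python sketch. It assumes the comparison strips the --top1/--top2 prefixes from each file's signal names and intersects what is left; that is a guess at the general idea, not the actual diffvcd.py code. The function names and the signal lists are hypothetical, including the assumption that the RocketSystem.internal. scope only appears when TRACE=1 is set.

```python
def strip_prefix(names, prefix):
    """Keep only the names under `prefix`, returned with the prefix removed."""
    return {n[len(prefix):] for n in names if n.startswith(prefix)}


def common_signals(names1, names2, top1, top2):
    """Signals present in both files once each file's top prefix is stripped."""
    return strip_prefix(names1, top1) & strip_prefix(names2, top2)


# Hypothetical signal names, made up for illustration only.
vtor = {"TOP.RocketSystem.clock", "TOP.RocketSystem.tile.core.pc"}
arcs_untraced = {"RocketSystem.clock", "RocketSystem.reset"}
arcs_traced = {
    "RocketSystem.internal.clock",
    "RocketSystem.internal.tile.core.pc",
}

TOP1 = "TOP.RocketSystem."
TOP2 = "RocketSystem.internal."

# Nothing in the untraced file lives under TOP2, so the intersection is empty.
print(common_signals(vtor, arcs_untraced, TOP1, TOP2))

# With internal signals traced, the stripped names line up.
print(sorted(common_signals(vtor, arcs_traced, TOP1, TOP2)))
```

Under these assumptions the first call yields an empty set, which would correspond to the "no common signals" message, and the second yields the shared names.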

nn020701 commented 1 month ago

Thank you for your guidance! I can now compare the VCD files. However, when using ./diffvcd.py rocket-{vtor,arcs}.vcd --top1 TOP.RocketSystem. --top2 RocketSystem.internal. -i icache.readEnable -i icache.writeEnable -a 1, I found that many signals are different.

1  0  1f  subsystem_cbus.out_xbar.readys_mask[4:0]
1  0  1  subsystem_l2_wrapper.auto_coherent_jbar_in_a_ready
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker.got_e
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker.idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker.io_idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker.io_in_a_ready
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker.io_in_a_ready_0
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker.sent_d
1  0  40  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.address[31:0]
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.got_e
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.io_idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.io_in_a_ready
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.io_in_a_ready_0
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.io_line[25:0]
1  0  40  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.io_out_a_bits_address[31:0]
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_1.sent_d
1  0  80  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.address[31:0]
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.got_e
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.io_idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.io_in_a_ready
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.io_in_a_ready_0
1  0  2  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.io_line[25:0]
1  0  80  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.io_out_a_bits_address[31:0]
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_2.sent_d
1  0  c0  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.address[31:0]
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.got_e
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.io_idle
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.io_in_a_ready
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.io_in_a_ready_0
1  0  3  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.io_line[25:0]
1  0  c0  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.io_out_a_bits_address[31:0]
1  0  1  subsystem_l2_wrapper.broadcast_1.TLBroadcastTracker_3.sent_d
1  f  0  subsystem_l2_wrapper.broadcast_1._d_trackerOH_T_8[3:0]
1  1  0  subsystem_l2_wrapper.broadcast_1._matchTrackers_T_3
1  1  0  subsystem_l2_wrapper.broadcast_1._matchTrackers_T_5
1  1  0  subsystem_l2_wrapper.broadcast_1._matchTrackers_T_7
1  0  1  subsystem_l2_wrapper.broadcast_1.auto_in_a_ready
1  1  0  subsystem_l2_wrapper.broadcast_1.c_trackerOH_1
1  1  0  subsystem_l2_wrapper.broadcast_1.c_trackerOH_2
1  1  0  subsystem_l2_wrapper.broadcast_1.c_trackerOH_3
1  f  0  subsystem_l2_wrapper.broadcast_1.d_trackerOH[3:0]
1  0  f  subsystem_l2_wrapper.broadcast_1.d_trackerOH_r[3:0]
1  3  0  subsystem_l2_wrapper.broadcast_1.filter.io_request_bits_mshr[1:0]
1  3  0  subsystem_l2_wrapper.broadcast_1.filter.io_response_bits_mshr[1:0]
1  f  1  subsystem_l2_wrapper.broadcast_1.filter_io_request_bits_mshr_lo[3:0]
1  0  1  subsystem_l2_wrapper.broadcast_1.monitor.io_in_a_ready
1  0  1  subsystem_l2_wrapper.broadcast_1.nodeIn_a_ready
1  0  1  subsystem_l2_wrapper.coherent_jbar.auto_in_a_ready
1  0  1  subsystem_l2_wrapper.coherent_jbar.auto_out_a_ready
1  0  1  subsystem_mbus.coupler_to_memory_controller_port_named_axi4.tl2axi4.r_first
1  0  1  subsystem_sbus.auto_coupler_to_bus_named_subsystem_l2_widget_out_a_ready
1  0  1  subsystem_sbus.coupler_to_bus_named_subsystem_l2.auto_widget_in_a_ready
1  0  1  subsystem_sbus.coupler_to_bus_named_subsystem_l2.auto_widget_out_a_ready
1  0  1  subsystem_sbus.coupler_to_bus_named_subsystem_l2.widget.auto_in_a_ready
1  0  1  subsystem_sbus.coupler_to_bus_named_subsystem_l2.widget.auto_out_a_ready
1  0  1  subsystem_sbus.coupler_to_port_named_mmio_port_axi4.tl2axi4.r_first
1  0  1  subsystem_sbus.system_bus_xbar.auto_out_1_a_ready
1  0  7  subsystem_sbus.system_bus_xbar.readys_mask[2:0]
1  1ffffffff  1fffeffff  tile_prci_domain.tile_reset_domain.tile.core.bpu._x_T_5[32:0]
1  0  10000  tile_prci_domain.tile_reset_domain.tile.core.bpu.io_pc[32:0]
1  0  208  tile_prci_domain.tile_reset_domain.tile.core.csr.io_customCSRs_0_value[63:0]
1  0  801105  tile_prci_domain.tile_reset_domain.tile.core.csr.io_status_isa[31:0]
1  0  208  tile_prci_domain.tile_reset_domain.tile.core.csr.reg_custom_0[63:0]
1  0  8000000000801105  tile_prci_domain.tile_reset_domain.tile.core.csr.reg_misa[63:0]
1  0  3  tile_prci_domain.tile_reset_domain.tile.core.csr.reg_mstatus_mpp[1:0]
1  0  10000  tile_prci_domain.tile_reset_domain.tile.core.ibuf.io_imem_bits_pc[33:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.core.ibuf.io_imem_bits_xcpt_ae_inst
1  0  1  tile_prci_domain.tile_reset_domain.tile.core.ibuf.io_inst_0_bits_xcpt0_ae_inst
1  0  10000  tile_prci_domain.tile_reset_domain.tile.core.ibuf.io_pc[33:0]
1  0  10000  tile_prci_domain.tile_reset_domain.tile.core.io_imem_resp_bits_pc[33:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.core.io_imem_resp_bits_xcpt_ae_inst
1  0  208  tile_prci_domain.tile_reset_domain.tile.core.io_ptw_customCSRs_csrs_0_value[63:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.dcache.REG
1  0  1  tile_prci_domain.tile_reset_domain.tile.dcache.s2_not_nacked_in_s1
1  2  8000  tile_prci_domain.tile_reset_domain.tile.frontend._io_cpu_npc_T[32:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend._s2_xcpt_T
1  0  10000  tile_prci_domain.tile_reset_domain.tile.frontend.fq.io_deq_bits_pc[33:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.fq.io_deq_bits_xcpt_ae_inst
1  0  10000  tile_prci_domain.tile_reset_domain.tile.frontend.fq.io_enq_bits_pc[33:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.fq.io_enq_bits_xcpt_ae_inst
1  4  10000  tile_prci_domain.tile_reset_domain.tile.frontend.icache.io_req_bits_addr[32:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.icache.io_s1_kill
1  0  4  tile_prci_domain.tile_reset_domain.tile.frontend.icache.io_s1_paddr[31:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.icache.io_s2_kill
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.icache.s2_request_refill_REG
1  0  10000  tile_prci_domain.tile_reset_domain.tile.frontend.io_cpu_resp_bits_pc[33:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.io_cpu_resp_bits_xcpt_ae_inst
1  0  208  tile_prci_domain.tile_reset_domain.tile.frontend.io_ptw_customCSRs_csrs_0_value[63:0]
1  4  8  tile_prci_domain.tile_reset_domain.tile.frontend.predicted_npc[33:0]
1  0  4  tile_prci_domain.tile_reset_domain.tile.frontend.s1_pc[33:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.s1_valid
1  0  10000  tile_prci_domain.tile_reset_domain.tile.frontend.s2_pc[33:0]
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.s2_replay
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.s2_replay_REG
1  0  1  tile_prci_domain.tile_reset_domain.tile.frontend.s2_tlb_resp_ae_inst
1  0  4  tile_prci_domain.tile_reset_domain.tile.frontend.tlb.io_req_bits_vaddr[33:0]
1  0  4  tile_prci_domain.tile_reset_domain.tile.frontend.tlb.io_resp_paddr[31:0]
1  0  4  tile_prci_domain.tile_reset_domain.tile.frontend.tlb.pmp.io_addr[31:0]
1  0  208  tile_prci_domain.tile_reset_domain.tile.ptw.io_dpath_customCSRs_csrs_0_value[63:0]
1  0  208  tile_prci_domain.tile_reset_domain.tile.ptw.io_requestor_1_customCSRs_csrs_0_value[63:0]
1  0  3  tile_prci_domain.tile_reset_domain.tile.tlMasterXbar.readys_mask[1:0]

According to the documentation, when the program ends normally, the simulations for Vtor and Arcs should not diverge. So how should I use diffvcd to determine that there are no divergences in the simulations solely based on the VCD files?

maerhart commented 1 month ago

The rocket testbench of arcilator currently just zero-initializes all registers and memories and doesn't consider power-on values. This can lead to mismatches during the early reset cycles. In rocket-main.cpp you can see that the first 100 cycles are spent in reset and that the testbench doesn't do anything meaningful in the first 1000 cycles (the root inputs are not set to anything specific). Only then is the dhrystone benchmark started. However, the VCD dump already includes those first 1000 cycles.

With -a 1000, the script doesn't report any differences for me. I'm not sure at exactly which cycle the mismatches stop, though.
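To make the idea of skipping the warm-up cycles concrete, here is a small Python sketch. It is not how diffvcd.py is implemented: the traces are toy dictionaries mapping a cycle number to that cycle's signal values, the function names are hypothetical, and the `after` parameter is only a stand-in for the script's -a option (treating it as "compare cycles >= after" is a choice made here). The second helper shows one way to find the last cycle at which two traces still disagree.

```python
def mismatches(trace_a, trace_b, after=0):
    """Yield (cycle, signal, value_a, value_b) for differing values.

    Only cycles >= `after` that exist in both traces are compared, and
    only signals that exist in both traces at that cycle.
    """
    for cycle in sorted(trace_a.keys() & trace_b.keys()):
        if cycle < after:
            continue
        a, b = trace_a[cycle], trace_b[cycle]
        for sig in sorted(a.keys() & b.keys()):
            if a[sig] != b[sig]:
                yield (cycle, sig, a[sig], b[sig])


def last_mismatch_cycle(trace_a, trace_b):
    """Return the last cycle with any mismatch, or None if the traces agree."""
    cycles = [cycle for cycle, *_ in mismatches(trace_a, trace_b)]
    return max(cycles) if cycles else None


# Toy data: the two runs disagree on `idle` early on and agree later.
vtor_trace = {
    0: {"idle": 0, "pc": 0},
    2: {"idle": 0, "pc": 4},
    1000: {"idle": 1, "pc": 8},
}
arcs_trace = {
    0: {"idle": 1, "pc": 0},
    2: {"idle": 1, "pc": 4},
    1000: {"idle": 1, "pc": 8},
}

print(list(mismatches(vtor_trace, arcs_trace)))              # early mismatches
print(list(mismatches(vtor_trace, arcs_trace, after=1000)))  # after warm-up
print(last_mismatch_cycle(vtor_trace, arcs_trace))
```

On real traces, a helper like `last_mismatch_cycle` would give the smallest safe value to skip past, rather than relying on a round number such as 1000.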

Also, the run is reported successful if the dhrystone program terminates normally and the output port values of the top module match between the arcilator and verilator runs for all cycles after the first 1000 cycles. If success for you means that all internal signals, registers, etc. also match, using this script is currently the right way to go.

Support for power-on values and more precise initialization at cycle 0 is currently under development, so this might change soon.

nn020701 commented 1 month ago

Thank you for your reminder! After using -a 1000, there are indeed no discrepancies now. My problem is perfectly resolved. Thanks again!