firesim / firesim

FireSim: Fast and Effortless FPGA-accelerated Hardware Simulation with On-Prem and Cloud Flexibility
https://fires.im
Other
890 stars 232 forks source link

WithSynthAsserts identity incorrect (at least in RTL simulation) #581

Closed timsnyder closed 4 years ago

timsnyder commented 4 years ago

I was curious whether WithSynthAsserts would work when running MIDAS-level RTL simulation in Verilator

To try it, I added an assertion that I knew would fire, built my verilator and ran make run-asm-tests, an assertion fired but the printed message did not correspond to the assertion that I added.

To reproduce this with FireSim 1.10:

When I ran, I got a message about an assertion in pbus.coupler_to_blkdevctrl.fragmenter:

-bash-4.2$ cat output/f1/FireSim-FireSimRocketConfig-WithSynthAsserts_BaseF1Config/rv64si-p-csr.out 
random min: 0x0, random max: 0xffffffffffffffff
Commencing simulation.
C0:         19 [1] pc=[0000000000010040] W[r10=0000000000010040][1] R[r 0=0000000000000000] R[r 0=0000000000000000] inst=[00000517] auipc   a0, 0x0
C0:         20 [1] pc=[0000000000010044] W[r10=0000000000010000][1] R[r10=0000000000010040] R[r 0=0000000000000000] inst=[fc050513] addi    a0, a0, -64
C0:         21 [1] pc=[0000000000010048] W[r 0=0000000000000000][1] R[r10=0000000000010000] R[r 0=0000000000000000] inst=[30551073] csrw    mtvec, a0
module: TLFragmenter, path: FireSim.TestHarness.subsystem_pbus.coupler_to_slave_named_blkdevcontroller.fragmenter]
Assertion failed
    at Fragmenter.scala:300 assert (!repeater.io.full || !aHasData)
 at cycle: 71
davidbiancolin commented 4 years ago

Thanks Tim for the bug report. I'll have a look at this later today.

davidbiancolin commented 4 years ago

Hey Tim you can try the fix in #582.

To clarify, a simulation run under ML simulation should exhibit precisely the same target behavior (assuming the same set of runtime arguments) as it would on an FPGA. Printfs, assertions, tracerV etc should all produce exactly the same target output but at different host times. One example: the total number of target cycles should be the same for the same workload but the number of host-cycles it takes to execute them (and thus FMR) will differ.

timsnyder commented 4 years ago

I'm rebasing my work forward to 1.10 and I hope to try your fix later today. Thanks.

timsnyder commented 4 years ago

Hey David, I still don't quite have my design up to 1.10 so I decided to just checkout your branch and rerun the reproduction that I posted above using RocketChip. I still see the same output that says the assertion is coming from FireSim.TestHarness.subsystem_pbus.coupler_to_slave_named_blkdevcontroller.fragmenter

To make sure that I'm not completely losing my marbles or more accurately, causing another assertion in fragmenter to fire by patching Rocket the way I did above, I dumped waves and I'm digging into them. I'll post a screenshot when I get something interesting.

timsnyder commented 4 years ago

Hi David,

Here's what my waveforms look like with your changes in #582: image

You can see that the midasAsserts in Rocket's frontend fire in target cycle 21 (io_time is the target cycle count, right?). The pbus.coupler_to_slave_named_blkdevcontroller.fragmenter asserts don't seem to be firing.

timsnyder commented 4 years ago

Looking at the TestHarness_midasAsserts[602:0], it looks like bit 589 is firing: image

Oh, but looking at FireSim-const.h, I seee that ASSERTBRIDGEMODULE_0_assert_count is 598, so I guess TestHarness_midasAsserts[602:0] packs in a few more signals. FireSim.midasAsserts_clockBridge_clocks_0_source_bits[597:0] looks like what I want and bit 13 is firing there (reversed from the TestHarness_midasAsserts[602:0]) image

Looking at ASSERTBRIDGEMODULE_0_assert_messages, I see that the assertion which is firing is near the end. Indexing the array backward from the end looks like this:

                                      R"ESC(module: Frontend, path: FireSim.TestHarness.tile.frontend]
13                                  Assertion failed
                                        at Frontend.scala:178 assert(!(s2_speculative && io.ptw.customCSRs.asInstanceOf[RocketCustomCSRs].disableSpeculativeICacheRefill && !icache.io.s2_kill))
---------------------------------   )ESC",
                                      R"ESC(module: Frontend, path: FireSim.TestHarness.tile.frontend]
12                                  Assertion failed
 *I added this one*                                       at Frontend.scala:91 assert(!io.cpu.req.valid)
---------------------------------   )ESC",
                                      R"ESC(module: Frontend, path: FireSim.TestHarness.tile.frontend]
11                                  Assertion failed
                                        at Frontend.scala:90 assert(!(io.cpu.req.valid || io.cpu.sfence.valid || io.cpu.flush_icache || io.cpu.bht_update.valid || io.cpu.btb_update.valid) || io.cpu.might_request)
---------------------------------   )ESC",
                                      R"ESC(module: ICache, path: FireSim.TestHarness.tile.frontend.icache]
10                                   Assertion failed
                                        at ICache.scala:260 assert(!(s1_valid || s1_slaveValid) || PopCount(s1_tag_hit zip s1_tag_disparity map { case (h, d) => h && !d }) <= 1)
---------------------------------   )ESC",
                                      R"ESC(module: TLB_1, path: FireSim.TestHarness.tile.frontend.tlb]
9                                   Assertion failed
                                        at TLB.scala:368 assert(!io.sfence.bits.rs1 || (io.sfence.bits.addr >> pgIdxBits) === vpn)
---------------------------------   )ESC",
                                      R"ESC(module: FPU, path: FireSim.TestHarness.tile.fpuOpt]
8                                   Assertion failed
                                        at FPU.scala:864 assert(consistent(wdata))
---------------------------------   )ESC",
                                      R"ESC(module: FPU, path: FireSim.TestHarness.tile.fpuOpt]
7                                   Assertion failed
                                        at FPU.scala:736 assert(consistent(wdata))
---------------------------------   )ESC",
                                      R"ESC(module: PTW, path: FireSim.TestHarness.tile.ptw]
6                                   Assertion failed
                                        at PTW.scala:363 assert(state === s_wait2)
---------------------------------   )ESC",
                                      R"ESC(module: PTW, path: FireSim.TestHarness.tile.ptw]
5                                   Assertion failed
                                        at PTW.scala:346 assert(state === s_wait3)
---------------------------------   )ESC",
                                      R"ESC(module: PTW, path: FireSim.TestHarness.tile.ptw]
4                                   Assertion failed
                                        at PTW.scala:339 assert(state === s_req || state === s_wait1)
---------------------------------   )ESC",
                                      R"ESC(module: IBuf, path: FireSim.TestHarness.tile.core.ibuf]
3                                   Assertion failed
                                        at IBuf.scala:79 assert(!io.imem.valid || !io.imem.bits.btb.taken || io.imem.bits.btb.bridx >= pcWordBits)
---------------------------------   )ESC",
                                      R"ESC(module: CSRFile, path: FireSim.TestHarness.tile.core.csr]
2                                   Assertion failed
                                        at CSR.scala:670 assert(!reg_singleStepped || io.retire === UInt(0))
---------------------------------   )ESC",
                                      R"ESC(module: CSRFile, path: FireSim.TestHarness.tile.core.csr]
1                                   Assertion failed: these conditions must be mutually exclusive
                                        at CSR.scala:662 assert(PopCount(insn_ret :: insn_call :: insn_break :: io.exception :: Nil) <= 1, \"these conditions must be mutually exclusive\")
---------------------------------   )ESC",
                                      R"ESC(module: SerialAdapter, path: FireSim.TestHarness.m]
0                                   Assertion failed: Bad TSI command
                                        at SerialAdapter.scala:144 assert(false.B, \"Bad TSI command\")
---------------------------------   )ESC"
                                    };

So, it seems that the assertion that I added is:

  1. at the high end of the assertions (maybe there's a reverse in the path somewhere you don't need it?)
  2. off by 1 somehow because it should be at index 12 from the end counting from 0
davidbiancolin commented 4 years ago

Hey Tim, thanks for the waveforms. Sorry this continues to be broken for you. :(

I spent today working on improving the tests for assertion synthesis. My initial "fix" fixed one real bug but unmasked a second -- so it happened to work for me only by coincidence. I'll push the most recent changes i've made that should fix the regressions for single-clock designs -- i'm still debugging some odd behavior in multiclock.

There are going to be a few extra signals in TestHarness_midasAsserts because it includes a few assertions in undriven clock domains. AssertionSynthesis can't find a source clock for these assertions and so doesn't bind them to a bridge (FireSim.midasAsserts_clockBridge_clocks_0_source_bits[597:0]) but it still plumbs them out to the top.

davidbiancolin commented 4 years ago

You can see that the midasAsserts in Rocket's frontend fire in target cycle 21 (io_time is the target cycle count, right?).

Yes but io_time wouldn't count cycles under reset i think while the assertion bridge will.

davidbiancolin commented 4 years ago

Alright pushed.

davidbiancolin commented 4 years ago

I added some extra reverses / changed the sort order to make things appear in declaration order.

timsnyder commented 4 years ago

My reproduction test looks good to me now!

module: Frontend, path: FireSim.TestHarness.tile.frontend]
Assertion failed
    at Frontend.scala:91 assert(!io.cpu.req.valid)
 at cycle: 71
timsnyder commented 4 years ago

I'm going to be adding multiclock to my design very soon and I'll try this out again with that when I can.

davidbiancolin commented 4 years ago

K, i'm content that the proposed fix works for multiclock too. I'll try to get it reviewed and merged into master and dev early this week.

timsnyder commented 4 years ago

@davidbiancolin , I cherry-picked the first two commits you made to fix this:

commit  sha_from_private_repo
Author: David Biancolin <david.biancolin@gmail.com>
Date:   Sat Jun 20 02:00:17 2020 -0700

    [assert] Fix submodule ordering in concatenations

    (cherry picked from commit 40b900c5ca895fbe5355486208c8c1ae605acef5)

commit sha_from_private_repo
Author: David Biancolin <david.biancolin@gmail.com>
Date:   Wed Jun 17 02:46:12 2020 -0700

    Fix indexing bug in assertion synth introduced by multiclock

    (cherry picked from commit b816c3a24c46f03c8b50dc939218402e4c0dfd6f)

And also turned on printfSynthesis for our equivalent of "commit log" that RocketChip prints to simulation and the printfSynthesis doc mentions.

With those two changes, my design no-longer gets through routing in Vivado. I'm surprised.

I think this may be unreleated to your change and is hopefully that the printfSynthesis just added a lot more stuff than I thought it would. I kicked off another build this morning that only grabs the two commits I mentioned above.

I wanted to mention this just in case there's a chance that your assert fixes are adding a few thousand more wires that need to be routed each time you add a 'reverse'. I don't think that they are but I also haven't looked in detail at the changes to the Verilog.

I'm also mentioning this a little earlier than I normally would. I'm getting ready to drive home to Texas and didn't want to forget to loop you in.

timsnyder commented 4 years ago

Yeah, the routing problem seems to either be caused by printfSynthesis or the combination of assert and printfs. When I removed the printfs, it's gotten through routing csuccessfully.

davidbiancolin commented 4 years ago

If reversing the order of the concatenation causes extra nets to be generated i'd be deeply disturbed. :(

Printf synthesis is not exactly cheap depending on just how much you're trying to capture. I'm assuming your resource utilization is still OK -- can you just bump the frequency down?

It occured to me that maybe we could just remove the whole priority encoder and replace it with a 32bit-wide scan chain. Then software could just iteratively look for fired assertions without all of the extra LUTs to implement the encoding.

timsnyder commented 4 years ago

can you just bump the frequency down?

This build is without the decoupling queue on the input of the priority encoder and I was already only shooting for 45MHz.

Since this issue was originally about the SynthAsserts lookup being incorrect and my routing issue doesn't seem to be related to that, maybe we should take the dialog about timing and congestion improvements back into the email thread we had about adding the queue?

davidbiancolin commented 4 years ago

hmm, yeah lets talk about this there