B-Lang-org / bsc

Bluespec Compiler (BSC)

Clock and reset handling should be as hierarchical as possible #237

Open bpfoley opened 4 years ago

bpfoley commented 4 years ago

As an example, compile the attached test.

It consists of a top level mkTest module, which instantiates 4 mkB modules, each of which in turn instantiates 2 mkA modules.

In the generated C++, the code to wire up the clocks and trigger the reset for every single module is placed in the top level. That gives us 4×(2+1) + 1 = 13 clock-wiring statements (one per instance, including mkTest itself) and 4×2 = 8 reset-ticking statements.

This is problematic for compilers like clang, which compile slowly when a single block contains a large number of live values.

$ bsc -sim Quadratic.bs
$ bsc -sim -o mkTest.exe -e mkTest mkTest.ba
$ awk '/bk_set_clock/,/}/' model_mkTest.cxx

  bk_set_clock_event_fn(sim_hdl,
            bk_get_clock_by_name(sim_hdl, "CLK"),
            schedule_posedge_CLK,
            NULL,
            (tEdgeDirection)(POSEDGE));
  (mkTest_instance->INST_theBs_0.INST_a1.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_0.INST_a2.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_0.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_1.INST_a1.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_1.INST_a2.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_1.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_2.INST_a1.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_2.INST_a2.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_2.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_3.INST_a1.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_3.INST_a2.set_clk_0)("CLK");
  (mkTest_instance->INST_theBs_3.set_clk_0)("CLK");
  (mkTest_instance->set_clk_0)("CLK");
}

$ awk '/reset_ticks/,/}/' model_mkTest.cxx
     if (do_reset_ticks(simHdl))
     {
       INST_top.INST_theBs_0.INST_a1.INST_addr.rst_tick__clk__1((tUInt8)1u);
       INST_top.INST_theBs_0.INST_a2.INST_addr.rst_tick__clk__1((tUInt8)1u);
       INST_top.INST_theBs_1.INST_a1.INST_addr.rst_tick__clk__1((tUInt8)1u);
       INST_top.INST_theBs_1.INST_a2.INST_addr.rst_tick__clk__1((tUInt8)1u);
       INST_top.INST_theBs_2.INST_a1.INST_addr.rst_tick__clk__1((tUInt8)1u);
       INST_top.INST_theBs_2.INST_a2.INST_addr.rst_tick__clk__1((tUInt8)1u);
       INST_top.INST_theBs_3.INST_a1.INST_addr.rst_tick__clk__1((tUInt8)1u);
       INST_top.INST_theBs_3.INST_a2.INST_addr.rst_tick__clk__1((tUInt8)1u);
     }

When a system consists of many modules with nested layers of submodules, this puts a list of statements in the top level whose length is the product of the number of instances at each layer of instantiation.

If instead each module were responsible for its own schedule_posedge_CLK and create_model, each module would only have as many statements as the modules it directly instantiates. Since each module's code is generated once and shared by all of its instances, the total number of statements would be reduced to the sum over the layers of instantiation. This would both compile faster and produce smaller binaries.

Quadratic.bs.txt

nanavati commented 3 years ago

I think there were a few other examples of unnecessarily repetitive code (as opposed to the necessarily centralized schedule) beyond the clock and reset handling, but I can't find them when generating C++ from this example. I don't know whether that means something was fixed or whether another wrinkle comes up in more complex designs, so I'd suggest keeping an eye out for things like that.