cornell-brg / pymtl

Python-based hardware modeling framework
BSD 3-Clause "New" or "Revised" License
234 stars 83 forks source link

Add AST caching to avoid redundant file scan #164

Open jsn1993 opened 7 years ago

jsn1993 commented 7 years ago

This PR is to reduce the redundant file inspection to parse the AST when we want to register the combinational/tick/posedge_clk blocks to the simulation. I need to fix the affected tests but just want to know people's thoughts.

As I observed from vmprof, for a 4-core, 8-cache, 3-network composition written in pure pymtl, there are 600+ instances of different models(including 10s of regs, 10s of mux, and such) and 900+ blocks, and the simulation takes only 100s of cycles. Basically, 50% percent of the time is spent on scanning the file to parse the AST for the 900+ models. What makes things worse is that we cannot reuse them across different test cases.

What I did is a "meta AST cache" which assumes the model won't change in a single py.test execution so that we could cache the parsed AST to a static array which belongs to type(model), to prevent duplicated file scan.

Having said this, this won't give us any benefit for simulation alone. The benefit makes more sense when we have hundreds of test cases to run on a fairly big module and each of them only runs for 10s or 100s cycles.

I did a "py.test lab2_proc/" for ece 4750 on my computer (file is read from PCIe SSD!) and found that with the meta cache the 282 test cases takes 135s, but without meta cache it's 205s, which is pretty good.

This approach is superior to py.test fixture because the ast meta cache is stronger i.e. it captures reuse within a single instantiation. Suppose we want to write a tile64, meta cache don't even need to parse the processor for 64 times. Py.test fixture will parse 64 times at the first time but not for the second test case.

jsn1993 commented 7 years ago

The test cases fail because the number of blocks may vary based on the parameter ...

See the following hacky test case ...

for i in range( s.nstages - 1 ):
  @s.tick_rtl
  def func( i = i ):  # Need to capture i for this to work
    s.wire[ i + 1 ].n = s.wire[ i ]
jsn1993 commented 7 years ago

OK, I did try to use the hash of the class. It turns out that models such as Mux, RegEnRst are often reconstructed by different bitwidth, so the hash wiggles among several values and the performance is affected a little bit. But really the fact is that these instances with different hash values have the same combinational blocks, unlike the weird test case.

I'm wondering if we should just forbid this kind of loop, as the "need func(i=i) for this to work" is already very hacky!

for i in range( s.nstages - 1 ):
  @s.tick_rtl
  def func( i = i ):  # Need to capture i for this to work
    s.wire[ i + 1 ].n = s.wire[ i ]

What do people think?

jck commented 7 years ago

I think it makes more sense to make the AST available as a cached property. The decorator can do this.