Closed paili0628 closed 1 year ago
Don't yet have time to review but I recommend:
infer-static-timing-new
. Maybe group-static-promotion
or something-p static
@paili0628 let me know when this is ready for re-review
@rachitnigam @calebmkim This pr is now ready for re-review. I've now found out that the original infer-static-timing
pass does not correctly infer the static timing for components. In a newly added test case, a component that is inferred to run for 4 cycles in fact runs for 8 cycles. I think we should combine the part of control-promotion
(turn a seq
into a static seq
if all its stmts are static) with the group-static-promotion
pass and only infer the latency of a component when its control
is Control::Static(_)
.
Alternatively, why don't we just wait until the control-promotion
pass runs to infer the latency of components?
Currently, infer-static-timing
does both group promotion and control promotion, but I think what we should separate those passes out.
I agree with @calebmkim: the two should be different passes unless @paili0628 you have a different argument to combine the two. If not, can we make it so that this pass only handles transforming groups into static groups and replacing their enables with static enables?
Ok, so after digging around a bit, there a few reasons for the cycle counts increase. I've tried to fix some of them (and some tests may fail bc the cycle count is now lower; can fix this later). For the non-Dahlia test cases, I think these changes are enough to get better cycle counts than before.
Here are the reasons I've "fixed":
simplify-with-control
was generating groups with a static annotation of 1, but the guards for the done signal were in the form group[done] = reg.done ? 1'd1
, so static-promotion
wasn't able to infer it. Obvious solution is just to generate static groups instead (since we know latency will be 1 anyways). static-promotion
pass isn't detecting invoke std_reg()()
(which comes from the group2invoke
pass), since even though we know a std_reg
static latency, it isn't considered a "static" primitive yet and therefore isn't statically invoked. The solution for now is just moving compile-invoke
before static-promotion
, since static-promotion
can infer the latency this way. This is probably not the most ideal solution but it works to significantly decrease cycle counts compared to before. Ones that we still need to "fix"
static repeat
s in the Dahlia->Calyx backend, that will probably help us a lot. calyx-py
should use static groups instead of static annotations. Thanks for digging into it! I think that for this PR, we should at least ensure that the cycles counts are no worse than what they were before. Would it make sense to have @static
annotation promotion pass that helps us wrap this up for now and eventually gets removed with https://github.com/cucapra/calyx/issues/1429?
Yeah. I think for the time being, especially having a pass that converts @bound
annotations into static repeat
will help the cycle counts.
ok, so there's one remaining test that is worse for cycle count: tests/correctness/sync/sync-dot-product-opt.futil
.
Weirdly, what's making it worse is moving compile-invoke
to run before static-promotion
(which was the move that was helping us for other test cases). I've tried to look at the -p pre-opt
futil files using the two different pass orders, and I can't figure out why moving compile-invoke
is hurting the cycle count. Maybe @paili0628 would know what's going on?
Nice digging! It would be cool to understand more, but a @sync
-related test case can perhaps be forgiven for behaving a little oddly…
wait I just realized the error is actually on the output value of the sync
test, not the cycle count.... which is not good. I can try to look into it as well as @paili0628 if you could as well, since you wrote the sync pass. But either way, we obv need to fix this before we merge it. I'll try to work on getting a minimal failing case example.
Ok, so after investigating the @sync
failing test stuff, I've found that manually changing the .futil file's static timing so that there is a one cycle difference makes it correct. E.g.:
static<2> group my_group {
// contents of group
}
is wrong,
static<3> group my_group {
// exact same contents of group
}
is right.
Here are the files: zgood.futil.txt zbad.futil.txt z.data.txt
running:
fud e zgood.futil -s verilog.data z.data --to dat -s futil.flags "-d static-promotion -d cell-share " --through icarus-verilog -v
gives the correct output
fud e zbad.futil -s verilog.data z.data --to dat -s futil.flags "-d static-promotion -d cell-share " --through icarus-verilog -v
gives the incorrect output.
(And again, the only difference in the two files is the static<3>
annotation on the group).
@paili0628 any ideas on what could be going on?
So I think sync-dot-product-opt.futil
is currently broken because of a bug in the original code. The new StaticControl
changed the program's timing behavior, thereby exposing the bug. This probably deserves its own pr because I have now dedicated substantial time into fixing this without success. So let's skip this test for this pr and I will fix it in a subsequent pr.
Ah, thanks for digging into it! I agree with @paili0628 that we should disable this test for now (by renaming the corresponding .expect file to .skip) and merge this
Alright I'm gonna merge this for now. Thanks everyone!
Great! Please close related issues that have been addressed by this PR
This pr modifies the
infer-static-timing
pass to produce static groups for all groups whose latency can be inferred. All the changes are currently in thegroup-static-promotion
pass. I have not put the new pass into the pipeline in order not to mess things up.