Failing continuous assignment simulation

yn224 commented 2 years ago

From the issue #540, there were couple test cases that were found to be not functional when brought down to a lowered representation. Further analysis needs to be conducted.

[x] Iter_mult
[x] While

sampsyo commented 2 years ago

Awesome! Happy to help work through these to help identify the underlying issue if you want!

yn224 commented 2 years ago

Recap of the Issue

Previous to the most-recent stack environment and pass updates, the lowered (or "fully continuous") versions of while and iter_mult test cases were failing since they were entering an infinite loop. In attempts to debug this infinite-loop issue, @EclecticGriffin and I found that the comparator in Value did not work as we had expected since the bit-vectors were defined such that the LSB is indexed by 0 and using a BitVec comparator yielded an incorrect comparison result. This issue was fixed by instead using a u64 comparator (part of PR #575). Yet, this did not resolve their correctness since we were unable to get the output. Further debugging led to the conclusion that when the interpreter fails, the changes in values associated with certain components had incorrect values. Now, with the most recent PRs that got merged, while was able to fix itself but not iter_mult.

The improved design of interp_assignments contained a feature where the interpreter will panic when multiple assignments are detected in the same cycle. Using this, I was able to pinpoint which two assignments were conflicting.

How to Reproduce Error

To simplify the lowered output, I reduced the test case to the point where the error can still get reproduced. The following is the simplest version of the iter_mult control that will result in the panic:

control {
  while lt.out with cond_while { //just 4 iterations (i starts as 0)
    seq {
      if slicer.out with cond_if { //there is a 1 in LSB of Q
        seq {}
      } else { //there is a 0 in LSB of Q
        seq {}
      }
      incr_while;
    }
  }
}

I have taken out the overarching seq and the init block, as well as all the bodies of if-else branch.

I then lowered this file and saved the output to run cargo run to make the program panic. After printing out all the detailed output of assignments that interp_assignments goes through, I came to the conclusion that lines 35 and 38 (line numbers may be different from mine) caused a multiple-assignment conflict, causing the interpreter to panic.

...
Line:33 fsm.in = fsm.out == 3'd4 & cs_wh.out & go | fsm.out == 3'd5 ? 3'd0;
Line:34 fsm.in = fsm.out == 3'd0 & 1'b1 & go ? 3'd1;
Line:35 fsm.in = fsm.out == 3'd1 & 1'b1 & go ? 3'd2;
Line:36 fsm.in = fsm.out == 3'd2 & cs_if.out & cs_wh.out & go | fsm.out == 3'd2 & !cs_if.out & cs_wh.out & go ? 3'd3;
Line:37 fsm.in = fsm.out == 3'd3 & i.done & cs_wh.out & go ? 3'd4;
Line:38 fsm.in = fsm.out == 3'd1 & !cs_wh.out & go ? 3'd5;
...

When looked closely, the two assignments are not mutually exclusive. Had line 35 included a guard cs_wh.out to make itself mutually exclusive from line 38, the program will not panic at least.

The resolution still remains valid for the original lowered-code (i.e., lowered version of the current iter_mult itself):

...
Line:57 fsm.in = fsm.out == 4'd11 ? 4'd0;
Line:58 fsm.in = fsm.out == 4'd9 & i.done & cs_wh.out & go ? 4'd10;
Line:59 fsm.in = fsm.out == 4'd2 & !cs_wh.out & go ? 4'd11;
Line:60 fsm.in = fsm.out == 4'd0 & m.done & q.done & i.done & a.done & c.done & go | fsm.out == 4'd10 & cs_wh.out & go ? 4'd1;
Line:61 fsm.in = fsm.out == 4'd1 & 1'b1 & go ? 4'd2;
Line:62 fsm.in = fsm.out == 4'd2 & 1'b1 & go ? 4'd3;
Line:63 fsm.in = fsm.out == 4'd3 & c.done & cs_if.out & cs_wh.out & go | fsm.out == 4'd3 & q.done & !cs_if.out & cs_wh.out & go ? 4'd4;
Line:64 fsm.in = fsm.out == 4'd4 & a.done & cs_if.out & cs_wh.out & go | fsm.out == 4'd4 & a.done & !cs_if.out & cs_wh.out & go ? 4'd5;
Line:65 fsm.in = fsm.out == 4'd5 & q.done & cs_if.out & cs_wh.out & go | fsm.out == 4'd5 & c.done & !cs_if.out & cs_wh.out & go ? 4'd6;
Line:66 fsm.in = fsm.out == 4'd6 & a.done & cs_if.out & cs_wh.out & go ? 4'd7;
Line:67 fsm.in = fsm.out == 4'd7 & c.done & cs_if.out & cs_wh.out & go ? 4'd8;
Line:68 fsm.in = fsm.out == 4'd8 & cs_if.out & cs_wh.out & go | fsm.out == 4'd6 & !cs_if.out & cs_wh.out & go ? 4'd9;
...

Here, the issue occurs between assignments in line 59 and line 62. When I manually added the guard cs_wh.out to line 62, the output came out to be correct.

Further Work

To make the panic more descriptive, I will add the actual conflicting assignments as part of the panic message. Also, I will start looking into different passes that could have generated this "wrong" lowered code to fix this mutual-exclusion error.

sampsyo commented 2 years ago

Awesome; thanks for the detailed recap, @yn224!! This is really really good. Just for the lazy people like me, any chance you could attach (or Gist) a complete, compilable version of the reduced file? (And maybe the exact command to run to reproduce the problem?)

yn224 commented 2 years ago

Awesome; thanks for the detailed recap, @yn224!! This is really really good. Just for the lazy people like me, any chance you could attach (or Gist) a complete, compilable version of the reduced file? (And maybe the exact command to run to reproduce the problem?)

Sure thing. One can have this file (converted to .futil) and run cargo run -- <directory>/imult_lowered_simpl.futil under the interp directory to check the output. This will panic if cs_wh.out mentioned above is not included. imult_lowered_simpl.txt

Here is the complete lowered version: imult_lowered_complete.txt For this, you should be able to get a correct output once cs_wh.out gets added in line 62.

sampsyo commented 2 years ago

Thanks, @yn224. Here's a slightly more self-contained case for reproducing the problem. It's the original, high-level iter_mult.futil with your control block swapped in and dead cells/groups removed: https://gist.github.com/sampsyo/8e78238dec08e0210d9ea48dffd829c1

To reproduce the problem, I used interpretation via fud to make things easy. First, to demonstrate that high-level (non-lowered) interpretation works, try:

$ fud e --to interpreter-out reduced_iter_mult.futil

Then, to show that interpreting the lowered Calyx code crashes, use:

$ fud e --to interpreter-out -s futil.flags '-p all' reduced_iter_mult.futil
✖ interpreter → interpreter-out: interpret
`./target/debug/interp  -l /Users/asampson/Documents/cu/research/futil  /var/folders/dd/z1mcgf915pd1hc83xp51d1rw0000gn/T/tmpg9dlsce_' failed:
=====STDERR=====
thread 'main' panicked at '[interpret_group]: multiple assignments to one port: fsm1.in', interp/src/interpreter/interpret_group.rs:223:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

So, that's exciting! And if you ever want to look at the (still very long) lowered code with your eyes, you can always use fud e --to futil-lowered.

sampsyo commented 2 years ago

I was able to further reduce the test to just a while loop. It turns out that the if in the loop body was not necessary, and in fact running the while loop at all was not necessary to reproduce the bug. (It happens when the loop exits, so we can just make a loop that runs zero times to hit that "exit" condition immediately.) Here's the complete reduced Calyx program:

import "primitives/core.futil";

component main() -> () {
  cells {
    zero = std_const(1, 0);
  }

  wires {
    group cond_while {
      cond_while[done] = 1'b1;
    }
  }

  control {
    while zero.out with cond_while {
      seq {}
    }
  }
}

It has the same behavior as above: everything works fine if interpreted directly, but we get the "multiple assignments" error when interpreting the lowered version.

This smaller test case makes it easier to see the conflicting assignments in the lowered Calyx code. They are:

fsm.in = fsm.out == 2'd2 & go | fsm.out == 2'd1 & !cond_stored.out ? 2'd0;
fsm.in = fsm.out != 2'd2 & go ? incr.out;

Operator precedence makes those predicates kinda hard to read, but you can see how there are conflicting assignments there when the FSM is in state 1 (and cond_stored is false, which is always the case). Now the work begins to narrow down which pass is generating this code.

sampsyo commented 2 years ago

The problem is somewhat easier to see if we just run the first couple of compiler phases. Taking my reduced test case above and just doing pre-optimization and compilation, like so:

$ fud e --to futil-lowered -s futil.flags '-p pre-opt -p compile' reduced_iter_mult.futil

Has a wires section that looks roughly like this (in heavily redacted form):

group static_while {
  ...
  fsm.in = fsm.out != 2'd2 ? incr.out;
  ...
  fsm.in = fsm.out == 2'd2 ? 2'd0;
  ...
}
fsm.in = fsm.out == 2'd1 & !cond_stored.out ? 2'd0;

In my reading, here's what the static control compilation pass has generated:

Inside the static_while group, there is the "normal" loop control. When we're at the end of the loop (state 2), go back to state 0. When we're in any other state, increment the state number using incr.
Outside the group, as a continuous assignment, check for the loop exit condition in state 1.

I don't quite understand why one of the above is part of the group and the other is a continuous assignment, but that's probably not too important here. What is important is that it's easy to see why these assignments conflict! In state 1, the stuff inside static_while will try to increment the counter (setting fsm.in = 2) while, at the same time, the continuous assignment that's outside there will be attempting to exit the loop (setting fsm.in = 0).

So I guess my questions are, for people more familiar with the static control compilation pass:

This is wrong, isn't it? Shouldn't the incr.out assignment to fsm.in be more restrictive to make it mutually exclusive with the loop-exit assignment?
If so, why does this work in Verilog execution?

rachitnigam commented 2 years ago

My guess for why the generated verilog code works is because we generate priority logic and the continuous assignment at the bottom is never selected.

We really need to start verifying that assignments are truly disjoint. And replace the stupid static timing pass with the top down static timing pass.

rachitnigam commented 2 years ago

Also, to respond to Adrian’s question, yes, the assignment should also use the value of cond_stored in the first assignment. Unfortunately, I am without a computer and unable to fix it. @EclecticGriffin wanna take a shot?

EclecticGriffin commented 2 years ago

Sure, happy to take a stab at it

sampsyo commented 2 years ago

Awesome; I made a separate bug with the self-contained repro stuff in #608.

My guess for why the generated verilog code works is because we generate priority logic and the continuous assignment at the bottom is never selected.

Good point. That sounds right to me!

We really need to start verifying that assignments are truly disjoint. And replace the stupid static timing pass with the top down static timing pass.

The latter sounds good! But the former seems to require a SAT solver, right? It sorta seems like the interpreter is "working as intended" in this case as a way to catch these things—dynamically, not statically, but even so it helpfully led us right to the conflicting pair of assignments and where they were generated.

rachitnigam commented 2 years ago

Verilator also has dynamic checking for if statements marked with “unique”

sampsyo commented 2 years ago

@EclecticGriffin helpfully pointed out on Slack that I may have found yet another, different bug on the wrong path here. Namely, what @yn224 was looking at was without static timing enabled. And indeed, it's possible to reproduce the multiple-assignments panic without static timing.

Starting from my gist above that gave a complete Calyx file using Alex's reduced control, you can see that it still fails with infer-static-timing disabled by running this:

$ fud e --to interpreter-out -s futil.flags '-p all -d infer-static-timing' foo.futil

So I started afresh with the ol' test-case reduction game, and I've ended up with this further-reduced test case:

import "primitives/core.futil";

component main() -> () {
  cells {
    zero = std_const(1, 0);
  }

  wires {
    group cond_while {
      cond_while[done] = 1'b1;
    }

    group cond_if {
      cond_if[done] = 1'b1;
    }
  }

  control {
    while zero.out with cond_while {
        if zero.out with cond_if {
          seq {}
        } else {
          seq {}
        }
    }
  }
}

It turns out that the if does seem to be necessary in this dynamically-timed scenario, but the loop iteration again does not (just using a constant zero to immediately terminate the if suffices to crash).

I'll leave this one to @yn224 to keep investigating, but hopefully this helps you get a little closer to crystallizing where the problem arises from. As a next step, @yn224, can you please make it a goal to file a separate bug along the lines of #608 that makes it clear where the "blame" lies (on a particular pass, for example) for generating conflicting assignments? Then you can get the team's opinion on whether the diagnosis looks correct, which should make it possible to identify the fix.

yn224 commented 2 years ago

Moving to #616

calyxir / calyx