~"Too much" elaboration of lists?~ How far can one go with Integers?

gameboo commented 3 years ago

Some of our CHERI Flute/Toooba CI jobs recently started failing. After some investigation, I decided that it was probably appropriate to open an issue here regarding what I observe. The TLDR:

$ bsc -sim -g top SmallTest.bsv && bsc -sim -e top
Error: "List.bs", line 334, column 17: (S0015)
  Bluespec evaluation-time error: tail
  During elaboration of the body of rule `smallTest' at "SmallTest.bsv", line
  62, column 8.
  During elaboration of `top' at "SmallTest.bsv", line 56, column 8.

where SmallTest.bsv is:

import List :: *;
//import Printf::*;

// ROT_BY_A FIRST_HOT_A : tail of empty list error
// ROT_BY_A FIRST_HOT_B : build ok
// ROT_BY_B FIRST_HOT_A : build ok
// ROT_BY_B FIRST_HOT_B : build ok

`define ROT_BY_A
`define FIRST_HOT_A

`ifdef ROT_BY_A
function List#(a) rotateBy(Integer n, List#(a) xs) =
  append(drop(n, xs), take(n, xs));
`else // ROT_BY_B
function List#(a) rotateBy(Integer n, List#(a) xs);
  for (Integer i = 0; i < n; i = i + 1) xs = rotate(xs);
  return xs;
endfunction
`endif

`ifdef FIRST_HOT_A
function Maybe#(List#(Bool)) firstHotToOneHot(List#(Bool) xs);
  List#(Bool) outList = Nil;
  Bool found = False;
  Integer n = length(xs);
  for (Integer i = 0; i < n; i = i + 1) begin
    Bool elem = head(xs);
    xs = tail(xs);
    outList = cons((!found) ? elem : False, outList);
    if (elem) found = True;
  end
  //return message(sprintf("%d", n), (found) ? Valid(reverse(outList)) : Invalid);
  return (found) ? Valid(reverse(outList)) : Invalid;
endfunction
`else // FIRST_HOT_B
function Maybe#(List#(Bool)) firstHotToOneHot(List#(Bool) xs);
  List#(Bool) outList = Nil;
  Bool found = False;
  Integer n = length(xs);
  for (Integer i = 0; i < n; i = i + 1) begin
    outList = cons((!found) ? xs[i] : False, outList);
    if (xs[i]) found = True;
  end
  //return message(sprintf("%d", n), (found) ? Valid(reverse(outList)) : Invalid);
  return (found) ? Valid(reverse(outList)) : Invalid;
endfunction
`endif

// Here I would normally do some arbitration and computing of next priority of
// requests. Note that the code here has been stripped of the various book
// keeping registers and actions, and distilled to the smallest numbers of
// calls that would still exhibit the pathological behaviour.
// It is not expected to actually produce anything semantically valuable and
// its sole purpose is to exhibit the suspicious list behaviour.
module top (Empty);

  Integer n = 10;
  List#(Bool) incomingReqs = replicate(n, ?);
  List#(Reg#(Bool)) lastReqPos <- replicateM(n, mkReg(False));
  lastReqPos[3] <- mkReg(True); // start with some arbitrary position
  rule smallTest;
    List#(Bool) lastPos = map(readReg, lastReqPos);
    Integer r = 0;
    // XXX
    for (Integer i = 0; i < n; i = i + 1) if (lastPos[i]) r = n - (i + 1);
    case (firstHotToOneHot(rotateBy(r, incomingReqs))) matches
    // XXX
      tagged Valid .elected: $display(fshow(elected));
      tagged Invalid: $display("should not happen with at least an active request");
    endcase
  endrule

endmodule

Note the commented out calls to message. When enabling those, it seams that the n after the loop is the expected 10 (or whatever list size chosen in the top module) when using `define FIRST_HOT_B, and rather surprisingly, double that size when using `define FIRST_HOT_A (really the only way to notice that with the output that message gives for `define FIRST_HOT_A was to try a few different list sizes). I certainly don't expect those loops to express anything meaningful for values of n greater than the expressed upper bound (and can well see how one would eventually come to evaluate the error thunk put into xs on the (expected) last iteration when using the FIRST_HOT_A approach).

EDIT: I added XXX markers in the code around the idiom (which I think of as meaningful, but can see how it would cause grief) pointed out by @mn416. I suppose I'd expect all combinations of the `defines above to fail if the intention was for such an idiom to be illegal.

The slightly less TLDR:

It looks like our CI builds ran happily when using bscs up to 46a605e but started dying with b16cabe. Since the CHERI Flute/Toooba builds are a bit hard to digest when it comes to identifying what exactly is going on, I checked if there was any smaller code base that would exhibit some similar issue.

Luckily, the library used by our CHERI enabled cores has a couple of small-ish examples: https://github.com/CTSRD-CHERI/BlueStuff/tree/c483104aa7f8bd2a1585e747dd6f5dab7d82e17f (note that this is a commit that still exhibits the issue, pointing at its submodule https://github.com/CTSRD-CHERI/BlueBasics/commits/0a310a33daed482bd1b19cb329c3711cdee2bb8b, and the more recent following commits in there are a case of "if we do it like this instead it seams to work"-fixes to get our larger builds going again).

Specifically, when building the "OneHotArbiter" example, the issue observed with recent bscs (which was not present with 46a605e and earlier) is:

$ make simExample-OneHotArbiter 
Makefile:63: .simExamples: No such file or directory
Makefile:64: .verilogExamples: No such file or directory
awk '/\<module/ {print "verilogExample-"$2".v: examples/VerilogExample-AXI-MasterSlave.bsv"; print "verilogExamples: " "verilogExample-"$2".v"}' examples/VerilogExample-AXI-MasterSlave.bsv > .gatherModules-VerilogExample-AXI-MasterSlave.bsv
cat .gatherModules-VerilogExample-AXI-MasterSlave.bsv > .verilogExamples
rm -f .simExamples
touch .simExamples
for f in examples/Example-AXI4-Deburst.bsv examples/Example-AXI4Lite-Bus.bsv examples/Example-TwoWayBus-SingleFlit.bsv examples/Example-OneHotArbiter.bsv examples/Example-AXI4-Bus.bsv examples/Example-TwoWayBus-MultiFlit.bsv examples/Example-OneWayBus-SingleFlit.bsv examples/Example-ToUnguarded.bsv examples/Example-OneWayBus-MultiFlit.bsv examples/Example-AXI4-MasterSlave-NoSynth-Synth.bsv examples/Example-AXI4-AXI4Lite-Bridges.bsv examples/Example-AXI4-Bus-Synth.bsv examples/Example-AXI4Lite-MasterSlave.bsv examples/Example-AXI4-toWiderSlave.bsv examples/Example-AXI4-toWiderMaster.bsv examples/Example-CharIO.bsv examples/Example-AXI4-MasterSlave.bsv examples/Example-AXI4-Bus-NoSynth-Synth.bsv; do tmp=`basename $f .bsv`; echo "simExamples: simExample-${tmp#"Example-"}" >> .simExamples; done
for f in examples/Example-AXI4-Deburst.bsv examples/Example-AXI4Lite-Bus.bsv examples/Example-TwoWayBus-SingleFlit.bsv examples/Example-OneHotArbiter.bsv examples/Example-AXI4-Bus.bsv examples/Example-TwoWayBus-MultiFlit.bsv examples/Example-OneWayBus-SingleFlit.bsv examples/Example-ToUnguarded.bsv examples/Example-OneWayBus-MultiFlit.bsv examples/Example-AXI4-MasterSlave-NoSynth-Synth.bsv examples/Example-AXI4-AXI4Lite-Bridges.bsv examples/Example-AXI4-Bus-Synth.bsv examples/Example-AXI4Lite-MasterSlave.bsv examples/Example-AXI4-toWiderSlave.bsv examples/Example-AXI4-toWiderMaster.bsv examples/Example-CharIO.bsv examples/Example-AXI4-MasterSlave.bsv examples/Example-AXI4-Bus-NoSynth-Synth.bsv; do tmp=`basename $f .bsv`; echo "simExample-${tmp#"Example-"}: $f" >> .simExamples; done
mkdir -p output/simExample-OneHotArbiter-info build/bdir build/simdir
bsc -cpp -Xcpp -I. -info-dir output/simExample-OneHotArbiter-info -simdir build/simdir -p +:./AXI:./BlueBasics:./BlueUtils -bdir build/bdir +RTS -K512M -RTS -show-schedule -sched-dot -show-range-conflict -sim -g top -u examples/Example-OneHotArbiter.bsv
checking package dependencies
compiling ./BlueBasics/ListExtra.bsv
compiling ./BlueBasics/SourceSink.bsv
compiling ./OneHotArbiter.bsv
compiling examples/Example-OneHotArbiter.bsv
code generation for top starts
Error: "List.bs", line 343, column 17: (S0015)
  Bluespec evaluation-time error: last
  During elaboration of the body of rule `arbitrate' at
  "examples/Example-OneHotArbiter.bsv", line 60, column 8.
  During elaboration of `top' at "examples/Example-OneHotArbiter.bsv", line
  35, column 8.
Makefile:67: recipe for target 'simExample-OneHotArbiter' failed
make: *** [simExample-OneHotArbiter] Error 1

I could not (yet?) come up with a more distilled example that would work with bsc 46a605e and break with b16cabe. The small example I did come up with yields the list error irrespective of bsc's version. The offending call on our full CHERI build and smaller library example is that of a last on an empty list as opposed to the tail I managed to keep for the reduced case, but since the suspicious bit here is not so much which specific list function is used, but rather how many times they get used / on which value they get used during elaboration, the hope is that the small example is still exhibiting the relevant issue.

My guess is that b16cabe would simply cause the idioms used for list manipulations in our library to be elaborated following a slightly different path through the compiler than it used to, but that this path which seams to be "elaborating too much" was actually there before? I still mention this commit in the hope that it could make it easier to identify the cause of this issue.

mn416 commented 3 years ago

Not sure if this is helpful, but having briefly looked at SmallTest.bsv, it looks like the problem is related to the Integer r (and hence the argument to take and drop) being dependent on circuit-time values. To remove this dependence, the compiler presumably needs to do some pretty serious partial evaluation. Another observation is that, because ROT_BY_A FIRST_HOT_B works, the issue is somehow related to the rebinding xs = tail(xs) in FIRST_HOT_A.

quark17 commented 3 years ago

I've asked @nanavati to also have a look at this. (I can't seem to assign the issue to him, for some reason.) He authored the commit that regressed things for you. It is part of a series of commits to improve optimizations for pack/unpack (of which the last few more commits were just merged). So I think it would be interesting to look at behaviors before that series and after that series (not just looking at that one commit and immediately before).

As an aside: Does your CI use the head of the BSC repo because that's your preference or because the BSC repo currently lacks a tagged version (and so the head is only place that's easy to grab)? If I tagged a version prior to that commit, would that help you (to be able to freeze on a particular BSC version)? (Of course, catching this regression has been helpful, so there would be benefits to testing from the head at least occasionally.)

I'm a little confused about what all is being reported here -- partly because the initial comment has been edited several times, so it's been different each time I've read it. In the future I recommend not editing messages, just add additional comments.

If I understand correctly: (1) the main issue is that a build in the BlueStuff repo started failing at BSC commit b16cabe, and (2) you also have written a small test, which doesn't change behavior at that commit (except slightly), but you think it still might be helpful -- because it illustrates several different ways to write the same function, some of which elaborate better than others, and one which fails to compile. Your reasoning is that: maybe what BSC changed was how code in BlueStuff elaborated; and maybe these different styles of writing the same function are illustrating the before and after elaborations.

Sadly, I don't think that's true -- I don't expect that SmallTest is helpful for debugging the change in BlueStuff. (It may still be useful as a separate example, of code that BSC ought to be improved to fix, and for regression testing.)

On (1), I confirm what you observed. If I checkout BlueStuff at the commit prior to your workaround (making sure to also checkout the submodule at the associated commit), and I run make simExample-OneHotArbiter, using BSC at commit 46a605e it builds and using BSC at the next commit (b16cabe) it fails to build, with an error about last in the List package.

Turning to SmallTest...

One thing I wanted to mentioned was this definition for the list that is being rotated etc:

List#(Bool) incomingReqs = replicate(n, ?);

The list is being populated with don't-care values (?), which I worry would cause a problem -- I'd think it would be better to get the values from a register or use constant values. However, the don't-care values don't seem to cause a problem, and changing to actual True/False values doesn't change the behavior of BSC on the test. So I guess it's OK.

At the top of SmallTest, you give this matrix of behaviors:

// ROT_BY_A FIRST_HOT_A : tail of empty list error
// ROT_BY_A FIRST_HOT_B : build ok
// ROT_BY_B FIRST_HOT_A : build ok
// ROT_BY_B FIRST_HOT_B : build ok

I wanted to see if I observed the same behavior, using the four different versions of BSC.

At commit e5af118 (which is before the series of pack/unpack commits), I observe:

ROT_BY_A FIRST_HOT_A : tail of empty list error
ROT_BY_A FIRST_HOT_B : build ok -- but takes a long time!
ROT_BY_B FIRST_HOT_A : build ok
ROT_BY_B FIRST_HOT_B : build ok

I think it's worth noting that case (A,B) is slightly different from the later two cases -- it builds, but it takes significantly longer.

At commit 46a605e (just prior to the commit that break things), I see the same behavior.

At commit b16cabe (the commit that breaks things), I see:

ROT_BY_A FIRST_HOT_A : tail of empty list error
ROT_BY_A FIRST_HOT_B : build ok -- if stack size is increased, takes a long time
ROT_BY_B FIRST_HOT_A : build ok
ROT_BY_B FIRST_HOT_B : build ok

It turns out that we do see a slight change in behavior! The (A,B) case fails to build with the default flags because the stack is exhausted. If I increase the stock (with, say +RTS -K100M), then it builds. So something about the elaboration of this case has gotten worse.

At commit c085ecd (after the series of pack/unpack commits), I see the same behavior.

Does this match what you were seeing?

There is, though, one other dimension to this matrix: the commented out message calls. When these are introduced, I observe that all four versions of BSC have the same behavior:

ROT_BY_A FIRST_HOT_A : error -- the length 'n' fails to reduce to a number
ROT_BY_A FIRST_HOT_B : error -- the length 'n' fails to reduce to a number
ROT_BY_B FIRST_HOT_A : build ok
ROT_BY_B FIRST_HOT_B : build ok

I further observe what you reported: the first two cases indicate a length of 20 (2n) while the last two cases indicate a length of 10 (n). For the last two cases, BSC completes successfully, so we see that 10 is printed by the message. In the first two cases, the argument to message does not reduce during elaboration (to a string that can be printed) so BSC reports an error. In that error, the un-reduced expression is dumped, and it is there that we see 20 (2n). What we see is a case-statement (expressed as an if-else tree) with arms that print various values up to 20. This matches what you observed, although I think you may have slightly misstated things:

Note the commented out calls to message. When enabling those, it seams that the n after the loop is the expected 10 (or whatever list size chosen in the top module) when using `define FIRST_HOT_B, and rather surprisingly, double that size when using `define FIRST_HOT_A (really the only way to notice that with the output that message gives for `define FIRST_HOT_A was to try a few different list sizes).

I observe that the choice of ROT_BY_A vs ROT_BY_B is what causes 2n vs n; the choice of FIRST_HOT_A vs FIRST_HOT_B does not -- I think you may have typed the wrong pair? This also matches what I expect from the code: the length displayed by message is the same in both firstHot functions; it's the length of the list built by rotateBy, so the choice of that function should be what matters.

Why do you see 2n? I believe it's because of this code:

function List#(a) rotateBy(Integer n, List#(a) xs) =
  append(drop(n, xs), take(n, xs));

This is appending two lists, one resulting from drop and one from take. When the argument n comes from a register, a function like drop becomes a case-statement for all the possibilities (from a 0-size list up to the original n-size list). Similarly, take could possibly return the entire n-size list. When we append the two, we get a new case-statement that expresses all the combinations, including the possibility of appending an n-size list with an n-size list (resulting in a 2n-size list). Of course, the conditions on those two situations cannot be true at the same time, so BSC should have pruned that arm of the combined case-statement. (And, in fact, BSC should have pruned all arms that don't result in a list of size n.)

This wasn't introduced with the recent commits. But it would be interesting to know why BSC hasn't pruned this code. BSC does have rules for pruning some expressions -- for example an if-else tree of conditions like x==c1 and x==c2 will be noticed and BSC will prune, knowing that x cannot equal two different constants. However, the ad hoc rules only recognize certain expressions; now that BSC has access to quick a SMT solver, it can use that to test for exclusivity of any expressions. (This is a known issue that was filed in Bluespec Inc's private database. I've been slowly transferring those issues here to GitHub. I haven't transfered this one yet, but I'll dig it up and file it next.) It's also possible that BSC just isn't looking for pruning opportunities in certain places (regardless of whether it's ad hoc rules or using the SMT solver), so that also could be looked at.

(Oddly, the expression that is dumped for the un-reduced argument to message is not a case-statement with arms for all possible combinations from 0 to 20. It seems to only have arms for 1, 3, 4, 7, 8, 9, 10, 11, 12, 15, 16, 19, and 20. I'm not sure why some arms were pruned, but others weren't, or if there's any significance to these numbers.)

quark17 commented 3 years ago

I've boiled down the issue from BlueStuff into this self-contained one-page file: Test.bsv

$ bsc -verilog Test.bsv
Error: "List.bs", line 343, column 17: (S0015)
  Bluespec evaluation-time error: last
  During elaboration of the body of rule `arbitrate' at "Test.bsv", line 45,
  column 8.
  During elaboration of `mkTest' at "Test.bsv", line 38, column 8.

This maintains the arbitration function, while removing everything else. It's possible that code within the function could be further removed (while still triggering the error), if that would help debugging.

gameboo commented 3 years ago

Hi @quark17, thanks a lot for having a look at this!

As an aside: Does your CI use the head of the BSC repo because that's your preference or because the BSC repo currently lacks a tagged version (and so the head is only place that's easy to grab)? If I tagged a version prior to that commit, would that help you (to be able to freeze on a particular BSC version)? (Of course, catching this regression has been helpful, so there would be benefits to testing from the head at least occasionally.)

I suppose in general we don't have a strong reason to track bsc's HEAD, but we do care about some specific features on occasion (at the moment, we would be keen to benefit from recent enum related improvements that @jrtc27 pointed out would help Toooba's area. @jrtc27 please correct me if I misunderstood this).

I'm a little confused about what all is being reported here -- partly because the initial comment has been edited several times, so it's been different each time I've read it. In the future I recommend not editing messages, just add additional comments.

Apologies, I was hoping that adding a mention to what @mn416 pointed out directly next to the relevant code would be helpful and tried to make the edited section of the comment stand out by labelling it as such... Only to end up in a short burst of oversight/typo fixes in the edit... I suppose it served the opposite purpose, that wasn't my intention.

If I understand correctly: (1) the main issue is that a build in the BlueStuff repo started failing at BSC commit b16cabe, and (2) you also have written a small test, which doesn't change behavior at that commit (except slightly), but you think it still might be helpful -- because it illustrates several different ways to write the same function, some of which elaborate better than others, and one which fails to compile.

yes

Sadly, I don't think that's true -- I don't expect that SmallTest is helpful for debugging the change in BlueStuff. (It may still be useful as a separate example, of code that BSC ought to be improved to fix, and for regression testing.)

I am not sure which BlueStuff change you are referring to here. When I said

(note that this is a commit that still exhibits the issue, pointing at its submodule https://github.com/CTSRD-CHERI/BlueBasics/commits/0a310a33daed482bd1b19cb329c3711cdee2bb8b, and the more recent following commits in there are a case of "if we do it like this instead it seams to work"-fixes to get our larger builds going again).

I was expressing that we had "debugged" it in later commits. Specifically, the relevant change is in https://github.com/CTSRD-CHERI/BlueBasics/commit/05572fcbd0ce428dcb7d087c76f3d9d2dbd8ca38 (the very next commit).

The intention with SmallTest was to avoid the need for people who wanted to help to have to dig intoBlueStuff/BlueBasics commits. This https://github.com/CTSRD-CHERI/BlueBasics/commit/05572fcbd0ce428dcb7d087c76f3d9d2dbd8ca38 change really is the one I had hoped could be reverted if bsc could compile SmallTest. The last error I mentioned was basically what motivated this change to stop using rotateR which seamed to be the only function that was using last internally (at least when looking into bsc/src/Libraries/Base1/List.bs).

Your new Test.bsv example is much nicer in that it gets "fixed" by applying exactly that patch (at least when I try), and does get triggered by the exact same bsc commit as the one that triggered the CI fail.

But as you say, even if this happens to be orthogonal, I am interested in the SmallTest issue in isolation from the whole CHERI / BlueStuff setup, and glad if it could be of use from a regression testing perspective.

The list is being populated with don't-care values (?), which I worry would cause a problem -- I'd think it would be better to get the values from a register or use constant values. However, the don't-care values don't seem to cause a problem, and changing to actual True/False values doesn't change the behavior of BSC on the test. So I guess it's OK.

Yes, this is a good point, maybe for a proper regression test, it would be better to follow that guidance. I only left it like this here as it appeared sufficient to trigger the odd behaviour.

It turns out that we do see a slight change in behavior! The (A,B) case fails to build with the default flags because the stack is exhausted. If I increase the stock (with, say +RTS -K100M), then it builds. So something about the elaboration of this case has gotten worse.

At commit c085ecd (after the series of pack/unpack commits), I see the same behavior.

Does this match what you were seeing?

Yes that is correct! All I can think is that I must have had some conf local to the machine I had first tested this on that enabled a deeper stack by default as I only just observed the running out of stack effect when re-trying this now. I also did try it with c085ecd and can confirm I still observe this behaviour.

What we see is a case-statement (expressed as an if-else tree) with arms that print various values up to 20. This matches what you observed, although I think you may have slightly misstated things:

Yes! Thank you for clarifying this. I think this output now makes a bit more sense.

I observe that the choice of ROT_BY_A vs ROT_BY_B is what causes 2n vs n; the choice of FIRST_HOT_A vs FIRST_HOT_B does not -- I think you may have typed the wrong pair?

Absolutely :)

This is appending two lists, one resulting from drop and one from take [...]

This is very interesting and makes a lot of sense. Thanks for sharing this insight.

(Oddly, the expression that is dumped for the un-reduced argument to message is not a case-statement with arms for all possible combinations from 0 to 20. It seems to only have arms for 1, 3, 4, 7, 8, 9, 10, 11, 12, 15, 16, 19, and 20. I'm not sure why some arms were pruned, but others weren't, or if there's any significance to these numbers.)

All I can think of that would have any impact on this is the arbitrary starting position lastReqPos[3] <- mkReg(True)... But that hardly explains the observed pattern...

I've been slowly transferring those issues here to GitHub. I haven't transferred this one yet, but I'll dig it up and file it next

Thank you again for looking into this.

All in all, I think our repo with https://github.com/CTSRD-CHERI/BlueBasics/commit/05572fcbd0ce428dcb7d087c76f3d9d2dbd8ca38 is currently building fine. I am curious to know your thoughts on that fix, irrespective of whether it can be temporary or not.

Thanks again.

quark17 commented 3 years ago

Your responses clarify a lot. Thank you!

B-Lang-org / bsc

~"Too much" elaboration of lists?~ How far can one go with Integers? #319