`-dparsed` output produces invalid register updates with BSV

RyanGlScott commented 10 months ago

Given this BSV code:

package Foo;

import Vector::*;

module helloWorld#(Vector#(2, Reg#(UInt#(32))) v, Reg#(Bit#(64)) idx)(Empty);
   rule hello_world;
     v[idx] <= 0;
   endrule
endmodule

endpackage

Compiling it with bsc -dparsed=FooParsed.bsv Foo.bsv yields:

package Foo;
import Vector::*;

module helloWorld#(Vector#(2, Reg#(UInt#(32))) v, Reg#(Bit#(64)) idx)(Empty);
  rule hello_world (True);
     (v[idx] <= 0);
  endrule: hello_world
endmodule: helloWorld

endpackage: Foo

This is almost the same code, but there is one key difference: the (v[idx] <= 0); statement has surrounding parentheses. This change is significant enough to cause bsc to reject it:

$ bsc FooParsed.bsv 
Warning: Unknown position: (S0001)
  File name `FooParsed.bsv' does not match package name `Foo'
Error: "FooParsed.bsv", line 6, column 14: (T0080)
  Type error at the use of the following function:
    (<=)

  The expected return type of the function:
    Action

  The return type according to the use:
    Bool

Removing the parentheses causes bsc to accept the code.

The culprit is likely the CSExpr _ e case in this code:

https://github.com/B-Lang-org/bsc/blob/f00d205867eefe09c60e11b4df155bb87041799a/src/comp/CVPrint.hs#L698-L699

This causes the precedence to be p+2 when pretty-printing the expression v[idx] <= 0, which is printed using this code:

https://github.com/B-Lang-org/bsc/blob/f00d205867eefe09c60e11b4df155bb87041799a/src/comp/CVPrint.hs#L572

Because the precedence is greater than 0, this causes the overall expression to be parenthesized by pparen.

What is interesting is that there is a separate CSExpr _ e@(Ccase _ _ _) that calls pvPrint at precedence p instead of p+2. It's unclear to me why this difference exists, but it is likely that if the CSExpr _ e case changed its precedence to p, then this issue would be resolved. It's unclear to me if this would have ramifications when pretty-printing other forms of CSExprs, however.

quark17 commented 10 months ago

As noted on issue 663, there are likely to be many bugs in the pretty-printing, but thank you again for reporting.

This a problem caused by the fact that <= has two meanings in BSV, depending on whether it's an action or expression context. The parentheses indicate an expression context, and one way to fix this is to remove the parentheses as you mention. But note that there is another way to fix it. An expression can be of type Action, and there is a syntax for creating action expressions -- action..endaction -- so this can be inserted inside the parentheses:

(action v[idx] <= 0; endaction);

But it would be preferable to print without the unnecessary double wrappers.

This might suggest that the pretty-printing routines need to keep track of the context: the printing of Cwrite would add action..endaction if the context is expression not action; and the printing of CSExpr _ e would specify action context when printing e. However, my suspicion is that action...endaction in the original BSV would show up as an explicit structure in the parsed CSyntax, and so the pretty-printer could just naively print what it sees, because the action..endaction is in the code. And I confirm that this is true, by looking at the output of -dparsed with action..endaction inside the parentheses. On the other hand, the parentheses are not an explicit construct in the parsed CSyntax.

In any case, it seems like an error for the pretty-printing of Cwrite to insert parentheses for precedence; it should print action..endaction. So maybe the simplest fix is to change that from using pparen (which prints (..) on True) to some other function that prints action..endaction on True. This would still have an unnecessary level of wrapper, but it would at least parse correctly. Although when I tested the following, the parser didn't like the semicolon after the action block:

rule hello_world (True);
  action v[idx] <= 0; endaction;
  $display("hello");
endrule: hello_world

If I remove the semicolon, it parses, oddly!

rule hello_world (True);
  action v[idx] <= 0; endaction
  $display("hello");
endrule: hello_world

So maybe instead of pparen, it needs to print (action..endaction), with the parens, at least until it's determined that the parser is wrong and is fixed.

Eliminating the extra level of wrapper might still be good to do, and removing the p+2 might do it (I haven't looked closely into it though). As for this:

It's unclear to me if this would have ramifications when pretty-printing other forms of CSExprs, however.

One thing to do is to check whether it makes a difference to the testsuite. Specifically the bsc.syntax pretty-printer tests, which would be explicitly testing the output. (The rest of the testsuite may not be checking the exact text of messages, so you might not see a change in testsuite results even if there's a user-visible difference.) I would guess that there's very few pretty-printer tests, though, and that we really need to add more.

One way to check would be to run a script on all .bsv files in the testsuite, to check whether the -dparsed output compiles (and maybe whether the -dparsed output the second time is identical? for lack of a good way to check if the output is the same as the original). Note that you can use -KILLparsed alongside -dparsed, to make BSC exit after dumping that stage.

I do have access to the commit logs for BSC before going open source, so I'm able to see when those two lines for CSExpr were added, in case that sheds any light. These lines date back to 2004, when the BSV language was being created. There was a "demo" (as it was called in the logs and emails of the time) that was being developed, and the BSV parser and pretty-printer were being adjusted based on the source code of that demo. I believe that "demo" was Pong, which existed in Classic and was being translated to BSV (as BSV was being created, to support that translation). In April 2004, the one line for printing CSExpr (with p+1) was split into the two that we see now (with a line for "case" with no change to p and everything else with p+2 -- I have no idea why that increased, or even how an increment of more than 1 even makes a difference). The commit seems to be fixing up the handling of case statements in various places; the pprint of Ccase adds a call to pparen around it -- to support case expressions, not just case statements) and presumably the special arm for CSExpr (Ccase) is to prevent the unnecessary extra wrapper parens that would result. Not just unnecessary actually ... it would change the meaning from a case statement to a case expression. This maybe also argues again for a pretty printer that keeps track of the context (statement/action vs expr), so it knows that no parens are needed for Ccase in a statement context.

It also suggests another fix: In the same way that Ccase is specially handled in CSExpr, you could add a special arm for Cwrite that also knows not to call it with increased parens.

Although, actually, why is p being passed down -- that's the precedence of the parent construct's context. Whether a do or module or action block needs parentheses around the whole thing should have no effect on the need for parens around individual statements. The sub calls to pprint should pass 0 or 1 and not p and p+1, right? (Unless I'm misunderstanding something -- as clearly I am, if 2 has some kind of meaning.)

Anyway, that's a lot of rambling. Does any of that help? I guess I'm saying that I don't currently recall how all this works, without spending time reading the code, and I don't have time for that. But feel free to spend time on that and submit PRs if you'd like. And tests are important, and we lack them, so feel free to run a script over all the source files in the testsuite as a manual test; and feel free to add any particularly interesting or broken ones as a automatic test in the testsuite. And in particular, the Pong example in BSV may have been the guiding example for some of what you're seeing.

The Pong example is in testsuite/bsc.bsv_examples/pong/. I think the original Classic code is in bsc.bsc_examples/pong/ (note bsc_ instead of bsv_). So if you want to test what effect a change has, you can try it first on the BSV Pong. And then I'd suggest trying it on all the files in bsc.bsv_examples. I think that directory was mostly created around 2004, when BSV syntax was being developed. (Some that were added later, as late as 2013, are: the SHA directories, PAClib, mimo, FloatTest, FP, GlibcRandom. Another early example alongside Pong is the "mesa" example.)

RyanGlScott commented 10 months ago

Thanks for the detailed explanation!

For what it's worth, I found a workaround for this issue as a client who is generating code using the bsc API: use CBinOp ... idAssign ... instead of Cwrite to represent register updates. The bug in https://github.com/B-Lang-org/bsc/issues/662#issue-2080558250 specifically affects Cwrite, but CBinOp ... idAssign ... isn't affected. Moreover, there is code elsewhere in the pretty-printer for printing idAssign as <= here, so CBinOp ... idAssign ... works regardless of whether you are printing the AST as Bluespec Haskell or as Bluespec SystemVerilog.

In short, it would be nice to fix this issue, but it's not as urgent for my needs as I originally thought it to be.

B-Lang-org / bsc

`-dparsed` output produces invalid register updates with BSV #662