B-Lang-org / bsc

Bluespec Compiler (BSC)
Other
952 stars 146 forks source link

Pretty-printing bug involving `let` and `letseq` #529

Closed RyanGlScott closed 1 year ago

RyanGlScott commented 1 year ago

I recently noticed a bug in the way bsc pretty-prints let and letseq statements. The easiest way to observe the bug is by calling the bsv2bsc utility on this program:

module helloWorld (Empty);
   rule hello_world;
      let hw = "Hello, World!";
      $display (hw);
      $finish;
   endrule
endmodule

Running bsv2bsc on this program will generate:

package HelloWorld where {
helloWorld :: (Prelude.IsModule _m__ _c__) => _m__ Empty;
helloWorld =
    module {
      Prelude.addRules
        (rules {

           "hello_world":  when Prelude.True
                            ==>
                              action { letseq hw =  "Hello, World!";; $display hw; $finish; }
         });
      Prelude.return
        (interface Empty {
         })
    };
}

The letseq hw = "Hello, World!";; bit is not valid Bluespec Haskell. If you try compiling this, it will fail with:

$ bsc -sim -g helloWorld HelloWorld.bs
Error: "HelloWorld.bs", line 10, column 80: (P0005)
  Unexpected ";"; expected "@", "<var>", "(", "<con>", "_", "<real>",
  "<integer>", "when", or "="

Why are there two semicolons after the letseq statement? The PPrint CExpr instance separates each statement in a module with a semicolon (see here), and the PPrint CDefl instance also separates each clause in in a letseq statement with a semicolon (see here).

One way to fix the issue is the add intervening braces, like so:

diff --git a/src/comp/CSyntax.hs b/src/comp/CSyntax.hs
index a0580f5f..c22884cd 100644
--- a/src/comp/CSyntax.hs
+++ b/src/comp/CSyntax.hs
@@ -1151,9 +1151,9 @@ instance PPrint CStmt where
             (map (ppPProp d . snd) pprops) ++
             [pp d pat <+> t "<-" <+> pp d e]
     pPrint d p (CSletseq []) = internalError "CSyntax.PPrint(CStmt): CSletseq []"
-    pPrint d p (CSletseq ds) = text "letseq" <+> foldr1 ($+$) (map (pp d) ds)
+    pPrint d p (CSletseq ds) = text "letseq" <+> text "{" <+> foldr1 ($+$) (map (pp d) ds) <+> text "}"
     pPrint d p (CSletrec []) = internalError "CSyntax.PPrint(CStmt): CSletrec []"
-    pPrint d p (CSletrec ds) = text "let" <+> foldr1 ($+$) (map (pp d) ds)
+    pPrint d p (CSletrec ds) = text "let" <+> text "{" <+> foldr1 ($+$) (map (pp d) ds) <+> text "}"
     pPrint d p (CSExpr _ e) = pPrint d p e

 instance PPrint CMStmt where

With this patch applied, bsv2bsc generates this code instead:

package HelloWorld where {
helloWorld :: (Prelude.IsModule _m__ _c__) => _m__ Empty;
helloWorld =
    module {
      Prelude.addRules
        (rules {

           "hello_world":  when Prelude.True
                            ==>
                              action { letseq { hw =  "Hello, World!"; }; $display hw; $finish; }
         });
      Prelude.return
        (interface Empty {
         })
    };
}

This time, letseq { hw = "Hello, World!"; }; is valid BH.

There is a similar bug involving let/letseq expressions (as opposed to let/letseq statements). It can be fixed in a similar manner:

diff --git a/src/comp/CSyntax.hs b/src/comp/CSyntax.hs
index a0580f5f..281cf969 100644
--- a/src/comp/CSyntax.hs
+++ b/src/comp/CSyntax.hs
@@ -1044,12 +1044,12 @@ instance PPrint CExpr where
     pPrint d p (Cletseq [] e) = pparen (p > 0) $
         (t"letseq in" <+> pp d e)
     pPrint d p (Cletseq ds e) = pparen (p > 0) $
-        (t"letseq" <+> foldr1 ($+$) (map (pp d) ds)) $+$
+        (t"letseq" <+> text "{" <+> foldr1 ($+$) (map (pp d) ds)) <+> text "}" $+$
         (t"in  " <> pp d e)
     pPrint d p (Cletrec [] e) = pparen (p > 0) $
         (t"let in" <+> pp d e)
     pPrint d p (Cletrec ds e) = pparen (p > 0) $
-        (t"let" <+> foldr1 ($+$) (map (pp d) ds)) $+$
+        (t"let" <+> text "{" <+> foldr1 ($+$) (map (pp d) ds)) <+> text "}" $+$
         (t"in  " <> pp d e)
     pPrint d p (CSelect e i) = pparen (p > (maxPrec+2)) $ pPrint d (maxPrec+2) e <> t"." <> ppVarId d i
     pPrint d p (CCon i []) = ppConId d i
RyanGlScott commented 1 year ago

I'd be happy to submit a fix for this issue, but I'm not sure of the best way to add a test case. One way would be to add a test that calls bsv2bsc on the test case above and ensure that the resulting .bs file compiles. However, that would require adding some infrastructure to the test suite to support invoking bsv2bsc. (There is similar infrastructure for invoking the related bsc2bsv utility, so perhaps that infrastructure could be adapted.)

An alternative approach would be to invoke the bsc Haskell API directly, but I have no idea if the test suite is set up to support that.

RyanGlScott commented 1 year ago

Another hiccup: the existing run_bsc2bsv function saves its output to a temporary file that is immediately thrown away afterwards, making it impossible to compile the output using compile_pass in an .exp file. Perhaps we need something like compile_{bsc2bsv,bsv2bsc}_pass?

RyanGlScott commented 1 year ago

I take back my comment in https://github.com/B-Lang-org/bsc/issues/529#issuecomment-1372915735. It's not that the output is being written to a temporary file, but rather that the test case wasn't running at all, since I wasn't setting the DO_INTERNAL_CHECKS environment variable. After setting that, I was able to get something working without too much trouble. I'll submit a patch soon.

Along the way, I noticed a couple of things:

quark17 commented 1 year ago

Hi Ryan, thank you for reporting this and for all the work on the issue! I need to take time time to look at the PR, but I should be able to accept it soon. Thank you!

I can give some context first, though. In 2003/2004, Bluespec Inc was created to take the BSC technology (that existed previously at Sandburst Corp) and make it into a commercially available tool. The only syntax accepted by BSC was BH, and it was decided that this was too foreign for hardware engineers, so a new syntax was needed. At the same time, SystemVerilog was being developed and heralded, so it was decided to embed rules in that language. A first step towards that was just to create a parser for a SV-like syntax, BSV. Ultimately, no further step was taken.

Anyway, at that time (2004), everything was written in BH and it was necessary to start converting examples into BSV. So a quick tool was cobbled together (bsc2bsv), which called the BH parser to read in a file and then called the BSV pretty printer to display it. This program was not perfect -- often the output wouldn't compile, and even when it did, it was often verbose and not the style one would use for writing by hand. (Also, it uses an old style of module instantiation that was attempting to be SV-like, but nowadays no one writes instantiations like that and instead just uses the one-line version with left arrow.) Anyway, the program didn't need to be perfect, it just needed to do 90% percent of the work of translating, and a human would do the rest of the work (and clean it up).

The tool was never released to the public (it wasn't in the BSC release tar-files) and it was mostly never used after 2004. The reverse program, bsv2bsc, was created at the time, just because it was easy to do -- there wasn't a need for it in 2004, and I'm sure it was used even less. Though in 2017-ish, BH was revived for some users who were familiar with Haskell, and training material was converted from BH to BSV, so it's possible that the tool got used then -- but, again, it would only have needed to do 90% of the work, as a human was going to edit the results anyway.

So it was never expected to work fully, has hardly been used, and was never released. So it's no surprise that it has issues.

These tools were included in the open BSC repo, in case they were useful. Because they're not tested and potentially buggy, they aren't put into the installation files by default and aren't in the releases that we build. We could change that decision though, and reclassify it from "internal tools" to first-class tools, and have it tested properly in the testsuite and included in builds and releases.

We probably also would want to rename them to be bh2bsv and bsv2bh, using the current naming convention for the languages; or at least bs2bsv and bsv2bs using the file extension convention. (I would guess that the 'c' stands for Classic in 'bsc', but it's too easily confused with the name of the compiler.) We may even want to consider deprecating .bs and moving to .bh as the standard extension for BH. (I see that Linguist's database of source languages lists a couple that also use .bs but none that use .bh, so it'd help distinguish BH source files just by the extension.)

Pretty-printing is of course also used by BSC, in error messages. So "bugs" in the pretty printer output are worth fixing even if we don't care about bsv2bsc. However, BSC's error messages don't display full programs, so there's lots of parts of the pretty-printer that aren't used. The BSV pretty printer is probably in good shape, because the user-friendly-ness of BSC for BSV users was a priority for many years. BH was ignored, so it's quite possible that the pretty-printing by BSC is not user friendly. BSC does also print BH in some of its debug dumps, for developers, so it's also used there, but developers are less sensitive to the output being 100% perfect as long as they can read it.

Anyway, your suggestions all seem fine and I should be able to accept the PR. Thank you!

One comment would be, if we care about making the pretty-print output readable/friendly, is that it might be better to print using indentation rather than braces. (I'm not suggesting that you need to do that!) However, there are benefits to using braces, in developer dumps etc, so I wouldn't want to print that way everywhere. Just something to note.

RyanGlScott commented 1 year ago

Thanks for the additional context! For what it's worth, my actual motivation is that I am developing a Haskell library for representing Bluespec ASTs, and I am using the code in bsc as a reference for how to pretty-print ASTs. I noticed this bug along the way, and bsv2bsc just so happens to be the most convenient way that I've found for reproducing the bug. I'd be thrilled if there were a different, less convoluted way to reproduce it.

quark17 commented 1 year ago

Ah, very cool! Feel free to submit more fixes or libraries.

There is a way to dump the pretty-printed parsing, using BSC's flags for dumping after each stage:

bsc -dparsed=Bug529parsed.bs Bug529.bs

If you run BSC with -v, it will print the name and timestamp as it enters each stage, and for any one of those stages you can use -d<stagename> to dump the output to the screen or -d<stagename>=<filename> to dump to a file (where %f, %p, and %m can be used as macros for the file, package, or module name). (Running bsc -help-hidden will show some documentation for these developer flags.)

It turns out there there is already a testsuite a directory, bsc.syntax/bsv05_parse_pretty/, which exists for testing the BSV pretty printer by using dparsed= and then attempting to read back in the file. My suggestion would be to start a new directory, bsc.syntax/bh_parse_pretty/, which does something similar for BH pretty printing, and put examples there (instead of creating a new entry under bsc.bugs). This would let you test both let and letseq, since your source file would be BH. You can name the example Let.bs or PPrintLet.bs, and in the .exp file you can have a comment indicating that it's a test for issue 529.

Anyway, sorry to reward your effort with more work. If it's too much, I can do it, but I'd certainly appreciate your help to do it.

quark17 commented 1 year ago

Also, FYI, we have a long-term goal of making BSC (and its codebase) more modular. For example, making the CSyntax and its utilities into their own library, and the parsers into their own libraries, etc. And then BSC is just a top-level project that imports these libraries. So that people can use their them in their own projects, and so that people can experiment with implementing their own parts of the flow (new parsers, new backends, new features in between existing stages, etc).

If that fits with your use -- for example, having the CSyntax be a Cabal library that projects can import -- then feel free to propose or submit changes in that direction.

RyanGlScott commented 1 year ago

There is a way to dump the pretty-printed parsing, using BSC's flags for dumping after each stage:

bsc -dparsed=Bug529parsed.bs Bug529.bs

If you run BSC with -v, it will print the name and timestamp as it enters each stage, and for any one of those stages you can use -d<stagename> to dump the output to the screen or -d<stagename>=<filename> to dump to a file (where %f, %p, and %m can be used as macros for the file, package, or module name). (Running bsc -help-hidden will show some documentation for these developer flags.)

Ooh, this is exactly what I wanted earlier, but failed to find in the bsc --help output. Thanks! (Relatedly, is there a list of possible <stagename>s somewhere?)

It turns out there there is already a testsuite a directory, bsc.syntax/bsv05_parse_pretty/, which exists for testing the BSV pretty printer by using dparsed= and then attempting to read back in the file. My suggestion would be to start a new directory, bsc.syntax/bh_parse_pretty/, which does something similar for BH pretty printing, and put examples there (instead of creating a new entry under bsc.bugs). This would let you test both let and letseq, since your source file would be BH. You can name the example Let.bs or PPrintLet.bs, and in the .exp file you can have a comment indicating that it's a test for issue 529.

Perfect, that sounds much more direct than the bsv2bsc-based approach that I was using. As an added bonus, this approach would be testable without the use of DO_INTERNAL_CHECKS. I'll change #532 to use this new approach.

Anyway, sorry to reward your effort with more work. If it's too much, I can do it, but I'd certainly appreciate your help to do it.

No problem! I'm happy to start over if the end result becomes nicer.

RyanGlScott commented 1 year ago

Also, FYI, we have a long-term goal of making BSC (and its codebase) more modular. For example, making the CSyntax and its utilities into their own library, and the parsers into their own libraries, etc. And then BSC is just a top-level project that imports these libraries. So that people can use their them in their own projects, and so that people can experiment with implementing their own parts of the flow (new parsers, new backends, new features in between existing stages, etc).

Ah, good to know! Indeed, I am more-or-less copy-pasting CSyntax (and everything that it imports) into a separate, .cabal-based library. I'm only working with the pretty-printer–related parts of CSyntax at the moment, but I'd like to use its parser at some point as well.

If that fits with your use -- for example, having the CSyntax be a Cabal library that projects can import -- then feel free to propose or submit changes in that direction.

I might be interested in working on that, although I'd need to figure out how to make a .cabal-based project structure coexist with bsc's build system. Is there a bsc issue somewhere that is tracking this effort?

quark17 commented 1 year ago

(Relatedly, is there a list of possible <stagename>s somewhere?)

If you run bsc -help-hidden, the stage dump flags are listed towards the end -- just before a listing of all the trace flags (debugging prints inside the stages) that BSC knows about.

Is there a bsc issue somewhere that is tracking this effort?

I don't see one specifically for that. Issue #22 maybe -- it's about splitting up the codebase to allow parallel compilation for improved compile time, but it does mention Cabal as an option for that. PR #166 was a draft of support, but BSC has changed since then, so it would need to be updated. Issue #187 and PR #267 are similar, but for Stack instead of Cabal. (In a comment on that PR, I explained my thinking on how both Cabal and Stack building could be supported in the BSC makefile, as alternatives to the default compilation. My opinions might have changed since then; I'd have to stop and think about it.) Issue 37 is just generally about defining the supported OS/toolchains.