B-Lang-org / bsc

Bluespec Compiler (BSC)
Other
954 stars 146 forks source link

uppercase names not allowed in interface property `arg_names` #654

Closed mieszko closed 11 months ago

mieszko commented 11 months ago

the arg_names property of interface fields does not allow names starting with uppercase:

interface Foo =
    foo :: Bool -> Action {-# arg_names = [FOOBAR] #-}

{-# synthesize mkFoo #-}
mkFoo :: Module Foo
mkFoo = return _

this is the error:

Error: "Foo.bs", line 4, column 49: (P0005)
  Unexpected "]"; expected "."

it seems that the parser thinks that FOOBAR is a package name, and expect it to start a qualified identifier. and indeed the relevant fragment of CParser.hs uses pFieldId for this:

    literal (mkFString "arg_names") ..+ eq ..+ lb ..+  pFieldId `sepBy` cm +.. rb `into`
                (\a -> succeed $  [(PIArgNames a)])

but of course things with dots don't make sense to generate as a Verilog identifier, and if you replace FOO with FOO.bar then the compiler indeed rightfully complains about this:

Error: "Foo.bs", line 4, column 4: (G0106)
  Method `foo' generates a port with name `foo_FOO.bar' which is not an
  accepted Verilog identifier. It must contain only letters, digits, $ and _,
  but cannot begin with $ or a digit.

interestingly PIArgNames takes [Id] whereas all other properties take String; not sure why it's not [String] but perhaps there is some deep reason somewhere?

quark17 commented 11 months ago

The datatype and the Classic parsing were added in 2005 and remained mostly unchanged. The comments on the datatype look wrong, as if they were for an earlier iteration than what we see, so it's possible that the checked-in code has other aspects (like the choice of Id) that came up during development but weren't cleaned up in the final version? However, the Id type could be used because it carries position information, and note that PIArgNames is not a user attribute in BSV (like the others) but is automatically created by the BSV parser -- because BSV syntax requires argument names. These renaming pragmas were added when BSV was the only supported language, and parsing in Classic would only be there because of the need to read in .bi files (which no longer exist).

As you say, some digging is needed to see if the position information is used. (I think that GenWrap uses the names, possibly when creating the wrapper, so if there was ever an error message on that code, it would point to the name in the interface declaration.)

I think it's OK for the type to be Id as long as only the basename is used (and not the qualifier) -- though arguable it could just be (String,Position). I don't think we should rely on later passes to print the name as qual.base and catch that it's not valid. So I approve of your change in the PR to parse a lowercase name, an uppercase name, or a string (in quotes) -- arguably only a string should be allowed, but no need to break backwards compatibility. Note that a dot in the name is OK Verilog if the name is escaped, and I think we should support people writing "foo.bar" as a string (as long as the backend prints the escaped name). And, in fact, I see that I can type prefix="FOO.bar" and that this generates Verilog with a dot in the name -- though sadly not escaped! (Incidentally, "\\Foo.bar" parses and results in an escape character, but no space is added after the name, so the Verilog is still wrong. And if I try prefix="\\Foo.bar " with a space, the Verilog is still wrong, because suffixes like _1 are added after the space!)

mieszko commented 11 months ago

good point, i always forget that things are sometimes Ids in bsc as a hack to keep the position info. i agree that (String, Position) is perhaps preferable (if the that position is used at all, anyway) — but i think the game of fixing that everywhere is not really worth the candle.

fwiw, i'm not sure autoescaping from (say) result = "foo.bar" is superior to requiring result = "foo\\.bar" — naked foo.bar has semantics in verilog (it reaches inside a module instance), so for my taste the explicit \\ makes it extra clear that the user explicitly wishes to have an escaped dot in the identifier (i'd think a rather uncommon case).

quark17 commented 11 months ago

fwiw, i'm not sure autoescaping from (say) result = "foo.bar" is superior to requiring result = "foo\\.bar"

Be careful: foo\.bar is not how escaping works in Verilog, so writing result="foo\\.bar" won't work. Verilog escaping works by putting \ at the start of the identifier, and then any characters which follow are part of the name, until a space is reached. So, a user would need to write result="\\foo.bar", but that isn't going to work currently because it requires help from BSC to ensure that there's a space after every occurrence of the name -- otherwise, any following semi-colon, for example, will be considered part of the escaped name.

Currently, if the user does want a name containing non-standard characters, there's no way to write it.

And BSC doesn't consistently report an error about bad characters in the identifier. If I specify prefix="x+y", that runs without error and generates Verilog (with a bad identifier); but if I change it to result, then there's an error about an invalid name. And there's no proper handling or checking of \ or spaces in the string.

I also notice that I can put a result= attribute on an Action method and there's no message about the ignored pragma (in either BSV or Classic).

mieszko commented 11 months ago

right, initial \. easy enough to add a terminating space if there is an initial \ to the PPrint instance for VId, no?

either way i suspect using escaped names in "native" verilog is rare, so either way works really. of course there should me more error checking (also applies to some classic ids that are not valid verilog ids, although maybe escaping already happens there).