anachronauts / jeff65

a compiler targeting the Commodore 64 with gold-syntax
GNU General Public License v3.0

Comment syntax #2

Open jdpage opened 6 years ago

jdpage commented 6 years ago

Comments are indicated using --[[ and ]] as delimiters. Nesting is respected; ]] won't end a comment early if the comment contains a matching --[[.

An example single-line comment:

--[[ TODO: better examples ]]

An example multi-line comment:

--[[ hackneyed meta-commentary should probably
--   be avoided, but serves as a handy source of both
--   short and long text fragments. ]]

Note that the leading -- characters are purely aesthetic and entirely unnecessary. The following would work just as well:

--[[
examples can be used to both illustrate features
and set examples for their use, without going into
overbearing detail.
]]
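
The nesting rule described above can be sketched as a small scanner. This is only an illustration in Python, not the code from the jeff65 lexer; the function name `scan_nested_comment` and its signature are assumptions made for the example.

```python
def scan_nested_comment(src: str, start: int = 0) -> int:
    """Return the index just past the comment that opens at src[start].

    Hypothetical helper: every '--[[' opens a level and every ']]'
    closes one; the comment ends when the outermost level is closed.
    """
    opener, closer = "--[[", "]]"
    if not src.startswith(opener, start):
        raise ValueError("no comment opener at this position")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith(opener, i):
            depth += 1
            i += len(opener)
        elif src.startswith(closer, i):
            depth -= 1
            i += len(closer)
            if depth == 0:
                return i  # the whole nested comment is one unit
        else:
            i += 1
    raise ValueError("unterminated comment")


src = "--[[ outer --[[ inner ]] still outer ]] x = 1"
end = scan_nested_comment(src)
print(repr(src[:end]))
```

A scanner like this returns one span for the whole comment, however many comments are nested inside it. It also treats any ]] in the body as a closer, so text such as such[as[this]] ends the comment early.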

(Original) C-style comments are ugly. Also, in C, they’re allowed anywhere; however, we only allow comments in statement position, so it might be better to switch to Lua-style comments:

-- this is a comment

At the parser level, this means that they are terminated by a white space token that contains one or more ‘\n’ characters.

woodrowbarlow commented 6 years ago

so far we've avoided needing to treat newlines as special. comments hardly seem to be reason enough to break that.

jdpage commented 6 years ago

See, that’s what I thought too, but then I noticed that Lua did in this one instance... I’m not wedded to the current syntax, but comments should either look statement-ish, or be handled by the lexer and passed as a single comment. Or both.

Lua also has multiline comments using --[[ and --]] and they have some neat tricks, but those tricks become less neat if you remove line comments.

I guess the question is, why do we not treat newlines as special. In my mind, it’s for the convenience of the developer—they shouldn’t have to worry about where to put line continuations, but the syntax should also be unsurprising (i.e. it shouldn’t combine lines that are supposed to be separate like JavaScript semicolon insertion). So we come up with this syntax where newlines don’t really matter that much...

except comments. I don’t know if that’s surprising or not—most developers who’ve used any other language would be unsurprised, but a new developer might be caught off-guard? I think humans are generally pretty comfortable with lines of text though.

jdpage commented 6 years ago

basically:

  1. These should probably be handled by the lexer anyway,
  2. Line comments are nice and I like --, but
  3. I’m fine as long as it’s readable and not /* */

Once they’re handled in the lexer, we can just implement the node the same as WhitespaceNode.

woodrowbarlow commented 6 years ago

> I guess the question is, why do we not treat newlines as special.

in my mind, it's because a newline is really an implementation detail of a text format and it's sloppy to depend on it being \n or even existing at all. i'm not crazy fond of /* and */, but i do feel strongly about having explicit begin and end symbols. especially since everything else in the language has explicit begin and end symbols.

my goal for this project is to design a language that's very easy to learn; otherwise nobody will use it. and even if it ends up being "just another imperative programming language", well, it's "just another imperative programming language" for the commodore 64, and that's a need that isn't filled. are we on the same page regarding the goal?

woodrowbarlow commented 6 years ago

ultimately, i want to explore the possibility of bootstrapping gold-lang on the commodore 64.

jdpage commented 6 years ago

We are on the same page about it being easy to learn, though you should be aware that I have a whole spiel about how programmers use "easy" to mean "stuff I already know". And a lot of programmers already know a lot of things which are kind of stupid if you think about them (usually because of C or Java). I try to avoid the word "easy" in general, preferring "simple".

By the "simple" rubric, you're right about single-line comments. I am okay with not including them, but I don't want to rule them out entirely either.

How do you feel about --[[ and --]], and reserving the -- token but not using it? Same as Lua, has open/close delimiters, and if we change our mind on single-line comments later, we can use -- and get some of the neat syntactic convenience Lua has, but at no point are we obliged to do that.

jdpage commented 6 years ago

Oh, also, can we have comment delimiters nest properly? That might mean having to bump them over to the parser, but it makes being able to comment out blocks of code really easy, since you no longer have to worry about that block of code containing comments.

woodrowbarlow commented 6 years ago

i like --[[ and --]].

i'm also on board with properly nesting comments, although the lexer will only emit a single comment node even if it has other comments inside of it.

jdpage commented 6 years ago

Correction: Lua uses --[[ and ]], which looks a little nicer on one line: --[[ a comment ]]. Is that still OK?

If we go with ]] as the close delimiter, I'm actually happy with forgetting about line comments entirely.

One node is fine (preferable, even).

woodrowbarlow commented 6 years ago

oh yeah, i actually like that even more.

jdpage commented 6 years ago

Implemented in PR #9

jdpage commented 6 years ago

So uh... after a verbal discussion back in December, we ended up changing the comment syntax to be the C-like /* comment */ in b7e3f04d41942a9d6e70cf34f798d34544a92f64, mostly because the --[[ a comment ]] syntax would choke when commenting out blocks with nested array accesses, such[as[this]]. However, near the end of July, I re-floated the idea of going back to Lua comments, and @woodrowbarlow agreed to consider it.

Having given it a little bit of thought, I propose the following syntaxes for comments:

  1. "block" comments: --[[ text ]], --[=[ text ]=], --[==[ text ]==], etc., where 0 or more equals signs are allowed, and the comment doesn't end until a close-comment with a matching number of equals signs is found. (This is the same behaviour as Lua.)

  2. "line" comments: -- a comment. The comment ends at the end of the line. (This is the same as Lua.)

  3. "smart" comments: --[[[ a comment ]]]. These behave the same way as comments do now, with the slight difference that (1) they can be nested, and (2) constructions[like[this[one]]] don't end it. This is accomplished by keeping a counter, which starts at 0. When a [ is encountered, the counter is incremented. When a ] is encountered, the counter is decremented if it is greater than 0. The comment ends as soon as a ]]] is encountered while the counter is less than 3. [1]
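
The counter rule for "smart" comments can be sketched as follows. This is one possible reading of the rule, written in Python for illustration only; the name `scan_smart_comment` is hypothetical, and the sketch assumes the counter starts at 0 just after the --[[[ opener and that the ]]] check is made at every position.

```python
def scan_smart_comment(src: str, start: int = 0) -> int:
    """Return the index just past the smart comment opening at src[start].

    Assumed reading of the rule: '[' increments the counter, ']'
    decrements it when it is above 0, and ']]]' ends the comment
    only while the counter is below 3.
    """
    opener, closer = "--[[[", "]]]"
    if not src.startswith(opener, start):
        raise ValueError("no smart-comment opener at this position")
    counter = 0
    i = start + len(opener)
    while i < len(src):
        if src.startswith(closer, i) and counter < 3:
            return i + len(closer)
        ch = src[i]
        if ch == "[":
            counter += 1
        elif ch == "]" and counter > 0:
            counter -= 1
        i += 1
    raise ValueError("unterminated smart comment")


src = "--[[[ constructions[like[this[one]]] ]]] x"
print(repr(src[scan_smart_comment(src):]))
```

Under this reading a nested --[[[ ... ]]] raises the counter to 3, so its closer is consumed as three ordinary brackets and the outer comment continues. Up to two unmatched [ characters still leave the counter below 3 at the final ]]].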

Okay, so this looks a little redundant, but all three forms feel justified to me. Here's why: form 3 is similar to what we have in spirit (and implementation), but with additional handling to cope with commenting out code. It will work for basically 99% of commenting needs -- in practice, most pairs of brackets are matched, and it can handle the presence of up to two unmatched open brackets in a comment.

However, this is pretty experimental -- most programming languages don't look at the text inside comments at all! I'm honestly not sure if it's a footgun or not, but it seems worth a try unless we find a Lambda the Ultimate paper detailing how structured comments caused Cthulhu to rise from the deep and devour the CS department.

For that last 1% of cases, there are block comments (form 1). By including the appropriate number of equals signs, any arbitrary piece of text can be commented out, including other comments. This seems like a good idea -- users expect to be able to put arbitrary text in comments.
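
The matching rule for block comments (form 1) is simple enough to sketch in a few lines. The Python below is an illustration under the assumption that the opener is -- followed by [, zero or more = signs, and another [; the name `scan_block_comment` is hypothetical.

```python
import re

# '--[' + zero or more '=' + '[' ; group 1 captures the equals signs
_OPENER = re.compile(r"--\[(=*)\[")


def scan_block_comment(src: str, start: int = 0) -> int:
    """Return the index just past the block comment opening at src[start]."""
    m = _OPENER.match(src, start)
    if m is None:
        raise ValueError("no block-comment opener at this position")
    # The closer must carry the same number of equals signs as the opener.
    closer = "]" + m.group(1) + "]"
    end = src.find(closer, m.end())
    if end == -1:
        raise ValueError("unterminated block comment")
    return end + len(closer)


src = "--[==[ a ]] b ]=] c ]==] rest"
print(repr(src[:scan_block_comment(src)]))
```

Because closers with a different number of equals signs are ignored, a level-2 comment can contain ]] and ]=] freely, which is how arbitrary text, including other comments, can be wrapped.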

Finally, line comments (form 2). See above for the discussion of why we decided not to include them. I remain weakly in favour; I only bring it up again because Lua has this neat idiom for commenting out code. If you do it this way, with a --]] at the end instead of just a ]]:

--[[
some-stuff()
--]]

... then it can be uncommented simply by adding a - to the first comment opener:

---[[
some-stuff()
--]]

The first line is now parsed as a line comment containing the text -[[, leaving the second line uncommented. The third line is also now a line comment, containing the text ]]. This works with both "smart" comments and block comments.
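
The toggle idiom can be checked with a toy comment stripper. This Python sketch is an assumption-laden illustration, not the project's lexer: `strip_comments` is a hypothetical name, it handles only line comments and block comments (forms 1 and 2), and it ignores string literals.

```python
import re

_BLOCK_OPEN = re.compile(r"--\[(=*)\[")


def strip_comments(src: str) -> str:
    """Remove line comments and block comments; keep all other text."""
    out = []
    i = 0
    while i < len(src):
        m = _BLOCK_OPEN.match(src, i)
        if m:
            # Block comment: skip to the closer with matching equals signs.
            closer = "]" + m.group(1) + "]"
            end = src.find(closer, m.end())
            if end == -1:
                raise ValueError("unterminated block comment")
            i = end + len(closer)
        elif src.startswith("--", i):
            # Line comment: skip to the end of the line, keep the newline.
            nl = src.find("\n", i)
            i = len(src) if nl == -1 else nl
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


commented = "--[[\nsome-stuff()\n--]]\n"
uncommented = "---[[\nsome-stuff()\n--]]\n"
print(repr(strip_comments(commented).strip()))
print(repr(strip_comments(uncommented).strip()))
```

With the extra -, the first line no longer matches the block opener, so it falls through to the line-comment branch and the call on the second line survives.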


[1] This is enormously awkward as far as the parser goes (I suspect, but don't know how to prove, that it's not actually LALR(1)), and it might be easier to hand-code the helper parser for it rather than trying to come up with some set of rules that matches it. Of course, if we require that square brackets match inside smart comments (which is easier to explain if more restrictive), then it becomes easy enough to express as an LALR(1) grammar.

woodrowbarlow commented 6 years ago

what are your thoughts about just doing forms 1 and 2?

form 1 eliminates the need for true nesting while preserving the "what's inside the comment can be literally anything except the close comment token". i know i've said before that i'd prefer true nesting, but i think i was just clinging to something shiny. lua's solution is pretty elegant, on reflection.

form 3 wouldn't ever be necessary with form 1 in the mix. i guess it might be slightly convenient sometimes, but is it convenient enough to rationalize the added complexity in syntax? my gut feeling is that it would never, ever get used by people. in part because it's strange, in part because it attempts to solve a problem that is already solved by form 1.

jdpage commented 6 years ago

I'm okay with doing just block and line. But, counteroffer: why not try having JUST smart and line?

I actually wouldn't mind trying out the strictly-nesting version of smart comments; personally, I'd probably use it by default for commenting out code at least, even if block comments were available. As long as we reserve the block comment syntax in case smart comments turn out to be a problem, I think we're safe.

Prior art: https://softwareengineering.stackexchange.com/a/103259

The biggest downside that I can see is that you can no longer match comments with a regular expression, which is fine for us (we use LALR(1) to match them), but I'm wondering if it'd be a problem for syntax highlighters.

jdpage commented 6 years ago

It looks like Vim, Emacs, and Sublime are all smart enough to cope with Haskell's nested block comments, FWIW.

jdpage commented 6 years ago

Talked to @woodrowbarlow IRL, we're gonna try line and smart for now. Unassigning this issue until someone decides to pick it up.