chharvey / counterpoint

A robust programming language.
GNU Affero General Public License v3.0

Tokenize Comments #4

Closed by chharvey 4 years ago

chharvey commented 4 years ago

Tokenize line comments, multiline comments, nested multiline comments, and block comments.

Comment ::= CommentLine | CommentMulti | CommentMultiNest | CommentBlock

CommentLine ::= "\" [^#x0A#x03]* /*? lookahead: #x0A ?*/

CommentMulti ::= '"' [^"#x03]* '"'

CommentMultiNest ::= '"{' CommentMultiNestChars? '}"'
CommentMultiNestChars ::= [^}"#x03] CommentMultiNestChars?
|                         "}" ([^"#x03] CommentMultiNestChars?)?
|                         '"' ([^{#x03] CommentMultiNestChars?)?
|                         CommentMultiNest CommentMultiNestChars?

CommentBlock ::= /*? following: #x0A [#x09#x20]* ?*/'"""' #x0A (/*? unequal: [#x09#x20]* '"""' ?*/[^#x03]* #x0A)? [#x09#x20]* '"""' /*? lookahead: #x0A ?*/

Line Comments

Line comments begin with a backslash \ and go until the next line feed character.

3 + 5 \ a line comment
8 + 13 \ another line comment
\ line comments can be on their own line \ and can contain more backslashes ("nesting" if you will)

Multiline Comments

Multiline comments are not nestable. They are delimited with double-quotes " " and may contain line breaks.

3 + "the next number
is five" 5

Nested Multiline Comments

To nest multiline comments, use the delimiters "{ }". A non-nestable comment can be turned into a nestable one by simply inserting the braces.

3 + "{ the "{ previous number }" was three
and the "{ next number }" is five}" 5
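The nesting rule can be sketched with a depth counter. This is only an illustration in Python, not the project's lexer: the function name `scan_nested_comment` and its parameters are hypothetical. It checks for the closing delimiter before the opening one, which matches the grammar's requirement that a `}` followed by `"` always closes a comment.

```python
def scan_nested_comment(src: str, start: int = 0,
                        open_: str = '"{', close: str = '}"') -> int:
    """Return the index just past the nestable comment that opens at `start`.

    Hypothetical helper: raises ValueError if no comment opens at `start`
    or if the comment is never closed.
    """
    if not src.startswith(open_, start):
        raise ValueError("no nestable comment starts here")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith(close, i):
            # A closing delimiter ends the innermost open comment.
            depth -= 1
            i += len(close)
            if depth == 0:
                return i
        elif src.startswith(open_, i):
            # An opening delimiter starts a new (possibly nested) comment.
            depth += 1
            i += len(open_)
        else:
            i += 1
    raise ValueError("unterminated nestable comment")
```

Applied to the example above, the scan starts at the first `"{` and stops right after the final `}"`, leaving only ` 5` as remaining source.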

Nestable comments are useful when we want to comment out code that already has comments in it.

"{
3 + "five" 5
}"

Note that there is a convincing argument that commenting out code is bad practice, since it only becomes obsolete over time, and deleted code can always be retrieved from a version control system. While I agree that committing commented-out code to a repository is bad practice, I will argue that temporarily commenting out code is sometimes a good strategy for testing and debugging code locally. Thus, nestable comments are a beneficial tool.

"{ commenting out to do some testing… don’t forget to put back in before committing
3 + "five" 5
}"

Block Comments

Block comments are entire blocks of prose, which are reserved for documentation. (There are future plans to have a separate system that will compile block comments into documentation.) Block comments are delimited by three double-quotes """, which must be on their own line, excluding whitespace.

"""
Compute the sum of three and five.
"""
3 + 5

A block comment itself may contain the three double-quotes symbol """, as long as it shares the line with other non-whitespace characters.

"""
Compute the sum of three and five.
"""The result
will be eight."""
"""
3 + 5
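As a rough sketch of this rule, the following Python function works on a list of lines and finds the line that closes a block comment. The name `find_block_comment_end` is hypothetical, and the sketch follows the grammar above in allowing only leading tabs and spaces before a delimiter. A line such as `"""The result` is not a delimiter because other characters share the line.

```python
from typing import List


def find_block_comment_end(lines: List[str], start: int,
                           delim: str = '"""') -> int:
    """Return the index of the line that closes the block comment
    opened on lines[start].

    Hypothetical helper: a delimiter line is the delimiter itself,
    optionally preceded by tabs and spaces, and nothing else.
    """
    def is_delimiter(line: str) -> bool:
        return line.lstrip(" \t") == delim

    if not is_delimiter(lines[start]):
        raise ValueError("line is not a block-comment delimiter")
    for i in range(start + 1, len(lines)):
        if is_delimiter(lines[i]):
            return i
    raise ValueError("unterminated block comment")
```

For the example above, the comment opened on the first line is closed by the fifth line, not by the third or fourth.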

For example, we might want to include a block of example code in documentation, where that example code has its own block comment. One common pattern might be to prepend each line of the comment with some arbitrary non-WS character, which a sufficiently capable documentation system could be configured to ignore.

"""
\  Compute the sum of three and five.
\
\  @example
\  """
\  \  Compute the sum of eight and thirteen.
\  """
\  8 + 13
"""
3 + 5

(The backslash character was chosen here, but any non-WS character could be used — it’s a matter of preference.)

chharvey commented 4 years ago

Commits 5425255, 8d0a33a, 769c46b, and f312300 close this.

chharvey commented 4 years ago

Reopening this issue to rewrite comment syntax. Comments are reworked with new delimiters.

Lexical grammar:

Comment ::= CommentLine | CommentMulti | CommentBlock

CommentLine ::= "%" [^#x0A#x03]* /*? lookahead: #x0A ?*/

CommentMulti ::= "{%" CommentMultiChars? "%}"
CommentMultiChars ::=
    [^{%#x03] CommentMultiChars?       |
    "{" [^%#x03] CommentMultiChars?    |
    "%" ([^}#x03] CommentMultiChars?)? |
    CommentMulti CommentMultiChars?

CommentBlock ::= /*? following: #x0A [#x09#x20]* ?*/"%%%" #x0A (/*? unequal: [#x09#x20]* "%%%" ?*/[^#x03]* #x0A)? [#x09#x20]* "%%%" /*? lookahead: #x0A ?*/

There are only three types of comments: line, multiline, and block. Multiline comments are automatically nestable.

Line Comments

Line comments begin with a percent sign % (U+0025) and go until the next line feed character.

3 + 5 % a line comment
8 + 13 % another line comment
% line comments can be on their own line % and can contain more percent signs ("nesting" if you will)
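The CommentLine production translates fairly directly into a regular expression. The Python sketch below is an illustration only: `LINE_COMMENT` and `line_comments` are hypothetical names, and the sketch looks at line comments in isolation, so it would also match percent signs inside multiline or block comments. The lookahead mirrors the grammar's requirement that a line feed follows the comment.

```python
import re

# "%" [^#x0A#x03]* followed by a lookahead for #x0A.
LINE_COMMENT = re.compile(r"%[^\n\x03]*(?=\n)")


def line_comments(src: str) -> list:
    """Return the text of every line comment in `src` (hypothetical helper)."""
    return LINE_COMMENT.findall(src)
```

Because the character class is greedy, a second percent sign on the same line stays inside the first comment. A comment on the last line is matched only if the source ends with a line feed.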

Multiline Comments

Multiline comments are delimited with the characters {% %} and may contain line breaks. Multiline comments can be nested.

3 + {% the next number
is five %} 5

3 + {% the {% previous number %} was three
and the {% next number %} is five %} 5

Block Comments

Block comments are delimited by three percent signs %%%, which must be on their own lines, excluding whitespace.

%%%
Compute the sum of three and five.
%%%
3 + 5

Leading or trailing whitespace on the same line as a delimiter does not affect it.

    %%%
- Compute the sum of four and four.
- (this will not end the comment -->) %%% still a comment %%%
- Now the end of the comment:
    %%%

chharvey commented 4 years ago

Reopening again to consolidate multiline and block (doc) comments. Block comments are removed, and delimiters for multiline are changed. A separate comment parser can be used for doc comments.

Lexical grammar:

CommentLine
    :::= "%" [^#x0A#x03]* #x0A;

CommentMulti
    :::= "%%" CommentMultiChars? "%%";

CommentMultiChars :::=
    | [^%#x03] CommentMultiChars?
    | "%" [^%#x03] CommentMultiChars?
;

There are only two types of comments: line and multiline. Multiline comments are not nestable.

Line Comments

Line comments begin with a percent sign % (U+0025) and go until the next line feed character.

3 + 5; % a line comment
8 + 13; % another line comment
% line comments can be on their own line % and can contain more percent signs ("nesting" if you will)

Multiline Comments

Multiline comments are delimited with the characters %% %% and may contain line breaks. Multiline comments cannot be nested.

3 + %% the next number
is five %% 5;
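Since both comment types now start with a percent sign, a scanner has to try the two-character delimiter first. The Python sketch below translates the two productions into regular expressions and tries `MULTI` before `LINE`. The names `MULTI`, `LINE`, and `comments` are hypothetical. As a simplification, an unterminated `%%` falls through to the line-comment rule instead of being reported as an error.

```python
import re

# "%%" CommentMultiChars? "%%", where a lone "%" inside the comment
# must be followed by a character other than "%".
MULTI = re.compile(r"%%(?:[^%\x03]|%[^%\x03])*%%")
# "%" [^#x0A#x03]* #x0A (the line feed is part of the token here).
LINE = re.compile(r"%[^\n\x03]*\n")


def comments(src: str):
    """Yield (kind, text) for each comment in `src` (hypothetical helper)."""
    i = 0
    while i < len(src):
        # Try the longer delimiter first so "%%" is not read as a line comment.
        m = MULTI.match(src, i) or LINE.match(src, i)
        if m:
            kind = "multi" if m.re is MULTI else "line"
            yield kind, m.group()
            i = m.end()
        else:
            i += 1
```

Because the inner alternatives never consume two consecutive percent signs, the first `%%` after the opening delimiter always closes the comment, which is what makes these comments non-nestable.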

This change is meant to discourage the use of multiline comments (apart from doc comments, see below). Multiline comments interleaved with code make it more cluttered and less readable. The preferred method is to insert line breaks in the code and then use line comments.

% instead of:
func add(a: int, b: int %% TODO: make optional %%): int {
    return a + b;
}

% rewrite as:
func add(
    a: int,
    b: int, % TODO: make optional
): int {
    return a + b;
}

Documentation Comments

Documentation comments are not lexically distinct from multiline comments. The only difference is that an extra percent sign is appended to the opening delimiter: %%% %%. This extra symbol can signal a separate documentation parser that the comment serves as documentation of a code structure.

%%% Compute the sum of three and five. %%
3 + 5;

For style, we may put the delimiters on their own lines, and then append another percent sign to the closing delimiter (it’ll be treated as an empty single-line comment).

%%%
Compute the sum of three and five.
@const 8
%%%
3 + 5;
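A separate documentation parser could tell the two apart by looking at the opening delimiter of an already tokenized multiline comment. The Python sketch below assumes it receives the text of one multiline comment token. The function name `doc_text` is hypothetical, and trimming the surrounding whitespace is a choice made for this illustration.

```python
def doc_text(comment: str):
    """Return the documentation text of a `%%% ... %%` comment,
    or None for a plain multiline comment (hypothetical helper).
    """
    if len(comment) < 4 or not (comment.startswith("%%") and comment.endswith("%%")):
        raise ValueError("not a multiline comment token")
    # "%%%%" is an empty multiline comment, not a doc comment.
    if len(comment) < 5 or not comment.startswith("%%%"):
        return None
    # Drop the three-character opener and the two-character closer.
    return comment[3:-2].strip()
```

In the styled form above, the third percent sign on the closing line is not part of the token, since it is lexed as an empty line comment, so the token passed in ends with `%%` in both forms.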