factor / factor

Factor programming language
https://factorcode.org/
BSD 2-Clause "Simplified" License
Support for block / stack / nested / tagged comments #2634

Open nomennescio opened 2 years ago

nomennescio commented 2 years ago

Forth had ( ... ) pairs for block comments and \ for until-EOL comments. These inline comments were so commonly used to annotate the stack effects of words, using -- to separate inputs from outputs, that Factor has made them into actual stack effect annotations.

Unfortunately, Factor now has no block comments at all, which is a real pity. One of the great ways to aid with stack gymnastics is to use inline comments to annotate the stack layout at multiple places in a word, which is currently not possible in Factor.

And if you look at C-like languages and Haskell, these support nested comments (#if 0 ... #endif and {- ... -} delimiters, respectively), which are extremely useful as nested "inline" comments. Such a feature would be useful in Factor too.

As ( is already a SYNTAX: word, it's difficult to reuse ( for a generic comment block. The lexer could handle this itself, but it would have to special-case any -- inside and then NOT treat the block as a comment. It could, however, be reused for a simple stack comment, where ( ... ) without -- would be a stack annotation with no runtime behavior (well, at least for the near future).

If ( is not to be used and new comment syntax is to be introduced, name clashes should be avoided, and we can just as well introduce full nested comments. Simplest would probably be to turn the lexer "on" and "off", with nesting handled as a simple up-down count. I think it's perfectly reasonable to set some restrictions on the use of nested comments (e.g. where their tokens can occur) to not make things needlessly complicated.
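
As a rough illustration of the up-down count, here is a small Python sketch. It is not Factor's lexer: the function name and the (( / )) delimiters are assumptions made for the example, and it restricts comment tokens to whitespace-delimited positions, in line with the restriction suggested above.

```python
def strip_nested_comments(source, opener="((", closer="))"):
    """Drop nested comments from whitespace-delimited source text.

    Hypothetical helper for illustration only; the delimiters are assumed.
    """
    depth = 0   # 0 means the lexer is "on"; > 0 means we are inside a comment
    kept = []
    for token in source.split():
        if token == opener:
            depth += 1              # enter a comment, or nest one level deeper
        elif token == closer and depth > 0:
            depth -= 1              # leave one level; lexer is "on" again at 0
        elif depth == 0:
            kept.append(token)      # ordinary token outside any comment
        # tokens seen while depth > 0 are comment text and are skipped
    if depth != 0:
        raise ValueError("unterminated comment")
    return kept


print(strip_nested_comments("foo (( outer (( inner )) still outer )) bar"))
```

A closer that appears while no comment is open is passed through as an ordinary token here; a real lexer might prefer to report it as an error.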

It might be useful to have "tagged" comments: an annotation to the "AST" of the lexer which can be introspected. That way special comments can be created that can e.g. be processed or created by other tools.

mrjbq7 commented 2 years ago

Factor has block-style comments: /* */

It also has a nested version in the nested-comments vocabulary.

I agree the new lexer will support this more natively.

Is there still an issue here to fix?

nomennescio commented 2 years ago

Totally missed that! I did not stumble upon multiline before. Cannot find nested-comments vocab though.

That leaves no issue to fix, except for the idea of tagged comments, but that's only minor.

nomennescio commented 2 years ago

And ( .. ) for intermediate stack effect annotation

mrjbq7 commented 2 years ago

Does that have a meaning separate from

! ( -- )

?

tgunr commented 2 years ago

From what I can see, ! will skip till the end of the line, while /* */ is treated as a spanning comment, either inline or multiline.

Both are fine as they are, I suppose, except that ! takes over the character Forth uses for 'store' and turns it into a comment.

nomennescio commented 2 years ago

> Does that have a meaning separate from ! ( -- ) ?

Well, obviously, ! would make the rest of the line a comment, whereas ( .. ) could be really inline.

My idea for inline stack comments is that when writing code, it's sometimes pretty hard to visualize all stack effects mentally, and at a certain point it makes writing correct code harder than is necessary. In general I think that makes concatenative languages harder for some things that are trivial in other types of languages. Beginners need to already familiarize themselves with lots of stack manipulators and combinators which typically just exist to avoid using "locals". Being able to insert a "picture" of the stack will help with your mental model, and could potentially be used by the compiler to check for the correct intermediate stack layout, which could improve error reporting to the user.

mrjbq7 commented 2 years ago

In the old old days, ( x -- y ) was a comment and (( x -- y )) was a stack effect.

Perhaps we bring back (( x -- y )) as a comment, or maybe just make (( anything )) a multiline comment.

nomennescio commented 2 years ago

> In the old old days, ( x -- y ) was a comment and (( x -- y )) was a stack effect. Perhaps we bring back (( x -- y )) as a comment, or maybe just make (( anything )) a multiline comment

Interesting! If you make it into a nested comment, you kill three birds with one stone.

tgunr commented 2 years ago

That I like. Or even change the lexer such that, if -- is missing, the ( ) are just a comment, and in ( comment ( nested comment ) ) an inner ( with no -- before it is also treated as a comment, which I would like even more.

Changing the lexer this way basically permits the current style of ( ) pairs to remain in place while extending the meaning.

twopir commented 2 years ago

from Discord, a 90% solution:

SYNTAX: (( "))" parse-effect-tokens 2drop ;

allows for inline stack-state comments

foo bar (( x y ))
baz (( y' z x ))

and allows for further extension, like invoking the stack checker in the future

nomennescio commented 2 years ago

And from the same Discord; as Factor has a limited supply of delimiters, and in recognition of Forth's heritage, I would prefer to move a bit of complexity into the parser, and have ( -- ) parse (nested) stack effects, and ( ) parse (nested) stack comments.

Maybe we can introduce an "ignored" object in the parse tree to that effect? Or have a flag on a stack effect object marking its "type" (effect / comment). The compiler can for now then just ignore stack comments, and maybe in the future use them for stack "debugging" as noted above.

twopir commented 2 years ago

per @nomennescio's recognition that ( a b c ) is equivalent to ( a b c -- a b c ), we can do this without changing the stack-effect structure at all, instead making the parser replicate the input stack effect to the output if it doesn't encounter the -- delimiter.
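
A minimal Python sketch of that parser rule, for illustration only (the function name and the list-of-tokens representation are assumptions, not Factor's actual effect parser): split the tokens at --, and if there is no --, reuse the inputs as the outputs.

```python
def parse_effect(tokens):
    """Return (inputs, outputs) for the tokens between ( and ).

    Hypothetical helper: without a "--" delimiter the inputs are replicated
    as the outputs, so ( a b c ) reads as ( a b c -- a b c ).
    """
    if "--" in tokens:
        split_at = tokens.index("--")
        return list(tokens[:split_at]), list(tokens[split_at + 1:])
    # No delimiter: treat it as a stack comment that leaves the stack unchanged.
    return list(tokens), list(tokens)


print(parse_effect("a b c".split()))
print(parse_effect("x y -- z".split()))
```

With this rule the resulting structure is the same in both cases, which is why no extra "comment" flag is needed on the effect object.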

tgunr commented 2 years ago

That is true; having one delimiter for both stack effects and comments gets my vote. But the current parser expects a call effect right after the name of the word being defined. What about the case : test 2 3 + . ; where there is no call effect? Could the same not be said for this case, i.e. assume ( -- ) if there is no stack effect at all?

razetime commented 2 years ago

Considering that the multiline vocab already takes one delimiter (/*), why not change that to (( in the newer release?

mrjbq7 commented 2 years ago

Good idea. Added (( comments in f2db336221c17d3dafc0faf85371a86a330242f7.

mrjbq7 commented 2 years ago

I’d be interested in a different comment-to-eol character too, but not sure what we should choose.

// # !

They all have some obvious uses aside from being comments.

nomennescio commented 2 years ago

> I’d be interested in a different comment-to-eol character too, but not sure what we should choose. // # ! They all have some obvious uses aside from being comments.

Hi John, from a parsing point of view this is very important. If any of those characters are to appear in any token, there's a possible parsing/tokenizing conflict. If we assume however that tokenizing is predominantly done by using space as delimiters between tokens (i.e. tokens themselves can contain any character, including the comment character, see also my PR on having custom token parsers), then it would be required that a comment-to-eol character is delimited by whitespace too (which can be the beginning of a line too).

So in the end, it is important that the comment-to-eol character does not conflict with an existing single-character token. That would rule out # unfortunately (because of math.parser). Of course ! is the existing character, which I think is a bit of a pity, as it used to denote store semantics in Forth, which might still be useful for locals or variants thereof. The // token is close to a mathematical operator, but is indeed widely used as a comment character. I find it visually less appealing, but it's quite pervasive. I do like VHDL's --, but that would need a rule that it has lower priority than -- in stack comments, which would be fine I think. It might also conflict with some existing packages (graphviz).

As an aside, I would applaud the sequence #! at the beginning of a file (!) being reserved as a comment-to-eol sequence, to support shell scripts that contain Factor code.

What could be nice, is to have a dynamic to-end-of-line comment string, which would really not be that hard to implement. You could set it e.g. with SKIP: or something similar. Then everyone can use his favorite style. Again, for that to work, it's important to determine tokenizing order, and the best way would be to let comment tokens have lowest priority (i.e. existing words override comment tokens). It might be beneficial to have a 'high-priority' comment token to override this, but that could for instance be reserved for beginning of line. It's crucial to make parsing consistent and not too limited by language reserved tokens.
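
To make the priority rule concrete, here is a small Python sketch. Everything in it is an assumption for illustration (the function name, the default // token, and the known_words set standing in for the vocabulary search path). The comment token only counts when it stands alone between whitespace, and an existing word with the same name overrides it.

```python
def strip_line_comment(line, comment_token="//", known_words=frozenset()):
    """Return the tokens of one line with any to-end-of-line comment removed.

    Hypothetical helper: comment tokens have the lowest priority, so if a
    word named like the comment token already exists, nothing is stripped.
    """
    tokens = line.split()
    if comment_token in known_words:
        return tokens                   # an existing word overrides the comment token
    for index, token in enumerate(tokens):
        if token == comment_token:      # must match a whole whitespace-delimited token
            return tokens[:index]
    return tokens


print(strip_line_comment("1 2 + // add two numbers"))
print(strip_line_comment("http://example.org fetch // not split inside a token"))
print(strip_line_comment("7 2 // .", known_words={"//"}))
```

The sketch ignores string literals and any "high-priority" beginning-of-line form; both would need extra rules in a real tokenizer.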

nomennescio commented 2 years ago

See also: https://en.wikipedia.org/wiki/Comment_(computer_programming) and https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(syntax)#Comments

nomennescio commented 2 years ago

As an aside, I still find it a bit of a pity \ was not used as comment character, given Factor's Forth heritage for at least its syntax. Factor could just have used Forth's ' (tick) for that.