Open hollowrider opened 1 month ago
Hi @hollowrider,
Formal syntax rules associated with comments in the documentation are as follows:
comment → // comment-text line-break
multiline-comment → /* multiline-comment-text */
comment-text → comment-text-item comment-text?
comment-text-item → Any Unicode scalar value except U+000A or U+000D
multiline-comment-text → multiline-comment-text-item multiline-comment-text?
multiline-comment-text-item → multiline-comment
multiline-comment-text-item → comment-text-item
multiline-comment-text-item → Any Unicode scalar value except /* or */
Doesn't your input violate these rules, since it contains an unmatched /*? It is not a nested comment, because it is not a comment at all: it is never terminated by */. Maybe I'm interpreting the syntax rules wrong.
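For what it's worth, one way to check this reading of the rules is a small hand-written scanner that tracks nesting depth. The sketch below is plain Python, not part of the ANTLR grammar, and the function name `match_multiline_comment` is made up for illustration. It returns the index just past the closing `*/`, or `None` when the comment is never closed.

```python
def match_multiline_comment(text, start=0):
    """Return the index just past the matching '*/' for a comment that
    opens at `start`, or None if the comment is never closed.

    Hypothetical helper: it mirrors the nesting allowed by the formal
    rules above with a simple depth counter.
    """
    if not text.startswith("/*", start):
        return None
    depth = 0
    i = start
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":        # opening delimiter: go one level deeper
            depth += 1
            i += 2
        elif pair == "*/":      # closing delimiter: come back up one level
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:                   # any other character is comment text
            i += 1
    return None                 # end of input reached with an open '/*'


print(match_multiline_comment("/* a /* b */ c */"))  # 17
print(match_multiline_comment("/*/**/"))             # None
```

Under this reading, `/*/**/` opens two comments and closes only one, so it is not a complete multiline-comment.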
@msagca Thanks for your comment.
Exactly, /*/**/ violates these rules. However, the problem I ran into is that when a user supplies a Swift file with a syntax error like this, the parser gives unexpected output.
The Swift input below contains /*/**/ and should definitely raise an error, because it violates the rules you listed. However, when I parse this file, no errors are reported.
/*/**/
let _: [Any] = [
0, 1, 1.0, 1.0e+1, 1e+1, true,
"Hello, world!", "Hello, \(1)!", "Hello, \(1.0e+1)!", "Hello, \(Int.max)!",
(nil == nil)
]
/**
another comment
*/
if 10 < 20{
if 10 < 20{
}
}
If you run grun with the -tokens option on this file, you can see the reason. The lexer recognizes everything from line 1 through line 9 as a single Block_comment (called multiline-comment in the Swift book). Here is the lexer's token output:
[@0,0:188='/*/**/\r\nlet _: [Any] = [\r\n 0, 1, 1.0, 1.0e+1, 1e+1, true,\r\n "Hello, world!", "Hello, \(1)!", "Hello, \(1.0e+1)!", "Hello, \(Int.max)!",\r\n (nil == nil)\r\n]\r\n/** \r\nanother comment\r\n*/',<Block_comment>,channel=1,1:0]
This isn't what I expect. To fix it, I suggest changing the Block_comment rule as shown below. With this change, the lexer tokenizes the leading / and * separately from the multiline comment that follows, and the parser then reports an error.
Block_comment: '/*' (Block_comment | '/' ~'*'|~'/')*? '*/' -> channel(HIDDEN);
Here is the lexer output after changing the rule:
[@0,0:0='/',<'/'>,1:0]
[@1,1:1='*',<'*'>,1:1]
[@2,2:5='/**/',<Block_comment>,channel=1,1:2]
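To make the intended behavior easier to follow, here is a rough Python emulation of the rule above. It is only a sketch under stated assumptions: it is not ANTLR's matching algorithm, the names `match_block_comment` and `tokenize` are hypothetical, and every character outside a comment is simply emitted as its own one-character token.

```python
def match_block_comment(text, i):
    """Emulate '/*' (Block_comment | '/' ~'*' | ~'/')*? '*/' starting at i.

    Returns the index just past the closing '*/', or None on failure.
    Hypothetical helper, written only for illustration.
    """
    if not text.startswith("/*", i):
        return None
    i += 2
    while i < len(text):
        if text.startswith("*/", i):       # non-greedy loop: try to exit first
            return i + 2
        if text.startswith("/*", i):       # alternative 1: nested Block_comment
            end = match_block_comment(text, i)
            if end is None:
                return None                # no other alternative accepts '/*'
            i = end
        elif text[i] == "/":               # alternative 2: '/' ~'*'
            if i + 1 >= len(text):
                return None                # '/' with nothing after it
            i += 2
        else:                              # alternative 3: ~'/'
            i += 1
    return None                            # input ended before '*/'


def tokenize(text):
    """Split text into Block_comment tokens and one-character tokens
    (a simplification: real lexers have many more token types)."""
    tokens = []
    i = 0
    while i < len(text):
        end = match_block_comment(text, i)
        if end is not None:
            tokens.append(("Block_comment", text[i:end]))
            i = end
        else:
            tokens.append((text[i], text[i]))
            i += 1
    return tokens


print(tokenize("/*/**/"))
# [('/', '/'), ('*', '*'), ('Block_comment', '/**/')]
```

Because no complete comment starts at the first `/`, the emulation falls back to single-character tokens there, which matches the three tokens listed above.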
To be honest, I'm not an experienced ANTLR grammar writer, but I wanted to share the problem I ran into and help improve the g4 file. Do you think this could work?
The Block_comment lexer rule can't handle a comment like /*/**/. It produces an unexpected result. The current Block_comment rule is this:
To fix that, I made a small change to the Block_comment rule, and here it is:
This rule rejects an unmatched /* inside a Block_comment and matches nested comments correctly. I found this kind of defect in the swift2, swift3, and swift5 lexer files, and it may exist in other grammar files that allow nested multiline comments. The erroneous Swift code is below:
With the original Block_comment rule, it is tokenized like this:
After fixing this defect, it is tokenized like this, and the parser throws an exception as expected: