Curly brackets in comments create parse errors

claytonrcarter / tree-sitter-phpdoc

PHPDoc grammar for tree-sitter

22 stars 11 forks source link

Curly brackets in comments create parse errors #9

Closed mikehaertl closed 2 years ago

mikehaertl commented 2 years ago

I often use code examples in my code which right now lead to a parser error:

ts_phpdoc_brackets

This is probably caused by this line: https://github.com/claytonrcarter/tree-sitter-phpdoc/blob/52c0fbf581d0fc2d29696dbbd4fca99a73082210/grammar.js#L375

I wonder if { and } should really be excluded from comment text. The phpdoc "standard" does not forbid this I think. And I might use curly brackets even outside of code examples somewhere in a comment.

claytonrcarter commented 2 years ago

Thanks, I'll look into this when I get a chance. Off the top of my head, I wonder if that exception was put in place to deal w/ inline tags, which I think are (technically) supposed to be surrounded by braces, like {@see https://example.com}.

mikehaertl commented 2 years ago

Yes, that's what I thought, too. But maybe it can be made a bit more specific (like {@[^@{}]*}?) instead of excluding single braces.

claytonrcarter commented 2 years ago

It was even easier than that: they weren't actually important and all existing tests were still passing after I removed them from that line. I added a test so we don't accidentally re-break it in the future, though. This is on master and on npm as 0.0.7.

mikehaertl commented 2 years ago

I thought about that, too. But I found that this breaks inline tags inside descriptions:

=====================
Inline tags in description
=====================
/**
 * Some description with {@inheritDoc} and more
 */
---
(document
  (description
    (text)
    (inline_tag (tag_name))
    (text)))

Actually I think this can't be solved reliably without an external scanner. I've spent quite some time today on this. But I'm close to giving up: There's too much going on that I can't wrap my head around. If you're interested I could open a PR and we can discuss things there.

claytonrcarter commented 2 years ago

Oh, yuck. Yeah, I see what you mean. I can try to play with it a bit. If it ends needing an external scanner, then we're getting in over my head.

FWIW, it looks like tree-sitter-jsdoc has a similar issues: https://github.com/tree-sitter/tree-sitter-jsdoc/issues/1

mikehaertl commented 2 years ago

If it ends needing an external scanner, then we're getting in over my head.

That's what I first thought, too. But the documentation helps a bit. I could also take some ideas from the javascript scanner and PHP scanner.

Right now my scanner at least compiles and does - well - something. But it's so weird as suddenly other tokens (namely name) are matched that should not be in effect at all. It's also a bit of a mistery why for example extras are not excluded from my scanner.

I'll spend some more time on this and will open a PR when I have something.

FWIW, it looks like tree-sitter-jsdoc has a similar issues: tree-sitter/tree-sitter-jsdoc#1

Yeah. Not sure how up-to-date that parser is. Last commit on the grammer is from 2018. And they don't have to deal with nested inline tags either.

claytonrcarter commented 2 years ago

I think this is now fixed by #10