claytonrcarter / tree-sitter-phpdoc

PHPDoc grammar for tree-sitter
22 stars 10 forks source link

On grammar format and node structure: how/if to work w/ the PHP grammar #16

Closed claytonrcarter closed 2 months ago

claytonrcarter commented 2 years ago

This grammar is currently using the tree-sitter-php grammar to do some of the "heavy lifting" for parsing language constructs that are common between PHP and phpDoc. @mikehaertl raised questions about this approach in #14. Here's my take:

Pros

  1. Using PHP rules means that we don't have to reinvent the wheel to parse types, variables, scoped calls, etc. They've already worked out details of matching these and making sure they're complete, so we can use that w/o worrying about those details.
  2. Reusing PHP rules means that queries can (to some extent) be reusing between the two grammars.
  3. phpDoc comments only ever make sense w/i PHP documents, so it feels safe to assume that anyone working w/ phpDoc is also working w/ PHP
  4. The maintenance in adding new or updated types is easier (presumably) b/c we can just update tree-sitter-php w/o having to duplicate the fix/update.

Cons

  1. phpDoc supports array types (ie int[]) and grouped types (ie (int|float)[] which are not supported in PHP.
    • We will have to deconstruct the imported types and then reconstruct them to support these.
  2. We will probably (I do!) want to add some sort of support for phpstan and psalm "types" like class-string, positive-int, non-empty-array, etc
    • Ditto above.
  3. The PHP parser accepts certain constructs that aren't applicable to us. @mikehaertl offered an example over on #14. Simply reusing their parser means that we're implicitly supporting invalid phpDoc syntax.
  4. Reusing rules can make it harder to implement some changes (Wait didn't I just say it made it easier?!) because we have to combine the context between the two parsers. It can be really easy to get lost or confused between the two.
claytonrcarter commented 2 years ago

Compounding some of these considerations is the fact that phpDoc is ultimately just a comment, and is prone to creative uses, some of which are "off label" and/or invalid. For example, I don't think that @see $this->methodName() is valid phpDoc, but that doesn't mean it's not in use in the wild. Not that I'm suggesting we explicitly support that, but if we someday decided to, it would be as simple as dropping member_call_expression into the defn of _see_tag. (Ha! how's that for underestimating complexity?!)

On top of that, if I, as maintainer of a syntax highlighter for PHP, want to capture all uses of $this w/i member call expressions, I can just develop my query w/i the PHP grammar and know that it can be reused more or less as-in w/i phpDoc. (More complexity underestimation!)

Anyway, my 2 greatest concerns are around:

  1. Long term maintainability via an assumption that the php parser will get more action than this one, so we can benefit from their efforts to be complete and "correct".
  2. Erring towards more permissive than more precise. For example, I'm less concerned about "accidentally" allowing @see $className::someMethod() and much more concerned about implementing our own parsing logic without realizing that we're not allowing some oddball PHP construct that should be valid there.

Bottom line: both of these come down to me being a little bit lazy but also a lot not comfortable w/ my own expertise in parsing PHP. I don't have the capacity to research all of these edges, and reusing the PHP rules allowed me to get "close enough" to get things working for most cases.

claytonrcarter commented 2 months ago

I don't recall what made this issue important initially, but everything seems to be fine for now. Closing as a non-issue for the time being, but we can reopen if/when it becomes a problem for someone.