ganezdragon / tree-sitter-perl

a perl parser for javascript
MIT License
33 stars 9 forks source link

Parser problem with chained method calls #29

Open Leeft opened 1 year ago

Leeft commented 1 year ago

I've come up with a bit of code we use a lot at work, and the parser doesn't handle it very well. I'm not sure where to go with the solution, so here are some thoughts and hoping for feedback.

my $rs = $self->schema->resultset('Customer')
    ->active
    ->uk_based(  );

The current parser fails rather spectacularly at this:

# ex1
(source_file [0, 0] - [2, 19]
  (variable_declaration [0, 0] - [2, 18]                                    # my $rs = $self->schema->resultset('Customer')->active->uk_based();
    (scope [0, 0] - [0, 2])                                                 # my
    (single_var_declaration [0, 3] - [0, 6]                                 # $rs
        name: (scalar_variable [0, 3] - [0, 6]))                            #
    value: (method_invocation [0, 9] - [2, 18]                              # $self->schema->resultset('Customer')->active->uk_based();
      object: (scalar_variable [0, 9] - [0, 14])                            # $self
      (arrow_operator [0, 14] - [0, 16])                                    #      ->
      function_name: (identifier [0, 16] - [0, 22])                         #        schema
      (arrow_operator [0, 22] - [0, 24])                                    #              ->
      function_name: (identifier [0, 24] - [0, 33])                         #                resultset
      args: (arguments [0, 33] - [2, 18]                                    #                         ('Customer')->active->uk_based();
        (argument [0, 33] - [2, 18]                                         #                         ('Customer')->active->uk_based();
          (method_invocation [0, 33] - [2, 18]                              #                         ('Customer')->active->uk_based();
            object_return_value: (string_single_quoted [0, 34] - [0, 44])   #                          'Customer'
            (arrow_operator [1, 4] - [1, 6])                                #                                     ->
            function_name: (identifier [1, 6] - [1, 12])                    #                                       active
            (arrow_operator [2, 4] - [2, 6])                                #                                             ->
            function_name: (identifier [2, 6] - [2, 14])                    #                                               uk_based
            args: (empty_parenthesized_argument [2, 14] - [2, 18]))))))     #                                                       ()
  (semi_colon [2, 18] - [2, 19]))

I was fairly easily able to make it stop do that, though the result isn't that much better overall. Note how there is a single method_invocation while there are 4 method calls:

  # ex2
  (variable_declaration [0, 0] - [2, 18]        
    (scope [0, 0] - [0, 2])
    (single_var_declaration [0, 3] - [0, 6]     
        name: (scalar_variable [0, 3] - [0, 6]))
    value: (method_invocation [0, 9] - [2, 18]  
      object: (scalar_variable [0, 9] - [0, 14])
      (arrow_operator [0, 14] - [0, 16])
      function_name: (identifier [0, 16] - [0, 22])
      (arrow_operator [0, 22] - [0, 24])
      function_name: (identifier [0, 24] - [0, 33])
      args: (parenthesized_argument [0, 33] - [0, 45]
        (arguments [0, 34] - [0, 44]
          (argument [0, 34] - [0, 44]
            (string_single_quoted [0, 34] - [0, 44]))))
      (arrow_operator [1, 4] - [1, 6])
      function_name: (identifier [1, 6] - [1, 12])
      (arrow_operator [2, 4] - [2, 6])
      function_name: (identifier [2, 6] - [2, 14])
      args: (empty_parenthesized_argument [2, 14] - [2, 18])))
  (semi_colon [2, 18] - [2, 19]))

Part of the problem would seem to be that method_invocation includes that what it is invoked on. I think that makes sense if the methods were executed right to left and thus somewhat recursively, but they're not, they're executed sequentially. Aside from that, I think the scope/single_var_declaration should've been wrapped in an lvalue node, and the rest after the assignment in an rvalue node (adding more structure). In these fictional results I've added this rvalue.

More method_invocation calls would start to look something like this:

# ex3
(source_file
  (variable_declaration
    (scope)                                                                   # my
    (single_var_declaration (scalar_variable))                                # $rs
                                                                              # = (anonymous)
    (rvalue                                                                          
      (method_invocation (scalar_variable) (arrow_operator) (identifier))       # $self -> schema
      (method_invocation (arrow_operator) (identifier)
         (parenthesized_argument (arguments (argument (string_single_quoted))))
      )                                                                         # -> resultset( 'Customer' )
      (method_invocation (arrow_operator) (identifier))                         # -> active
      (method_invocation (arrow_operator) (identifier)
         (empty_parenthesized_argument)
      )                                                                         # -> uk_based ()
    )
  )
  (semi_colon)
)

But now the 1st one has the variable, the rest has nothing. I'm not keen on that. And a method call is just that, it is called on whatever is fed into it. Which led me to something like this:

# ex4
(source_file
  (variable_declaration
    (scope)                                                                     # my
    (single_var_declaration (scalar_variable))                                  # $rs
                                                                                # = (anonymous)
    (rvalue                                                                     
      (scalar_variable)                                                         # $self
      (method_invocation (arrow_operator) (identifier))                         # -> schema
      (method_invocation (arrow_operator) (identifier)
         (parenthesized_argument (arguments (argument (string_single_quoted))))
      )                                                                         # -> resultset( 'Customer' )
      (method_invocation (arrow_operator) (identifier))                         # -> active
      (method_invocation (arrow_operator) (identifier)
         (empty_parenthesized_argument)
      )                                                                         # -> uk_based ()
    )
  )
  (semi_colon)
)

I've not yet investigated how hard it might be to get to this result (doesn't seem to be an inconsequential change to the parse tree), but I'd love to know what others think.

Leeft commented 1 year ago

And yeah, maybe things should be more nested, but maybe not. I'm not decided yet either way. FWIW, here's what that other parser (tree-sitter-perl/tree-sitter-perl) makes of it:

(source_file [0, 0] - [2, 19]
  (expression_statement [0, 0] - [2, 18]
    (assignment_expression [0, 0] - [2, 18]
      left: (variable_declaration [0, 0] - [0, 6]
        variable: (scalar [0, 3] - [0, 6]))
      right: (method_call_expression [0, 9] - [2, 18]
        invocant: (method_call_expression [0, 9] - [1, 12]
          invocant: (method_call_expression [0, 9] - [0, 45]
            invocant: (method_call_expression [0, 9] - [0, 22]
              invocant: (scalar [0, 9] - [0, 14])
              method: (method [0, 16] - [0, 22]))
            method: (method [0, 24] - [0, 33])
            arguments: (string_literal [0, 34] - [0, 44]))
          method: (method [1, 6] - [1, 12]))
        method: (method [2, 6] - [2, 14])))))

(not very readable at least imo)

ganezdragon commented 1 year ago

@Leeft let me work on this, this weekend. Even in my org, we have several expressions as such. Let me refactor it.

Leeft commented 1 year ago

Okay @ganezdragon, thank you and please keep us posted :)