Wilfred / difftastic

a structural diff that understands syntax 🟥🟩
https://difftastic.wilfred.me.uk/
MIT License
20.82k stars 344 forks source link

DFT_PARSE_ERROR_LIMIT on valid scala 2.13 files #521

Closed ftc closed 1 year ago

ftc commented 1 year ago

Description: I am getting a parse error when comparing the following two files contained in the following zip. I attempted to cut out as much extra stuff as I could while maintaining the error.

diff.zip

A screenshot of the error: Screenshot 2023-05-23 at 12 12 24 PM

Details:

I am running difftastic from the command line using difft One.scala OneNew.scala.

Difft version: Difftastic 0.47.0 installed via homebrew

OS: macOS Ventura Version 13.3.1 (a) (Intel based mac)

Thanks for looking into this, Difftastic looks like an awesome tool!

sageserpent-open commented 1 year ago

I was curious about what's going on here, so tried experimenting with the contents for further minimisation.

One.scala has been pithed out to be:

object Foo {

  def benign(parameter: Boolean): Unit = {}

}

OneNew.scala is now:

object Foo {

  def trouble(parameter: Boolean): Unit = {
    "Ouch" match {

      case _ if true                             => () // OK
      case _ if parameter && true                => () // OK
      case _ if parameter && (true || parameter) => () // OK
      case _ if parameter && (2 == parameter.hashCode() || parameter) =>
        () // OK
      case _ if 2 == parameter.hashCode() =>
        () // OK
      case _ if parameter.hashCode() == parameter.hashCode() =>
        () // OK
      // case _ if !parameter => () // ERROR
      // case _ if parameter && parameter => () // ERROR
      // case _ if parameter => () // ERROR
      // case _ if parameter == parameter => () // ERROR
    }
  }

}

Uncommenting any of the last four cases causes the original problem to be reproduced.

I'm guessing here, but as the error only involves one file, I think that the underlying parsing faults, and Difftastic simply reports a count of the errors to the user.

Looking at the problematic cases, it seems that if the expression tree for the case's if-clause only contains references to the parameter symbol, this causes the bug, but if constant terms or non-operator style method calls are mixed in, then this avoids the bug.

(Again, using Difftastic 0.47.0 on OSX, installed via Macports)

sageserpent-open commented 1 year ago

A bit more detail now, after looking at the tree-sitter grammar here: https://github.com/tree-sitter/tree-sitter-scala/blob/master/src/grammar.json

Simplifying OneNew.scala a bit more down to:

object Foo {
  def trouble(parameter: Boolean): Unit = {
    "Ouch" match {
      case _ if !parameter => () // ERROR
    }
  }
}

still reproduces the bug.

Running difft --dump-ts OneNew.scala yields the following syntax tree for the single file:

compilation_unit (0, 0) - (7, 0)
  object_definition (0, 0) - (6, 1)
    object (0, 0) - (0, 6) "object"
    identifier (0, 7) - (0, 10) "Foo"
    template_body (0, 11) - (6, 1)
      { (0, 11) - (0, 12) "{"
      function_definition (1, 2) - (5, 3)
        def (1, 2) - (1, 5) "def"
        identifier (1, 6) - (1, 13) "trouble"
        parameters (1, 13) - (1, 33)
          ( (1, 13) - (1, 14) "("
          parameter (1, 14) - (1, 32)
            identifier (1, 14) - (1, 23) "parameter"
            : (1, 23) - (1, 24) ":"
            type_identifier (1, 25) - (1, 32) "Boolean"
          ) (1, 32) - (1, 33) ")"
        : (1, 33) - (1, 34) ":"
        type_identifier (1, 35) - (1, 39) "Unit"
        = (1, 40) - (1, 41) "="
        block (1, 42) - (5, 3)
          { (1, 42) - (1, 43) "{"
          match_expression (2, 4) - (4, 5)
            string (2, 4) - (2, 10) "\"Ouch\""
            match (2, 11) - (2, 16) "match"
            case_block (2, 17) - (4, 5)
              { (2, 17) - (2, 18) "{"
              ERROR (3, 6) - (4, 4)
                case (3, 6) - (3, 10) "case"
                wildcard (3, 11) - (3, 12)
                  _ (3, 11) - (3, 12) "_"
                if (3, 13) - (3, 15) "if"
                ! (3, 16) - (3, 17) "!"
                lambda_expression (3, 17) - (4, 4)
                  identifier (3, 17) - (3, 26) "parameter"
                  => (3, 27) - (3, 29) "=>"
                  unit (3, 30) - (3, 32)
                    ( (3, 30) - (3, 31) "("
                    ) (3, 31) - (3, 32) ")"
                  comment (3, 33) - (3, 41) "// ERROR"
              } (4, 4) - (4, 5) "}"
          } (5, 2) - (5, 3) "}"
      } (6, 0) - (6, 1) "}"

It seems to me that the precedences or ambiguous grammar conflict resolution have gone wrong, causing the _postfix_expression_choice referenced by the definition of guard to fail to reduce - in favour of shifting the following => to being a lambda expression parameter => () // ERROR, which then causes the overall parse to fail.

Wilfred commented 1 year ago

Please consider filing a bug at https://github.com/tree-sitter/tree-sitter-scala

In the meantime, I find you usually get decent results with difftastic even if you increase the value of parse errors allowed.

sageserpent-open commented 1 year ago

@Wilfred Certainly, I've raised https://github.com/tree-sitter/tree-sitter-scala/issues/233.

sageserpent-open commented 1 year ago

@Wilfred It seems the upstream bug may have already been fixed of of commit 9cb4266 in tree-sitter-scala. I don't know whether this repository has its upstream dependencies updated as per individual issues or on a schedule, so will leave this in your hands, thank you.

In the meantime, I can try a local build just to verify - not sure what this entails but if I make progress I'll let you know.

sageserpent-open commented 1 year ago

@Wilfred I tried this out with a local build of Difftastic, doing a subtree pull of tree-sitter-scala.

In addition to checking the examples I posted above directly against tree-sitter, I rebuilt both tree-sitter-scala via NPM in its own subtree and then did a cargo build of Difftastic.

The rebuilt executable successfully compares the examples I submitted above (I've edited one of them to re-include the problematic lines):

./target/debug/biggerSample.scala --- Scala
2   def trouble(parameter: Boolean): Unit = {                                                              3   def trouble(parameter: Boolean): Unit = {
3     "Ouch" match {                                                                                       4     "Ouch" match {
.                                                                                                          5 
.                                                                                                          6       case _ if true                             => () // OK
.                                                                                                          7       case _ if parameter && true                => () // OK
.                                                                                                          8       case _ if parameter && (true || parameter) => () // OK
.                                                                                                          9       case _ if parameter && (2 == parameter.hashCode() || parameter) =>
.                                                                                                         10         () // OK
.                                                                                                         11       case _ if 2 == parameter.hashCode() =>
.                                                                                                         12         () // OK
.                                                                                                         13       case _ if parameter.hashCode() == parameter.hashCode() =>
.                                                                                                         14         () // OK
4       case _ if !parameter => () // ERROR                                                               15       case _ if !parameter => ()
.                                                                                                         16       case _ if parameter && parameter => ()
.                                                                                                         17       case _ if parameter => ()
.                                                                                                         18       case _ if parameter == parameter => ()
5     }                                                                                                   19     }
6   }                                                                                                     20   }
                                                                                                          21 

I was going to submit a PR for this, but I'm confused by the build procedure - simply doing a cargo build at the top-level directory wasn't enough to pick up the changes after doing the subtree pull, I had to perform a manual NPM build and manually regenerate the parser C-code prior to the overall cargo build - I'm probably missing some obvious step.

With that in mind would you do the update in Difftastic to close this ticket, please?