UserNobody14 / tree-sitter-dart

Attempt to make a tree-sitter grammar for dart
MIT License
56 stars 33 forks source link

Slow parsing time for `(if_statement (block))` query #46

Closed genesistms closed 1 year ago

genesistms commented 1 year ago

I've got a problem inside neovim that opening dart file took a very long time. After some debugging i found that trying to parse dart file with this query is the problem:

(if_statement (block))

But this one is ok: (if_statement)

And it does not matter what is inside dart file, it can be empty.

tree-sitter query bad.scm empty.dart  1.90s user 0.00s system 99% cpu 1.907 total
tree-sitter query good.scm empty.dart  0.01s user 0.00s system 96% cpu 0.006 total

dart parser commit: 0033b22 (currently latest) tree-sitter 0.20.8

genesistms commented 1 year ago

It is very similar to this issue https://github.com/UserNobody14/tree-sitter-dart/issues/23

TimWhiting commented 1 year ago

Not all if statements have blocks, so it definitely does have to do some extra work in the query itself.

I'm seeing the bad example take 140 seconds to do all of the dart files in the flutter sdk's packages folder, and 32 seconds for the simpler query.

Also on an empty file I'm seeing a 2.5 seconds for the bad, and .5 seconds for the good.

Does neovim cache the query, or does it recompile the query for every file? It should keep a cache per language / per query. Otherwise you are probably paying an upfront cost. The empty file basically measures the upfront cost for compiling a query. Also, maybe neovim can eagerly cache all the queries for a particular language?

Given that it takes me 140 seconds for 2460 dart files, ~ that ends up being .05sec per file. Additionally my test script does it in groups of 100 files, so I'm actually paying the upfront cost 25 times. Subtracting the upfront cost of 2.5seconds*25times = 140s-62s = 80s / 2460 files = .03sec per file, compared to 30s / 2460 files = .012sec per file.

While there is definitely room for improvement, they are along the same order of magnitude, which is much different than your results. As I don't know how to optimize queries, and I think this area of the grammar is already simple, I'm going to close this issue. There is definitely room for improvement in the simplicity of the grammar, but if statements in particular don't seem like a problem spot to me.

genesistms commented 1 year ago

I just point out that test i have given was made only for single file without neovim.

So it is normal that just tree-sitter with this parser is taking almost 2sec for one query and be instant for another (on empty file)?

genesistms commented 1 year ago

And ad "not all if have block" i just give it a block to demonstrate... The original query that was causing the issue is this:

(if_statement
  [
    condition: (_)
    consequence: (_)
    alternative: (_)?
  ] @conditional.inner) @conditional.outer
genesistms commented 1 year ago

ad neovim caching: It is slow for every file that i try to open. So i believe that it does not cache that.

TimWhiting commented 1 year ago

I'll reopen, but I really don't know what makes the dart grammar so different from other grammars - why another grammar might be instant. I'm unlikely to work on this in the near-term, since I have other priorities. Anyone else who is interested is free to contribute however.

In the meantime, it might be easier to see how neovim uses queries and if there is a potential for reusing a query across many files. Neovim also has more contributors, and the neovim-treesitter people might be able to provide a few pointers here about how to change the grammar or how to update neovim's treesitter support.

genesistms commented 1 year ago

I belive @theHamsta can point me to right direction?

ktakayama commented 1 year ago

How about this commit? -> 1a525edd89026cc6f0a954b4718ce20fd7e45b15 This problem may have been caused by this commit.

I'm also using neovim + nvim-treesitter. And maybe I've fixed this slowdown problem on dart file to the following changes nvim-treesitter project.

https://github.com/nvim-treesitter/nvim-treesitter/blob/master/lockfile.json

diff --git a/lockfile.json b/lockfile.json
index de32b6e..69f217b 100644
--- a/lockfile.json
+++ b/lockfile.json
@@ -81,7 +81,7 @@
     "revision": "c2fbf21bd3aa45495fe13247e040ad5815250032"
   },
   "dart": {
-    "revision": "921149688d70f163d650f26c23f1bc42fac5d303"
+    "revision": "692401954ed047136c1b9d9925214f4703350ee7"
   },
   "devicetree": {
     "revision": "d2cc332aeb814ea40e1e34ed6b9446324b32612a"
genesistms commented 1 year ago

Yes... it seems that it is faster but still around 1s to open a file.

My workaround is comment out if_statement queries in indents.scm in nvim-treesitter and textobjects.scm in nvim-treesitter-textobjects.

ktakayama commented 1 year ago

I think it seems to have slowdown after this change.

https://github.com/UserNobody14/tree-sitter-dart/commit/1a525edd89026cc6f0a954b4718ce20fd7e45b15#diff-919ac210accac9ecc55a76d10a7590e3d85ca3f0e165b52d30f08faee486d0cbL1317-R1328

ktakayama commented 1 year ago

The following changes should significantly improve performance. However, I'm not sure about the full extent of their impact. 💦

diff --git a/grammar.js b/grammar.js
index f8b6ccb..48e51e3 100644
--- a/grammar.js
+++ b/grammar.js
@@ -1319,11 +1319,11 @@ module.exports = grammar({
             $._logical_or_pattern,
         ),

-        _logical_or_pattern: $ => seq($._logical_and_pattern, repeat(seq($.logical_or_operator, $._logical_and_pattern))),
-        _logical_and_pattern: $ => seq($._relational_pattern, repeat(seq($.logical_and_operator, $._relational_pattern))),
+        _logical_or_pattern: $ => seq($._logical_and_pattern, seq($.logical_or_operator, $._logical_and_pattern)),
+        _logical_and_pattern: $ => seq($._relational_pattern, seq($.logical_and_operator, $._relational_pattern)),
         _relational_pattern: $ => 
         prec(DART_PREC.Relational, choice(
-                seq(choice($.relational_operator, $.equality_operator), $._real_expression),
+                seq(choice($.relational_operator, $.equality_operator), $.bitwise_or_expression),
                 $._unary_pattern,
             )
         ),
TimWhiting commented 1 year ago

I can give it a test later today. I think I tried something similar when I introduced the patterns support, but it ended up parsing some things incorrectly, but I'll check again.

TimWhiting commented 1 year ago

Sorry, I tried those changes and a few similar in nature, but they ended up causing issues with parsing code. I assume you'd rather have a correct parser, and not just a fast one :). I think probably the issue is more primarily with the number of conflicts that are in the parser. I've mentioned a few times that I'll look into reducing the number of conflicts. I've got a pretty big paper deadline for my PhD research this week. So I probably won't get to it this week, but I'll try revisiting it next week. In the meantime, I'd be happy to review PRs which don't take as long.

The particular commit mentioned did revolve around these changes, but the previous commit actually had a non-working parser (the implementation of the new dart 3 features was in flux, and I committed a few incremental updates that weren't always completely working). So it isn't actually as helpful as you might think.

If you want to contribute, it should be pretty straightforward, just clone the repo:

npm install
# Build and test the grammar on the unit tests
npm run build && npm run test

# Test the grammar on flutter repo
# Make sure you are on the master branch of flutter (which uses more dart 3.0 features) 
# I use fvm to switch between versions - https://pub.dev/packages/fvm
cd tester && dart pub upgrade
# Back in the tree-sitter-dart directory
dart tester/test.dart ~/fvm/versions/master/packages parse

You can also use the tester script to test queries on a large set of files. It batches the files in groups of 100 to send to the tree-sitter cli.