Closed jwoudenberg closed 2 years ago
I think I'm aware that this can happen, but not sure if I should warn you that this fix might be, not easy :)
Please rebase your branch before you start to work on this :)
Well, that was an adventure! I think I found a fix. The tree-sitter test
run passes, and it would be interesting to see if the integration tests do to.
The diff for this change looks more imposing than it is because I extraced a large section of the scan
method into a helper method
, to be able to call it from one additional place in the scan
method. Let me know what you think! I have very little experience writing C++, so mistakes would not surprise me.
I wrote a more detailed description of the problem and fix in the commit message. I also included a commit that deletes some dead code (I think), but happy it if you don't want it!
This seems to be performing worse in some cases :(
Compare old -> https://github.com/elm-tooling/tree-sitter-elm/runs/4943345031?check_suite_focus=true with new -> https://github.com/elm-tooling/tree-sitter-elm/runs/4955506916?check_suite_focus=true
It only seemst to affect some ~70 files, but still
Oh, bummer! I'll take a look and see if I can figure out what's happening with those extra failures.
Is the running time of that step you linked at all significant? I noticed it was 11 minutes in the old version and 21 in the second, but I believe most of that time is spent cloning and not parsing, right?
I found and pushed a fix for one problem, but there might be more. Curious what the tests will say (such a nice suite by the way, lots of data!).
I got to say I much prefer working in the grammar.js file then I do the scanner.cc one :), and I feel a bit bad about adding complexity to the C++ code. Don't know if there's a better way.
It taking 11 minutes and sometimes more is just CI as far as I can tell
In general, if we can find ways to get rid of logic in the scanner.cc, we want that. grammar.js is vastly better. But as long as tree sitter need that for space aware languages, we're likely stuck.
It's already much simplified, compared to some time ago.
Okay, only seems to break parsing of https://github.com/prozacchiwawa/elm-keccak/blob/master/src/Keccak.elm which might be an okay trade off.
Oh, cool! I'm just going to take a look and that remaining failing test, just because it would be nice to do this without regressions, if possible!
The Keccak
example was interesting and made me better understand the original problem this PR set out to solve too: Line comment indentation doesn't matter in Elm, but the scanner code was considering it.
The latet commit comes at the problem from another direction, and this way the previous unit tests and the Keccak
example now pass. It's a bigger change though, so I might have broken some of the other integration tests :sweat_smile:.
In the latest code we skip over line comments entirely for the purposes of finding virtual end declaration and section nodes. Consequences:
Curious what you think!
No regressions left in this PR, best as I can tell!
Yeah, let's go with this and see if anything unwanted pops up
FYI, I'm seeing a few problems when integrating here https://github.com/elm-tooling/elm-language-server/pull/681
I'm not sure if it's to this or a different change.
Hi there! I just spotted this minor bug (I think?). I added a test to demonstrate it and want to look into a fix tomorrow, but thought I'd give an early heads-up in case this is an already known thing and/or not a bug!