elm-tooling / tree-sitter-elm

Tree sitter implementation for elm
https://elm-tooling.github.io/tree-sitter-elm/
MIT License
74 stars 12 forks source link

Line comment beneath case statement ends up in case statement node #115

Closed jwoudenberg closed 2 years ago

jwoudenberg commented 2 years ago

Hi there! I just spotted this minor bug (I think?). I added a test to demonstrate it and want to look into a fix tomorrow, but thought I'd give an early heads-up in case this is an already known thing and/or not a bug!

razzeee commented 2 years ago

I think I'm aware that this can happen, but not sure if I should warn you that this fix might be, not easy :)

razzeee commented 2 years ago

Please rebase your branch before you start to work on this :)

jwoudenberg commented 2 years ago

Well, that was an adventure! I think I found a fix. The tree-sitter test run passes, and it would be interesting to see if the integration tests do to.

The diff for this change looks more imposing than it is because I extraced a large section of the scan method into a helper method, to be able to call it from one additional place in the scan method. Let me know what you think! I have very little experience writing C++, so mistakes would not surprise me.

I wrote a more detailed description of the problem and fix in the commit message. I also included a commit that deletes some dead code (I think), but happy it if you don't want it!

razzeee commented 2 years ago

This seems to be performing worse in some cases :(

Compare old -> https://github.com/elm-tooling/tree-sitter-elm/runs/4943345031?check_suite_focus=true with new -> https://github.com/elm-tooling/tree-sitter-elm/runs/4955506916?check_suite_focus=true

It only seemst to affect some ~70 files, but still

jwoudenberg commented 2 years ago

Oh, bummer! I'll take a look and see if I can figure out what's happening with those extra failures.

Is the running time of that step you linked at all significant? I noticed it was 11 minutes in the old version and 21 in the second, but I believe most of that time is spent cloning and not parsing, right?

jwoudenberg commented 2 years ago

I found and pushed a fix for one problem, but there might be more. Curious what the tests will say (such a nice suite by the way, lots of data!).

I got to say I much prefer working in the grammar.js file then I do the scanner.cc one :), and I feel a bit bad about adding complexity to the C++ code. Don't know if there's a better way.

razzeee commented 2 years ago

It taking 11 minutes and sometimes more is just CI as far as I can tell

razzeee commented 2 years ago

In general, if we can find ways to get rid of logic in the scanner.cc, we want that. grammar.js is vastly better. But as long as tree sitter need that for space aware languages, we're likely stuck.

It's already much simplified, compared to some time ago.

razzeee commented 2 years ago

Okay, only seems to break parsing of https://github.com/prozacchiwawa/elm-keccak/blob/master/src/Keccak.elm which might be an okay trade off.

jwoudenberg commented 2 years ago

Oh, cool! I'm just going to take a look and that remaining failing test, just because it would be nice to do this without regressions, if possible!

jwoudenberg commented 2 years ago

The Keccak example was interesting and made me better understand the original problem this PR set out to solve too: Line comment indentation doesn't matter in Elm, but the scanner code was considering it.

The latet commit comes at the problem from another direction, and this way the previous unit tests and the Keccak example now pass. It's a bigger change though, so I might have broken some of the other integration tests :sweat_smile:.

In the latest code we skip over line comments entirely for the purposes of finding virtual end declaration and section nodes. Consequences:

Curious what you think!

jwoudenberg commented 2 years ago

No regressions left in this PR, best as I can tell!

razzeee commented 2 years ago

Yeah, let's go with this and see if anything unwanted pops up

razzeee commented 2 years ago

FYI, I'm seeing a few problems when integrating here https://github.com/elm-tooling/elm-language-server/pull/681

I'm not sure if it's to this or a different change.