Line comment beneath case statement ends up in case statement node

jwoudenberg commented 2 years ago

Hi there! I just spotted this minor bug (I think?). I added a test to demonstrate it and want to look into a fix tomorrow, but thought I'd give an early heads-up in case this is an already known thing and/or not a bug!

razzeee commented 2 years ago

I think I'm aware that this can happen, but not sure if I should warn you that this fix might be, not easy :)

razzeee commented 2 years ago

Please rebase your branch before you start to work on this :)

jwoudenberg commented 2 years ago

Well, that was an adventure! I think I found a fix. The tree-sitter test run passes, and it would be interesting to see if the integration tests do to.

The diff for this change looks more imposing than it is because I extraced a large section of the scan method into a helper method, to be able to call it from one additional place in the scan method. Let me know what you think! I have very little experience writing C++, so mistakes would not surprise me.

I wrote a more detailed description of the problem and fix in the commit message. I also included a commit that deletes some dead code (I think), but happy it if you don't want it!

razzeee commented 2 years ago

This seems to be performing worse in some cases :(

Compare old -> https://github.com/elm-tooling/tree-sitter-elm/runs/4943345031?check_suite_focus=true with new -> https://github.com/elm-tooling/tree-sitter-elm/runs/4955506916?check_suite_focus=true

It only seemst to affect some ~70 files, but still

jwoudenberg commented 2 years ago

Oh, bummer! I'll take a look and see if I can figure out what's happening with those extra failures.

Is the running time of that step you linked at all significant? I noticed it was 11 minutes in the old version and 21 in the second, but I believe most of that time is spent cloning and not parsing, right?

jwoudenberg commented 2 years ago

I found and pushed a fix for one problem, but there might be more. Curious what the tests will say (such a nice suite by the way, lots of data!).

I got to say I much prefer working in the grammar.js file then I do the scanner.cc one :), and I feel a bit bad about adding complexity to the C++ code. Don't know if there's a better way.

razzeee commented 2 years ago

It taking 11 minutes and sometimes more is just CI as far as I can tell

razzeee commented 2 years ago

In general, if we can find ways to get rid of logic in the scanner.cc, we want that. grammar.js is vastly better. But as long as tree sitter need that for space aware languages, we're likely stuck.

It's already much simplified, compared to some time ago.

razzeee commented 2 years ago

Okay, only seems to break parsing of https://github.com/prozacchiwawa/elm-keccak/blob/master/src/Keccak.elm which might be an okay trade off.

jwoudenberg commented 2 years ago

Oh, cool! I'm just going to take a look and that remaining failing test, just because it would be nice to do this without regressions, if possible!

jwoudenberg commented 2 years ago

The Keccak example was interesting and made me better understand the original problem this PR set out to solve too: Line comment indentation doesn't matter in Elm, but the scanner code was considering it.

The latet commit comes at the problem from another direction, and this way the previous unit tests and the Keccak example now pass. It's a bigger change though, so I might have broken some of the other integration tests :sweat_smile:.

In the latest code we skip over line comments entirely for the purposes of finding virtual end declaration and section nodes. Consequences:

A couple of unit tests now parse code slightly differently, i.e. put trailing line comments on the top level rather than the case statement level. Based on the comments in the tests it sounded to me that this was a step forward rather than back, in that the behavior is now aligned with elm-format.
Every time we're looking for a virtual end we're scanning through comments directly behind it (if any) twice. Once to find the virtual end, then again to parse the comment itself. I don't know how bad this is and/or how we might avoid this.

Curious what you think!

jwoudenberg commented 2 years ago

No regressions left in this PR, best as I can tell!

razzeee commented 2 years ago

Yeah, let's go with this and see if anything unwanted pops up

razzeee commented 2 years ago

FYI, I'm seeing a few problems when integrating here https://github.com/elm-tooling/elm-language-server/pull/681

I'm not sure if it's to this or a different change.

elm-tooling / tree-sitter-elm

Line comment beneath case statement ends up in case statement node #115