Add support for multiline strings

theothornhill commented 3 years ago

There is now a new type of support for multiline strings in CC Mode, and this is a first attempt to support this. The regexes are supplied from Alan Mackenzie, the maintainer of CC Mode. There are probably improvements to be made here, but it seems to fix some slowness with regards to string handling. How does it work, you ask?

It is left as an exercise to the reader.

@josteink, this should address #164, #250, #207, #182

josteink commented 3 years ago

Good grief. You had time for to look into this. Great stuff! I guess that means we're still not aiming for a 100% tree-sitter based future? :smile:

Anyway: I'll honestly did my best best to review this (and then first and foremost from a https://github.com/emacs-csharp/csharp-mode/issues/250 perspective).

While the patch is pretty simple and no "big" thing of its own, it's all regexes and Emacs regexes are even worse to read than regular ones :joy:

Would it be possible to split those regexes into smaller more logically self-contained unit which then get concat-ed together or perhaps through use of the rx maxros instead? That would IMO make it more readable ... and reviewable :nerd_face:

theothornhill commented 3 years ago

Good grief. You had time for to look into this. Great stuff! I guess that means we're still not aiming for a 100% tree-sitter based future? 😄

Hehe, this is the extent of my hacking the last couple of months. I think we should be good cc mode citizens for the time being, considering tree sitter is not in core yet. (and the fact that I forced Alan to do this in the first place :))

Anyway: I'll honestly did my best best to review this (and then first and foremost from a #250 perspective).

While the patch is pretty simple and no "big" thing of its own, it's all regexes and Emacs regexes are even worse to read than regular ones 😂

Wait what, are you saying my code is hard to read?

Would it be possible to split those regexes into smaller more logically self-contained unit which then get concat-ed together or perhaps through use of the rx maxros instead? That would IMO make it more readable ... and reviewable 🤓

If only there was a library that could convert a regex to an rx macro. Oh, wait, there is! I added another commit for good measure. What do you think?:)

josteink commented 3 years ago

Wait what, are you saying my code is hard to read?

No comment :eyes:

If only there was a library that could convert a regex to an rx macro. Oh, wait, there is!

Well, TIL!

What do you think?:)

That's immediately much more readable... But still... And I guess I mentioned to ask this the first time around, but forgot (sorry about that!) ...

Is there any way to add unit-tests for this? Like (when (emacs-version>= 29) (progn ;; some multi-line test-cases) ?

That would really help contextualize the regex'es too, hopefully making it super-obvious that the correct code is indeed correct :smile:

theothornhill commented 3 years ago

Wait what, are you saying my code is hard to read?

No comment 👀

😄

Ok, so now the tests are passing.

That's immediately much more readable... But still... And I guess I mentioned to ask this the first time around, but forgot (sorry about that!) ...

The problem with these regexes is that intimate knowledge of how cc mode is supposed to read the matchers etc is needed. So to debug this one has to read the cc-engine.el code and friends. And I guess that if you're hacking on csharp-mode you are using emacs, and as such have access to the docstrings using the help functions. Though I really have to admit, I got these regexes by mail from Alan, and don't really understand them, as they use some internal concepts from cc mode, such as "contexts".

But anyway:

I commented out a test that I'm not sure is needed. The new CC Mode doesn't behave like that, and not really sure it is a bug. Lsp servers mark these strings anyways.
Is the new test good enough? Right now it is simply checking that the string doesn't bleed over its bounds.
It is not currently running on ci, since emacs 29 isn't released yet
This new behavior should be avaiable for all emacs variants supported by cc mode if the new variant not shipped with that emacs is installed
The new test is running on my machine... 🌵

By the way, I see this as mostly a performance optimization, and not really the long term solution. I think there are bugs in both the new implementation and the regexes, but I cannot for the life of me seem to solve them. And when core tree sitter arrives all this will be for nothing anyways :) I think this should be helpful for some people down the line :D

josteink commented 3 years ago

Great job working up the test-cases and scoping it down to the right cc-mode version. And sorry for being such a pedantic prick about it :smile:

Now I feel a little more comfortable reviewing the PR, and being able to browse back and forth between the regex forms (good job with the comment above!) really helps thinking more conceptually above it.

Based on that, I have a small comment/question, but otherwise: really phenomenal job putting this all together. Take a clap or two. It's all yours :clap: :clap:

theothornhill commented 3 years ago

Great job working up the test-cases and scoping it down to the right cc-mode version. And sorry for being such a pedantic prick about it 😄

No worries! It's only helpful, and I can get sloppy and trigger-happy, so nice to get some well-meaning advice!

Now I feel a little more comfortable reviewing the PR, and being able to browse back and forth between the regex forms (good job with the comment above!) really helps thinking more conceptually above it.

Great! Yeah, I think this is one of the more complex things in this mode, as it is kind of a closed box as to why the regex looks the way it does.

Based on that, I have a small comment/question, but otherwise: really phenomenal job putting this all together. Take a clap or two. It's all yours 👏 👏

Great! Thanks - I'll merge this then :) Thank you for your patience, and hopefully this will serve some people well!

Cue the 🐛 !

emacs-csharp / csharp-mode

Add support for multiline strings #251