Do-block syntax in Elixir confuses `show-smartparens-mode`

gausby commented 7 years ago

The Elixir syntax is a bit unforgiving because there are a couple of shorthands which currently confuses smartparens 'show pairs' functionality, such as:

Do-blocks can be written as do (expressions) end and , do: expression.

I don't know if there are a way to support this in the show-parens mode.

Fuco1 commented 7 years ago

Ouch :)

This should be no problem, if the syntax requires do: we can decide on the : if there is an associated end or not.

gausby commented 7 years ago

I think ignoring do:'s on pairs would suffice :)

There are also some problems with matching defp's. Could it be because of the order in this list: https://github.com/Fuco1/smartparens/blob/master/smartparens-elixir.el#L39-L52 ?—and should I open another issue?

gausby commented 7 years ago

Also…as I read the code: it will look for a do, go to the start of the line and look for one of the keywords.

Unfortunately, if this is the case: that assumption does not suffice: The do doesn't have to be on the same line as the keyword. An example could be this function defintion where I use guards:

  defp drop_tailing_bits(%__MODULE__{size: size} = bitfield)
  when rem(size, 8) == 0 do
    {:ok, bitfield}
  end

Am I correct in my understanding of the code, and should I open another issue for this one?

Fuco1 commented 7 years ago

Yep, it works as you say. I guess we don't need another issue, maybe you could open a PR with added failing tests displaying these cases so I can then fix the issues. What do you think?

gausby commented 7 years ago

I will look into it asap, but I am kinda busy until Sunday.

Thanks for a great package, btw :)

g-belmonte commented 5 years ago

@Fuco1 I'm fairly new to emacs and elisp, so I'm sorry if the question sounds dumb, but... wouldn't something like this solve the issue?

here: https://github.com/Fuco1/smartparens/blob/master/smartparens-elixir.el#L41 like this:

...
 (when (equal (rx "do[^:]") id)
...

Are there any issues using a regex in this place?

Fuco1 commented 5 years ago

I don't remember what the smartparens code does there, but we need to make sure this do: construct doesn't expect a corresponding end. Which the regular do does expect.

It might still work though. If you use elixir you can test it and if it works we can create a patch.

nicobao commented 4 years ago

Hi! This error still pops up in latest version of elixir-mode. Or at least, that's what I am experimenting. Is the issue supposed to be fixed already? (and thanks a lot for your work! not sure I know enough about Emacs Lisp to help but still...)

nicobao commented 4 years ago

Hi! Here is another example where similar errors occur in Elixir: with the quote do construct. defmodule_end defmodule_quote_do_end

If someone is ready to provide some initial guidance, I'd be happy to help.

Fuco1 commented 4 years ago

The quote do should already be handled by sp-elixir-def-p, so if that doesn't work there's likely a bug in that code.

As for the inline do:, can you provide some minimal valid elixir code where this causes a problem? I do not use elixir or ruby very much.

axelson commented 4 years ago

@Fuco1 Here is a minimal example with do:

defmodule ElixirExampleOne do
  def func_one do
    IO.puts("hi 1")
  end

  def func_two, do: IO.puts("hi 2")
end

The def for func_two ends up paired with the end on the last line which is incorrect.

Another related bit of syntax that is interpreted incorrectly is anonymous functions, in this example func_two is a function that returns an anonymous function. The end inside func_two should be paired with fn, but instead it is paired with the def of func_two:

defmodule ElixirExampleTwo do
  def func_one do
    IO.puts("hi 1")
  end

  def func_two do
    fn name -> IO.puts("hello " <> name) end
  end
end

The fn ... end could be solved separately or in a different PR, but I mention it since it is another incorrect pairing and it might be easier to solve both of them at the same time rather than one after another (I'm not familiar enough with the code to really know).

You can streamline the examples even further by removing func_one from each of them.

andreyorst commented 4 years ago

@Fuco1 I'm interested in working on this, as I experience this problem and it makes it a lot harder to navigate code. Any points towards where I should look at? E.g. Docs or maybe there are already languages supported with similar feature?

andreyorst commented 4 years ago

@axelson, sorry to disturb you, but it seems that I've fixed the do: issue, and fn ... end is already handled by Smartparens. However I would like to ask if often see linebreaks before do: or even do forms in the code, e.g.:

defmodule ElixirExampleOne do
  def func_one
  do
    IO.puts("hi 1")
  end

  def func_two,
    do: IO.puts("hi 2")
end

Because currently Smartparens is completely broken for this particular case and my PR does not yet fix this. The question is if this needs to be fixed, although I suppose this will be a llittle more performance hungry.

axelson commented 4 years ago

@andreyorst Thanks for taking a stab at this issue! Those two forms are indeed valid elixir code, although the first one is not seen very often, usually there is a when clause as a guard before the do. Here's a couple longer examples:

defmodule ElixirExampleTwo do
  def can_destroy_message(nil, _), do: {:error, %{reason: "not_found"}}

  def can_destroy_message(%{destroyed_at: destroyed_at}, _)
      when not is_nil(destroyed_at),
      do: {:error, %{reason: "not_found"}}

  def can_destroy_message(%{room_id: message_room_id}, %{room_id: member_room_id})
      when message_room_id != member_room_id,
      do: {:error, %{reason: "unauthorized"}}

  def can_destroy_message(%{content: content} = message, room_member)
      when not is_nil(content) and not is_nil(room_member) do
    # implementation
  end
end

The def being on a separate line from the do is rather common in Elixir code, so it would be great if it was well-handled.

andreyorst commented 4 years ago

Good news, I've managed to make this case work last night:

defmodule ElixirExampleTwo do
  def can_destroy_message(nil, _),
    do: {:error, %{reason: "not_found"}}

  def can_destroy_message(%{destroyed_at: destroyed_at}, _)
      when not is_nil(destroyed_at),
      do: {:error, %{reason: "not_found"}}

  def can_destroy_message(%{room_id: message_room_id}, %{room_id: member_room_id})
      when message_room_id != member_room_id,
      do: {:error, %{reason: "unauthorized"}}

  def can_destroy_message(%{content: content} = message, room_member)
      when not is_nil(content) and not is_nil(room_member) do
    # implementation
  end
end

However your example doesn't quite work. The difference is in the newline before first do:. I'm not sure if I can fix this.

andreyorst commented 4 years ago

@axelson if you can test implementation from #1058 on some more or less big codebases (I don't write in Elixir and don't know big projects except maybe the language server), and provide some more examples where it breaks it would be very helpful :)

andreyorst commented 4 years ago

Yay, I've fixed that bug with do: not being parsed when it is on the same line! Now all code I've seen so far is parsed correctly. Relevant changes are in #1058

axelson commented 4 years ago

Great! I'm having some trouble getting the git version of smartparens installed locally but I should be able to figure it out eventually.

andreyorst commented 4 years ago

Great! I'm having some trouble getting the git version of smartparens installed locally but I should be able to figure it out eventually.

If you use straight and use-package it's easy:

(use-package smartparens
  :straight (:host github :repo "andreyorst/smartparens" :branch "elixir-better-search"))

If not, I guess you can simply replace smartparens-elixir.el with code from my repo https://raw.githubusercontent.com/andreyorst/smartparens/elixir-better-search/smartparens-elixir.el and bytecompile it

andreyorst commented 4 years ago

I guess that the only quirk left out now is bodyless clauses. I'll try to handle it as well.

Also, I've noticed that without (modify-syntax-entry ?_ "w") Smartparens will think that end_ is also a valid pair. @Fuco1 is this intentional?

andreyorst commented 4 years ago

Uh oh. I've also noticed that this is a valid syntax:

defmodule Example do
  def foo(do: body) do
    body
  end
end

Example.foo do
  IO.puts("1")
  IO.puts("2")
end

Which currently breaks all heuristics I've introduced...

And given that this all can be minified to singlle line like this

defmodule Example do def foo(do: body) do body end end; Example.foo do IO.puts("1"); IO.puts("2") end

Making this extremely hard to find matching keywords with my approach. I guess that language with such syntax should use a proper parser instead, something like tree-sitter, to make structural navigation really reliable.

Edit: OK, I might be able to fix this, but I'm not sure if other things will not pop out

Fuco1 commented 4 years ago

I was thinking of using something like LSP for parsing but it might be too slow for interactive navigation. Maybe fetching entire file's AST or some partial AST around point would work, but it's probably a lot of work.

Once you include some test cases for the new work I'll be happy to merge this. Alternatively, if either of you wants to maintain the elixir extension, I'll give you push access.

andreyorst commented 4 years ago

I was thinking of using something like LSP for parsing but it might be too slow for interactive navigation.

@Fuco1 have you heard about https://tree-sitter.github.io/tree-sitter/? It is amazingly fast and provides all necessary info for structural navigation and manipulation. Albeit it is an external library that must be loaded separately, but there are works in Emacs for this https://github.com/ubolonton/emacs-tree-sitter.

Once you include some test cases for the new work I'll be happy to merge this. Alternatively, if either of you wants to maintain the elixir extension, I'll give you push access.

I don't think I'll go as far as Elixir extension maintainer, as I don't program in it :) At least rightnow. I've just decided to help by fixing the issue, and maybe some others that I have not yet looked at.

axelson commented 4 years ago

I am a maintainer of the Elixir Language Server, so I'm curious how we'd potentially leverage the language server from smartparens. Are there any examples from other languages? Also I'm not currently well-versed enough in elisp to feel comfortable writing elisp for a use-case like this.

@andreyorst with the ElixirExampleTwo above and your code (546274a) I'm not seeing the first do for the overall module matching with the end on the very last line. Hmm, actually it looks like if I hover over the defmodule then the final end is highlighted, but if I hover over the final end then the defmodule is not highlighted.

Also, I'm very curious about TreeSitter and would like to investigate that more in the future, although last time I checked it didn't seem quite ready to use, but perhaps that's changed by now.

andreyorst commented 4 years ago

if I hover over the final end then the defmodule is not highlighted.

Hm, that's weird. Here's how it works for me

![image](https://user-images.githubusercontent.com/19470159/101881611-b78c4d00-3ba5-11eb-8d62-5b590a862c27.png)

Here's a really good talk on Tree Sitter: https://www.youtube.com/watch?v=Jes3bD6P0To, and at 5:03 Tree Sitter's author explains why LSP should not be used for such features as syntax highlighting and structural navigation. In short - embedding several language servers into smartparens will almost turn smartparens into another LSP client, so perhaps, if a server provides such features as strucural navigation (I don't know any server that does that), it is better to make smartparens work with LSP-mode or eglot instead.

although last time I checked it didn't seem quite ready to use, but perhaps that's changed by now.

If you mean in Emacs, than maybe yeah. In Atom it is fairly useful since around 2018. Although support for new languages comes at a quite slow pace, but I guess that's because VSCode pushes LSP for highlighting syntax.

nicobao commented 4 years ago

if I hover over the final end then the defmodule is not highlighted.

Hm, that's weird. Here's how it works for me

Here's a really good talk on Tree Sitter: https://www.youtube.com/watch?v=Jes3bD6P0To, and at 5:03 Tree Sitter's author explains why LSP should not be used for such features as syntax highlighting and structural navigation. In short - embedding several language servers into smartparens will almost turn smartparens into another LSP client, so perhaps, if a server provides such features as strucural navigation (I don't know any server that does that), it is better to make smartparens work with LSP-mode or eglot instead.

The video you mentioned explains that tree-sitter is more adapted for Atom and GitHub because they deal with plenty of languages. As of smartparens, it's the same thing. But the question is: do all editors have an equivalent of smartparens? Besides, isn't it the point of an LSP to decouple IDE-capibilities from specific frontend? Finally, is providing excellent syntax highlighting without any mistakes for all the possible grammars really feasible without some errors here and there?

LSP specification now supports semantic tokens: https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

The LSP Server maintainers can implement it using tree-sitter or any other parser they prefer, and support syntax highlighting from there.

However - I do agree that syntax highlighting is a little special, because it's something you always want to work, and language server means I/O, which means errors. Nothing prevent lsp clients to keep in cache the latest syntax information though. Besides, it's better if syntax highlighting work similarly between multiple languages: it's confusing to have different theme for multiple languages so I am a little conflicted here. This is easier to deal with if you have one client-side project - like smartparens - to deal with syntax highlighting for all the languages.

At the end of the day, tree-sitter is just a parser, and LSP is just a protocol. Whether you use tree-sitter from LSP or directly on the client side is a matter of choice, but they are in no way incompatible.

axelson commented 3 years ago

if I hover over the final end then the defmodule is not highlighted.

Hm, that's weird. Here's how it works for me

Probably something specific to my setup (and likely related to something that Doom is adding). I haven't noticed any other issues so :+1: from me on merging the PR.

andreyorst commented 3 years ago

But the question is: do all editors have an equivalent of smartparens?

@nicobao Didn't seen that coming, but yeah, actually a lot of text editors have smartparens-like plugins. Don't really understand why this is relevant though.

is providing excellent syntax highlighting without any mistakes for all the possible grammars really feasible without some errors here and there?

Tree sitter was implemented with this in mind - to be able to parse correctly as many languages as possible, with a lot of ambiguity and a way to recover from error state while still providing useful syntax tree. This is mostly what Smartparens does, except it is much simpler than tree-sitter as it features small regexp based parser for each language it supports. And as we see in this issue, some languages may require a more complete parsing to work reliably.

Also, perhaps this was an analogy, but just to make sure - Smartparens does not provide syntax highlighting at all. Using syntax highlighting info from a language server just for querying the syntax tree seems overkill to me. Treesitter on the other hand provides just the syntax tree that can be analysed however you want to.

Anyway, I think this gets away from the main topic.

@axelson, @Fuco1 I was able to implement searchers that work for minified version of the code, where everything is in single line, but this actually makes parsing very slow. For files that contain about 100 files the delay between highlighing and jumping at the top level form is significant. I guess we can keep the line based search, that can't detect everything but still a bit better than what we have ATM?

axelson commented 3 years ago

@axelson, @Fuco1 I was able to implement searchers that work for minified version of the code, where everything is in single line, but this actually makes parsing very slow. For files that contain about 100 files the delay between highlighing and jumping at the top level form is significant. I guess we can keep the line based search, that can't detect everything but still a bit better than what we have ATM?

:+1: from me on that approach. I don't think that keeping everything in a single line is very common in elixir (and most people use the formatter which wraps at 100 characters).

andreyorst commented 3 years ago

@Fuco1 I've added tests, and worked out most edge cases I could reliably do. I consider this as the point of as far as I can go right now.

Fuco1 / smartparens

Do-block syntax in Elixir confuses `show-smartparens-mode` #707