Support some features in incomplete code

gomoripeti commented 3 years ago

Is your feature request related to a problem? Please describe. Incomplete code is currently unparsable, therefor most of the erlang_ls features don't work there. OTOH as Roberto put it "that's where developers spend most of their time on when writing code". We have to identify which features make sense and the most useful when editing code. Also we should investigate which features could be supported and how. This issue is spawned from #1035 , let's discuss in the comments.

Features and feasibility It is els_parser that creates POIs from source code. Many other features work on the POIs. So this part of the task is to create POIs from incomplete code and those features don't have to be modified.

Please suggest what are the other features that need additional support apart from POIs from incomplete code. I suspect some diagnostics are like that.

Btw is there a full list of features of erlang_ls that we can go through? (collected some from module names in els_lsp)

[ ] (lens) function references
[ ] (lens) behaviour usages
[ ] (lens) suggest spec (N/A)
[ ] (diagnostics) bound var in pattern
[ ] (diagnostics) run ct (N/A)
[ ] (diagnostcs) compiler (N/A)
[ ] (diagnostics) cross-ref (is this applicable?)
[ ] (diagnostics) dialyzer (N/A)
[ ] (diagnostics) elvis(N/A)
[ ] (diagnostics) unused includes (is this applicable?)
[ ] (diagnostics) unused macros (is this applicable?)
[ ] completions (lot of sub features) (already works on tokens, the list of completions and context is gathered from POIs though)
[ ] goto definition (lot of sub features)
[ ] hover/docs (sub features)
[ ] document highlight
[ ] document symbol
[ ] folding range
[ ] formatting (no formatter I know of supports incomplete code, incomplete code indentation is a popular request in the ecosystem)
[ ] goto implementation
[ ] references (lot of sub features)
[ ] rename
[ ] workspace symbol

Implementation details and alternatives

erlang_ls already looks at the scanned tokens to find certain attributes (export, export_type, spec). This could be extended to figure out as many POIs as possible. OTOH erlang_ls should not reinvent the parser. Basic ones (ongoing): variables, atoms maybe macros. Function calls and records are harder. Fortunately erlang_ls does not care about control structures like case or try-catch which would be the hardest to parse.
Alternatively it would be cool to have a re-entrant parser with a continuation that is not opaque, but I don't know of any for Erlang.
merl does clever things to wrap an expression into something that makes sense. I'm not sure but maybe similarly erlfmt can parse standalone expressions as forms and some weird things as define values.

robertoaloi commented 3 years ago

I enabled "discussions" for this repo, so we can move this there if we want (not a big deal). One thing I wanted to have a look is Tree Sitter. If you are not familiar with Tree Sitter, you can watch this video. There's a (partially broken) grammar for Erlang available here.

gomoripeti commented 3 years ago

very interesting, I wasn't aware of tree-sitter. Isn't it a bit of competing with LSP in that an editor with a generic tree-sitter plugin and a language specific grammar can do some things that a language server provides (folding ranges etc)? ... well not really, as tree-sitter works on a single file, so it cannot support most of the erlang_ls features that work on the project level.

robertoaloi commented 3 years ago

There's definitely an overlap between LSP and Tree Sitter and the video above briefly touches on that. What we should do is to try and understand what's a good way to combine the two worlds. Maybe other language servers have done this already and we can use some inspiration there?

vladdu commented 3 years ago

I've been away from the Erlang world the last few years, but I'm starting my vacation today and I'm planning to try to review my tools that have been dormant. One of them is sourcer, which can handle incomplete code at the price of having its own parser. The main idea is to split a file in places where we can guess a new form or form clause starts and process each chunk separately. This way, if we have for example an unterminated string in the middle of the file, it only affects the portion until the next function definition (at most). I think this idea can be applied here too.

I will go through the erlang_ls code, to see if it look like there might be ideas from sourcer that could be useful (if you haven't already checked long ago :-)).

robertoaloi commented 3 years ago

Hi @vladdu and welcome back!

The parser has recently been switched to the one from erlfmt. Feel free to have a look and see if there's anything from Sourcer which could be re-used here. Contributions welcome!

robertoaloi commented 3 years ago

@gomoripeti I wonder if, as a first step, we couldn't simply leverage the ErrorInfo returned by erlfmt to extract the location of the problematic expression(s). We could then remove those problematic expressions from the raw_string and somehow create a new "form". We could then parse the new form (eventually completed with a dot as usual, extracting as many POIs as possible (e.g. function applications). WDYT?

gomoripeti commented 3 years ago

I have some doubts how efficient this can be. But it's simple enough to try. It can for example eliminate the last incomplete expression for a function body. I wonder though if it wouldn't be surprising/unintuitive for the users which POIs are found and which not.

for example the below

els_parser:parse("f(VarA) -> VarB = g(), case h() of VarC -> Var").

could be parsed as

els_parser:parse("f(VarA) -> VarB = g().").

but the completion of Var would only list VarA and VarB but not VarC.

A token based heuristic could be more consistent. But it would require much more code and special handling (something similar to https://github.com/erlang-ls/erlang_ls/blob/main/apps/els_lsp/src/els_completion_provider.erl#L166-L223)

robertoaloi commented 3 years ago

That's exactly what I had in mind. I am not too worried about efficiency, since that would only affect a couple of forms (the ones under edit), but we could try and see how it goes. If in doubt, we could make this behaviour configurable via an option.

The way I see the whole thing is that right now parsing skips the whole form under editing. This way, we would improve things by only skipping one expression. Later on, we could use heuristics for the "missing" expression.

erlang-ls / erlang_ls

Support some features in incomplete code #1037