Use built-in function to convert source code to AST

sheldak commented 8 months ago

@lukaszsamson As we discussed in #1057, I made PR with changing ElixirSense.string_to_quoted to Code.string_to_quoted_with_comments. I will go through the pros and cons of that approach as it introduced some challenges for me.

Pros

Code.string_to_quoted_with_comments handles "do end" and single-line ", do:" so there is no need to fix that after converting back to a string.
Only correct code is changed to AST, so ElixirSense code correction limitations do not apply anymore (like the example with case)

Cons

The main difference is that we are no longer able to parse only one line of code as it may not be correct. Therefore, I decided to pass the whole source code file.

Since the exact format is not preserved in the structure returned by Code.string_to_quoted_with_comments, by updating AST according to a code action and transforming it back to a string, we not only get the text edits to perform the code action but also text edits that format the whole file. I don't think it is what we would expect from a code action, so I tried to limit any change to just the single relevant line.

For that I work with three strings here:

unformatted (original) source code text
formatted text
updated text (formatted with code action applied)

It is hard to compare unformatted text and updated text, because of the challenge of distinguishing between format text edits and code action text edits. Also, through Elixir warning we get only the line number in unformatted text. Thus, to extract only the relevant line, I format the text, compare the formatted text and the updated text, and check which line contains a change.

The problem appears when the formatting happens also in the relevant line. Then, I'm not creating any code action at all. Having formatting and the update in the relevant line may not always work. An example is the following code:

x = 3 # TODO

Formatting the inline comment results in putting it above the line. So the formatting applies in two lines of the resulting formatted text and we cannot just take the relevant line, because the comment would disappear. It is even more complicated for unformatted multiline maps.

In the implementation with ElixirSense.string_to_quoted after applying the code action we get the line formatted and updated. It does not change the other parts of the code, because we only operate on one line. Also, leading indent and trailing comments are handled separately.

Therefore, since I didn't find a reliable way to handle all cases of formatted lines, in the following implementation (with Code.string_to_quoted_with_comments) no change is performed if it will result in formatting the relevant line (that is limiting in comparison to the implementation with ElixirSense.string_to_quoted).

The solutions here are not ideal, so I'm open to feedback and some ideas how we should handle the formatting.

lukaszsamson commented 7 months ago

Thanks for looking into this. I agree that this approach is not clearly better and I'm hesitant with merging this.

sheldak commented 7 months ago

Thanks for looking into this. I agree that this approach is not clearly better and I'm hesitant with merging this.

@lukaszsamson Should I try some other approach then? For example, Sourceror library is working well in Lexical, so it may also work here.

lukaszsamson commented 7 months ago

Other contributors already asked for introducing Sourceror. I don't have anything against, only one question. Isn't it relying on Code.string_to_quoted_with_comments underneath? Wouldn't that make it have the same limitations?

We'd also need to vendor it to avoid conflicts with client libs

sheldak commented 7 months ago

Other contributors already asked for introducing Sourceror. I don't have anything against, only one question. Isn't it relying on Code.string_to_quoted_with_comments underneath? Wouldn't that make it have the same limitations?

@lukaszsamson It is indeed relying on Code.string_to_quoted_with_comments. However, it seems that by using patches (example from README) or also Zipper with helper functions (example from Lexical) we can keep the original formatting with relatively little additional code. I just don't think I could reimplement handling the formatting reliably in a reasonable timeline.

elixir-lsp / elixir-ls

Use built-in function to convert source code to AST #1078

Pros

Cons