doorgan / sourceror

Utilities to manipulate Elixir source code
Apache License 2.0
317 stars 22 forks source link

Change the AST to make it easier to pattern match and remove ambiguities #24

Open doorgan opened 3 years ago

doorgan commented 3 years ago

This is derived from a discussion at #23. The tl;dr is that wrapping literals in :__block__ works, but makes it cumbersome to pattern match against. Some other nodes could be changed as well, like sigils.

The changes to be made would be such that the following syntaxes would be mapped as follows: foo -> {:var, metadata, :foo} 400_000 -> {:int, [token: "400_000" | metadata], 400000} 42.0 -> {:float, [token: "42.0" | metadata], 42.0} :foo -> {:atom, metadata, :foo} [1, 2] -> {[], metadata, [1, 2]} {1, 2} -> {:{}, metadata, [1. 2]} ~w[a b c] -> {{:sigil, "w"}, metadata, [segments, modifiers]}

{:sigil, meta, ["w", segments, modifiers]} for sigils would be ambiguous with function calls, for example. ~w/foo #{:bar}/a and foo("w", "foo #{:bar}", 'a') already produce the very same AST, with the only exception that the sigil has a :delimiter metadata that lets us recognize it as an actual sigil. Because AST nodes metadata order is not guaranteed, we can't just pattern match the metadata to assert it's a sigil. Hence the idea of using a tuple as the node type instead.

The same goes for lists. If the list node is {:list, meta, elements}, then [1, 2, 3] and list(1, 2, 3) would produce exactly the same AST, so I was thinking on using {[], meta, elements} instead. I'm not very happy with the loss in readability there, but it would work.

There are some opportunities to enrich the AST further, for example, the AST for A.B.C.fun is this:

{{:., _, [{:__aliases__, [], [:A, :B, :C], :fun]}, _, []}

By leveraging the :static_atoms_encoder parser option, we could transform it to this:

{{:., _, [{:__aliases__, [], [
  {:atom, _, :A}, {:atom, _, :B}, {:atom, _, :C}
], {:atom, _, :fun}]}, _, []}

Which, of course, it a lot more verbose, but on the other hand it allows us to: a) Have a more consistent AST, meaning we no longer have some places were atoms are wrapped and others were they're not b) We can have more precise information about where exactly each alias segment is happening

For b we would still miss information about where exactly the dots are located, but at least we could accurately replace the inner segments.

The final change I had thought of is to make the parsing safe by default, meaning that parsing a file wont create new atoms. The downside is that the Formatter only works with atoms, so Sourceror.to_string would create new atoms anyways and we're back to square one.

doorgan commented 3 years ago

Some side effects of this:

All of these are easily addressable.

doorgan commented 3 years ago

You can try these changes(with the exception of the static_atoms_encoder one) in the rich_ast branch:

{:sourceror, github: "doorgan/sourceror", branch: "rich_ast"}

CC @marcandre

doorgan commented 3 years ago

Sigils were changed to {:"~", metadata, [letter, contents, modifiers]}

msaraiva commented 3 years ago

@doorgan would it be wiser to wait for these changes to be applied before using sourceror? I have a couple of mix tasks I want to add to Surface that could make good use of sourceror but I believe I may have to change the implementation a lot when the new AST comes out.

doorgan commented 3 years ago

Hi @msaraiva, I don't see these changes applied to the main branch anytime soon, as it's still too experimental and I realize it will take quite some time to find something that works fine for Sourceror use cases.

When the time to merge them comes closer I'll let you know and I'll be happy to help you migrate the Surface scripts :)

msaraiva commented 3 years ago

@doorgan sounds good!

I didn't know these changes were that far away. I'll give v0.8.1 a try then.

Cheers.

doorgan commented 3 years ago

@msaraiva I initially intended to apply them in 0.9, but I started to deviate too much from the original goal and I want to have a clearer spec first. There's lot of nuances when it comes to having completely unambiguous node types and keeping the 3-tuple shape that I need to resolve 🙂. So maybe they will land in 0.10 or 0.11.