Open zachallaun opened 1 year ago
This would be a welcome change! I'd make the following changes though:
is_local_call
should be named is_unqualified_call
, as local or non-local is a runtime property and not necessarily a syntax thingis_var
should be named is_identifier
Also, what if instead of has_range?
we add a fetch_range
function with ok/error results? Or we could have both, I'm curious on the motivation for this one
is_local_call
should be namedis_unqualified_call
, as local or non-local is a runtime property and not necessarily a syntax thing
is_var
should be namedis_identifier
Good points! Agree with both of these.
Also, what if instead of
has_range?
we add afetch_range
function with ok/error results? Or we could have both, I'm curious on the motivation for this one
My real motivation for this is wanting to know, regardless of what node I have, whether I should be able to call get_range
on it.
My first thought was to actually suggest a change to get_range
: instead of raising when it can't calculate the range for a node, it should return nil
(same semantics as other get_*
functions). However, that has two problems that I could think of:
get_range
is defined recursively, it would be a non-trivial refactor to guard against nil
in all the recursive cases.Unfortunately, these issues also sort of apply to a fetch_range
-- it could rescue FunctionClauseError
, but that's pretty gross :P.
Given that, the next-best thing I could think of was a has_range?
function!
I see, though wouldn't the addition of has_range?
need a refactor regardless? For a bunch of nodes we need to recursively check the contents to calculate the range(if possible), so this function would need to do something similar.
Regardless, I think has_range?
can still be useful so I'm good with the addition :)
Hmm... Maybe so! Let's see what happens with the implementation and can decide then what the right API is 🙂
Sounds good!
Re: the Access
work, that will be built on existing sourceror primitives unless some optimization is required, so it's not something to worry about :)
I started working on this a bit, but realized that I need a precise definition for unqualified calls.
For context, here's some "prior art" from ElixirSense:
defguardp is_call(call, params)
when is_atom(call) and is_list(params) and
call not in [:., :__aliases__, :"::", :{}, :|>, :%, :%{}]
I think this generally makes sense, but the definition doesn't strike me as complete. For instance, it doesn't account for |
usage in map-update syntax (%{foo | bar: :baz}
), or the pin operator ^
, which can only be used in a match operation or macro.
So what makes something a "call" vs. a special form? I think the critical element that can be used to evaluate something is whether that thing is definable by users without modifying the compiler.
By that definition, the following would not be considered calls because they are reserved words or forms that cannot be overridden:
fn
, when
, and
, or
, not
, in
%
, %{}
, {}
, []
, <<>>
^
, .
, =
, &
, ::
, =>
, <-
, \\
, |
This is a much larger set than what ElixirSense is using.
Given this, here's a possible definition:
Definition 1
Calls are functions, macros, and operators that operate on arguments and are definable or overridable in Elixir code.
Does this make sense and is it a useful definition?
An alternative could be to define calls more broadly and to add an additional is_special_form
that identifies calls that, in essence, cannot be overridden.
Definition 2
Calls are functions, macros, operators, or special forms that operate on arguments.
If we accept this second definition, I'd think that :__aliases__
would be a call (and special form), while :__block__
would be neither.
Am I overthinking this? 😅
@zachallaun those are good calls, but from my point of view Sourceror works at a lower level A "call" vs "special form" is a compiler and runtime thing, but Sourceror works at the syntax level, where those semantics don't matter
As far as we are concerned, with foo <- bar do ... end
could be a function, a macro, or none at all
In Elixir AST you can boil it down to two scenarios:
You have AST representing a node with arguments: {type, metadata, arguments}
where type
is an atom like :foo
This is a call
, and includes operators, constructors like %
or %{}
, etc
In some cases, a call type
is another call instead: {{:., dot_metadata, dot_arguments}, metadata, arguments}
Those are qualified calls
, and represent the foo.bar
construct, where the .
is what makes it "qualified"
This is the case for foo.bar
, Foo.bar
, and even Foo.{Bar}
which is a "qualified tuple" but a call nonetheless
So from Sourceror's point of view, a call
or unqualified_call
is any 3-tuple node where the type
is an atom and the arguments
a list, a qualified call
is the same but with a dot call as the type
Now, from the point of view of a tool like ElixirSense it does make more sense to narrow down the definition because it's dealing with a higher level where the language semantics do matter.
I do welcome functions that deal with a higher level for the sake of making it easier to navigate the AST considering the semantics, maybe in a separate module under Sourceror
that is aware of default elixir semantics, but functions in the Sourceror
module itself deal with the syntax only and assume al syntax is meaningless, as in fn
or when
may not be reserved in some context like macros.
An useful thing however may be to exclude :__block__
from any definition of call
because those are workarounds to represent other syntax constructs, this would match your second definition
I'm torn on :__aliases__
though, because it doesn't accept other AST as arguments and it's pretty much another hack to make it easy to extract the alias segments
Thanks for sharing your thoughts! This largely makes sense. I'm interested in identifying higher level semantics, but agree that it makes sense in a separate module. The existing Identifier module kinda does this, right? Perhaps it makes sense in there?
I think :__aliases__
is not a call because it's not transformative in any way -- as you mentioned, it's basically a syntax hack to represent a slightly more complex, but still "atomic", element.
Actually now that you mention this, Sourceror.Identifier
would be a great place for the functions in this proposal. That module started as a bag for vendored helpers of Code.Normalizer
and I figured I could make it public
I do agree with you that :__aliases__
is not a call, so let's put it out of the notion of a "call" at the syntax level
I think the next step for semantics aware functions would be to come up with a name/place for such functions, maybe Sourceror.Code
? And we rename the existing and private module with that name to something like Sourceror.Compat.Code
We can't have environment aware functions like ElixirSense may have, but we definitely can provide functions that operate on AST informed by vanilla elixir semantics
Sounds good!
Another thing we need is a good definition for are literals.
I can imagine two versions, one that's lower level and lives in Sourceror.Identifier
and one that's higher level and lives in the new Sourceror.Code
.
The lower-level would be is_literal
, and it returns true for AST literals. This has a narrow definition: numbers, atoms, strings, or :__block__
nodes that contain only one child -- a number, atom, or string.
The higher-level definition would be data?
and would live in Sourceror.Code
. This would be similar to Macro.quoted_literal?/1
, with the exception that, like is_literal
, it considers :__block__
nodes containing a single literal to also be literals.
Thoughts?
I'm actually thinking these should be called:
Sourceror.Identifier.is_atomic_literal/1
Sourceror.Code.quoted_literal?/1
Regarding moving the existing Sourceror.Code
-- Formatter
and Normalizer
already is in a lib_vendored
. What if we make them all Sourceror.Vendored.*
? So:
Sourceror.Vendored.Code
Sourceror.Vendored.Code.Formatter
Sourceror.Vendored.Code.Normalizer
That sounds like a good change, so yes we can do that!
I've recently found myself writing a handful of functions and guards that I think would be useful additions to Sourceror.
The motivation for this is to have a handful of primitives that allow you to differentiate between and navigate the nodes of an AST parsed by Sourceror (and to a lesser extent, the default
Code.string_to_quoted
options). This is based on an assumption that the fundamental "categories" of nodes in an Elixir AST are::__block__
in order to retain additional metadataProposed guards:
is_local_call(node)
- local calls likefoo()
,1 + 1
, etc. Excludes special nodes like:__block__
,:__aliases__
, or data structures like:%{}
.is_qualified_call(node)
- similar to above but for qualified/dot calls,Foo.bar()
,foo.()
, etc.is_call(node)
- helper for local or qualified callis_var(node)
- identifies variablesis_literal(node)
- literals in the AST and wrapped{:__block__, _, [literal]}
nodes. (A block containing multiple literals is not a literal. I think this distinction is very useful when navigating AST.)Proposed functions:
fetch_literal(node)
- returns{:ok, value} | :error
- these would return the same thing:fetch_literal({:__block__, _, [:foo]})
/fetch_literal(:foo)
fetch_children(node)
- returns{:ok, [child]} | :error
(removed in favor of #107)has_range?
- iftrue
, node can be passed toget_range
without raisingbranch?(tree)
andchildren(tree)
, eventually make private?Additional considerations:
Some of this may be related to the
Access
work you've been doing. Can it be unified?