JuliaLowering.jl is an experimental port of Julia's code lowering compiler passes, written in Julia itself. "Code lowering" is the set of compiler passes which symbolically transform and simplify Julia's syntax prior to type inference.
This work is intended to, among other things, prototype automatic macro hygiene (no more need for esc()!).
Note this is a very early work in progress; most things probably don't work!
To try the demo, run:
include("test/demo.jl")
Lowering has five symbolic simplification passes:
We want something better than JuliaSyntax.SyntaxNode! SyntaxTree and SyntaxGraph provide this. (These will probably end up in JuliaSyntax.)
We want to allow arbitrary attributes to be attached to tree nodes by analysis passes. This separates the analysis pass implementation from the data structure, allowing passes which don't know about each other to act on a shared data structure.
Design and implementation inspiration comes in several analogies:
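Analogy 1: the ECS (Entity-Component-System) pattern for computer game design. This pattern is highly successful because it separates game logic (systems) from game objects (entities) by providing flexible storage. Here, compiler passes play the role of systems, tree nodes the entities, and node attributes the components.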
Analogy 2: The AoS to SoA transformation. But here we've got a kind of tree-of-structs-with-optional-attributes to struct-of-Dicts transformation. The data alignment / packing efficiency and concrete type safe storage benefits are similar.
Analogy 3: Graph algorithms which represent graphs as a compact array of node ids and edges with integer indices, rather than using a linked data structure.
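As a rough illustration of analogies 2 and 3, here is a minimal sketch of a tree stored as flat integer arrays, with attributes held in one Dict per attribute name. The names AttrGraph, newnode!, setattr!, getattr and children are hypothetical and exist only for this example; this is not JuliaLowering's actual SyntaxGraph implementation, and a real version would likely use concretely typed attribute columns rather than Any.

```julia
# Hypothetical sketch only; not the real SyntaxGraph.
struct AttrGraph
    edge_ranges::Vector{UnitRange{Int}}       # children of node i are edges[edge_ranges[i]]
    edges::Vector{Int}                        # flat array of child node ids
    attributes::Dict{Symbol,Dict{Int,Any}}    # one sparse column per attribute name
end

AttrGraph() = AttrGraph(UnitRange{Int}[], Int[], Dict{Symbol,Dict{Int,Any}}())

# Create a node with the given child ids and return the new node's integer id.
function newnode!(g::AttrGraph, kids::Vector{Int}=Int[])
    first_edge = length(g.edges) + 1
    append!(g.edges, kids)
    push!(g.edge_ranges, first_edge:length(g.edges))
    return length(g.edge_ranges)
end

children(g::AttrGraph, id::Int) = g.edges[g.edge_ranges[id]]

# Any pass can attach an attribute; the graph type does not need to know about it.
function setattr!(g::AttrGraph, id::Int, name::Symbol, val)
    get!(() -> Dict{Int,Any}(), g.attributes, name)[id] = val
    return g
end

getattr(g::AttrGraph, id::Int, name::Symbol, default=nothing) =
    get(get(g.attributes, name, Dict{Int,Any}()), id, default)

# Build the tree for `a + b`: leaves first, then the call node.
g = AttrGraph()
a = newnode!(g);    setattr!(g, a, :name_val, "a")
plus = newnode!(g); setattr!(g, plus, :name_val, "+")
b = newnode!(g);    setattr!(g, b, :name_val, "b")
call = newnode!(g, [a, plus, b]); setattr!(g, call, :kind, :call)

# A later "pass" adds its own attribute, unknown to the code above.
setattr!(g, b, :scope_layer, 1)
```

Because attributes live outside the nodes, adding the scope_layer column above needed no change to AttrGraph or to the code which built the tree.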
Expression provenance is tracked through lowering by attaching provenance information in the source attribute to every expression as it is generated.
For example when parsing a source file we have
julia> ex = parsestmt(SyntaxTree, "a + b", filename="foo.jl")
SyntaxTree with attributes kind,value,name_val,syntax_flags,source
[call-i] │
a │
+ │
b │
julia> ex[3].source
a + b
# ╙ ── these are the bytes you're looking for 😊
The provenance function should be used to look up the source attribute and the showprov function used to inspect the content (this is preferred because the encoding of source is an implementation detail). For example:
julia> showprov(ex[3])
a + b
# ╙ ── in source
# @ foo.jl:1
During macro expansion and lowering provenance gets more complicated because an expression can arise from multiple sources. For example, we want to keep track of the entire stack of macro expansions an expression was generated by, while also recording where it occurred in the original source file.
For this, we use a tree data structure. Let's look at the following pair of macros
julia> JuliaLowering.include_string(Main, raw"""
module M
macro inner()
:(2)
end
macro outer()
:((1, @inner))
end
end
""", "some_macros.jl")
The tree which arises from macro expanding this is pretty simple:
julia> expanded = JuliaLowering.macroexpand(Main, parsestmt(SyntaxTree, "M.@outer()"))
SyntaxTree with attributes scope_layer,kind,value,var_id,name_val,syntax_flags,source
[tuple-p] │
1 │
2 │
but the provenance information recorded for the second element 2 of this tuple is not trivial; it includes the macro call expressions for @inner and @outer. We can show this in tree form:
julia> showprov(expanded[2], tree=true)
2
├─ 2
│ └─ @ some_macros.jl:3
└─ (macrocall @inner)
├─ (macrocall @inner)
│ └─ @ some_macros.jl:7
└─ (macrocall-p (. M @outer))
└─ @ foo.jl:1
or as a more human-readable flattened list highlighting the source ranges:
module M
macro inner()
:(2)
# ╙ ── in source
end
# @ some_macros.jl:3
macro outer()
:((1, @inner))
# └────┘ ── in macro expansion
end
end
# @ some_macros.jl:7
M.@outer()
└────────┘ ── in macro expansion
# @ foo.jl:1
To write correct hygienic macros in Julia (as of 2024), macro authors must use esc() on any syntax passed to the macro so that passed identifiers escape to the macro caller scope. However, esc() is one of the things that new macro authors find most confusing. (My impression, based on various people complaining about how confusing esc() is.)
esc() wraps expressions in Expr(:escape), but this doesn't work well when macros pass such escaped syntax to an inner macro call. As discussed in Julia issue #37691, macros in Julia's existing system are not composable by default. Writing composable macros in the existing system would require preserving the escape nesting depth when recursing into any macro argument nested expressions. Almost no macro author knows how to do this and is prepared to pay for the complexity of getting it right.
The requirement to use esc() stems from Julia's pervasive use of the simple Expr data structure which represents an unadorned AST in which names are plain symbols. For example, a macro call @foo x gets passed the symbol :x which is just a name without any information attached to indicate that it came from the scope where @foo was called.
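The effect of esc() in Julia's existing macro system can be seen in a small sketch. The module HygieneDemo and the macros @no_esc and @with_esc are hypothetical names made up for this example; the code uses only the existing Expr-based macro system, not JuliaLowering.

```julia
module HygieneDemo
    y = 100   # a global in the macro's own module

    # Without esc(): the plain symbol passed in is resolved in HygieneDemo.
    macro no_esc(x)
        :($x + 1)
    end

    # With esc(): the symbol is resolved in the scope of the macro caller.
    macro with_esc(x)
        :($(esc(x)) + 1)
    end
end

function esc_demo()
    y = 1   # a local in the caller's scope
    return (HygieneDemo.@no_esc(y), HygieneDemo.@with_esc(y))
end

# esc() itself only wraps its argument in Expr(:escape).
escaped = esc(:x)
```

Here esc_demo() returns (101, 2): the unescaped y silently refers to HygieneDemo.y, which is the kind of surprise the symbol-only representation makes possible.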
In JuliaLowering we make hygiene automatic and remove esc() by combining names with scope information. In the language of the paper Towards the Essence of Hygiene by Michael Adams, this combination is called a "syntax object". In JuliaLowering our representation is the tuple (name,scope_layer), also called VarId in the scope resolution pass.
JuliaLowering's macro expander attaches a unique scope layer to each identifier in a piece of syntax. A "scope layer" is an integer identifier combined with the module in which the syntax was created.
When expanding macros,
Subsequently, the (name,scope_layer) pairs are used when resolving bindings. This ensures that, by default, we satisfy the basic rules for hygienic macros discussed in Adams' paper:
TODO: Write more here...
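To make the idea of resolving bindings by (name,scope_layer) concrete, here is a minimal sketch. The types ScopeLayer and LayeredName and the bindings table are hypothetical names invented for this example and are not JuliaLowering's actual data structures; the modules Main and Base are used only as stand-ins.

```julia
# Hypothetical types for illustration only.
struct ScopeLayer
    id::Int        # integer identifier of the layer
    mod::Module    # module in which the syntax was created
end

struct LayeredName
    name::Symbol
    layer::ScopeLayer
end

# Binding table keyed on the (name, scope_layer) pair rather than on the bare name.
bindings = Dict{LayeredName,Symbol}()

caller_layer = ScopeLayer(1, Main)   # syntax written at the macro call site
macro_layer  = ScopeLayer(2, Base)   # syntax produced by one macro invocation

# The same name `x` in two layers maps to two distinct bindings.
bindings[LayeredName(:x, caller_layer)] = :x_from_caller
bindings[LayeredName(:x, macro_layer)]  = :x_from_macro
```

Since the layer is part of the key, an x introduced by the macro cannot capture or be captured by an x from the caller, without anyone calling esc().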
macroexpand(m::Module, x) calls jl_macroexpand in ast.c:
jl_value_t *jl_macroexpand(jl_value_t *expr, jl_module_t *inmodule)
{
    expr = jl_copy_ast(expr);
    expr = jl_expand_macros(expr, inmodule, NULL, 0, jl_world_counter, 0);
    expr = jl_call_scm_on_ast("jl-expand-macroscope", expr, inmodule);
    return expr;
}
First we copy the AST here. This is mostly a trivial deep copy of Exprs and shallow copy of their non-Expr children, except for when they contain embedded CodeInfo/phi/phic nodes which are also deep copied.
Second we expand macros recursively by calling
jl_expand_macros(expr, inmodule, macroctx, onelevel, world, throw_load_error)
This relies on state indexed by inmodule and world, which gives it some funny properties:
module expressions can't be expanded: macro expansion depends on macro lookup within the module, but we can't do that without eval.
Expansion proceeds from the outermost to innermost macros. So macros see any macro calls or quasiquote (quote/$) in their children as unexpanded forms.
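The outermost-first order can be observed directly in the existing system. In this small sketch the macros @inner_demo and @arg_head are hypothetical names made up for the example; @arg_head simply reports the head of whatever expression it is handed.

```julia
macro inner_demo(x)
    :($(esc(x)) * 2)
end

# Returns the head of the expression it receives, as a quoted symbol.
macro arg_head(ex)
    QuoteNode(ex.head)
end

# The outer macro runs first, so it sees the inner call as an unexpanded macrocall.
seen_macro = @arg_head(@inner_demo(3))

# Likewise, quasiquote in the argument arrives as an unexpanded `quote` form.
seen_quote = @arg_head(:(a + $b))
```

Neither @inner_demo nor the interpolation of b is ever evaluated here, because @arg_head discards its argument after inspecting the head.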
Things which are expanded:
quote is expanded using flisp code in julia-bq-macro. Some forms become QuoteNode (inert); $ expands to its content; expressions x without $ expand to (copyast (inert x)); expressions containing $ expand to a call to _expr with all the args mapped through julia-bq-expand-. Roughly! Nested cases such as quote quote $$(x...) end end also need handling.
macrocall proceeds with jl_invoke_julia_macro. This calls eval on the macro name (!!) to get the macro function and looks up the method. Errors from the macro invocation are wrapped in LoadError, and the LineNumberNode where the macro was invoked in the source is kept. The result is wrapped in (hygienic-scope ,result ,newctx.m ,lineinfo) (except for special case optimizations).
hygienic-scope expands args[1] with jl_expand_macros, with the module of expansion set to args[2]. I.e., it's the Expr representation of the module and expression arguments to macroexpand. The way this returns either hygienic-scope or unwraps is a bit confusing.
"do macrocalls" have their own special handling because the macrocall is the child of the do. This seems like a mess!!
Scopes are documented in the Julia documentation on Scope of Variables.
This pass disambiguates variables which have the same name in different scopes and fills in the list of local variables within each lambda.
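A small example of the kind of input this pass must disambiguate: two different variables named x living in nested scopes of one function. The function name shadow_demo is made up for this sketch, which relies only on Julia's documented scoping rules for let.

```julia
function shadow_demo()
    x = 1
    inner = let x = 2      # `let` introduces a new local `x`, distinct from the outer one
        () -> x            # the closure captures the inner `x`
    end
    return (x, inner())
end
```

Calling shadow_demo() returns (1, 2), so after scope resolution the two x variables must be tracked as separate locals even though they share a name.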
A scope is a collection of variable names by category:
argument - arguments to a lambda
local - variables declared local (at top level) or implicitly local (in lambdas) or desugared to local-def
global - variables declared global (in lambdas) or implicitly global (at top level)
static-parameter - lambda type arguments from where clauses
We traverse the AST starting at the root paying attention to certain nodes:
outerref
local, local-def
softscope, hardscope - just removed
lambda - creates a new scope containing itself and its arguments, otherwise copying the parent scope. It resolves the body with that new scope.
scope-block - really complicated, see below
islocal, locals
islocal - statically expand to true/false based on whether var name is a local var
locals - return list of locals - see @locals
require-existing-local - somewhat like islocal, but allows globals too (whaa?! naming) and produces a lowering error immediately if variable is not known. Should be called require-in-scope ??
break-block, symbolicgoto, symboliclabel - need special handling because one of their arguments is a non-quoted symbol
with-static-parameters
method - special handling for static params
scope-block is the complicated bit. It's processed by searching for any local, local-def, global and assigned vars. Searching doesn't recurse into lambda, scope-block, module and toplevel.
local-def is explained in the flisp code. There's also this comment in https://github.com/JuliaLang/julia/issues/22314:
mark the [...] variable as local-def, which would prevent it from getting Core.Boxed during the closure conversion it'll be detected as known-SSA
But maybe that's confusing. It seems like local-def is a local which lowering asserts is "always defined" / "definitely initialized before use". But it's not necessarily single-assign, so not SSA.
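For readers unfamiliar with the boxing mentioned in that comment, the following sketch shows when current closure conversion introduces Core.Box. It does not demonstrate local-def itself, which is an internal lowering form; the function name make_closures is hypothetical, and the observed field types reflect the behaviour of current Julia 1.x releases rather than a documented guarantee.

```julia
function make_closures()
    a = 1                    # assigned once before capture: captured by value
    n = 0                    # reassigned inside a closure: captured via Core.Box
    get_a = () -> a
    inc = () -> (n += 1)
    return get_a, inc
end

get_a, inc = make_closures()

# Closures are structs whose fields are named after the captured variables.
type_of_a = fieldtype(typeof(get_a), :a)   # a concrete type (Int)
type_of_n = fieldtype(typeof(inc), :n)     # Core.Box
```

Knowing that a variable is always assigned before use is one of the facts lowering can exploit to avoid the boxed representation.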
See https://docs.julialang.org/en/v1/devdocs/ast/#Lowered-form
mutable struct CodeInfo
    code::Vector{Any}             # IR statements
    codelocs::Vector{Int32}       # `length(code)` Vector of indices into `linetable`
    ssavaluetypes::Any            # `length(code)` or Vector of inferred types after opt
    ssaflags::Vector{UInt32}      # flag for every statement in `code`
                                  #   0 if meta statement
                                  #   inbounds_flag - 1 bit (LSB)
                                  #   inline_flag   - 1 bit
                                  #   noinline_flag - 1 bit
                                  #   ... other 8 flags which are defined in compiler/optimize.jl
                                  #   effects_flags - 9 bits
    method_for_inference_limit_heuristics::Any
    linetable::Any
    slotnames::Vector{Symbol}     # names of parameters and local vars used in the code
    slotflags::Vector{UInt8}      # vinfo flags from flisp
    slottypes::Any                # nothing (used by typeinf)
    rettype::Any                  # Any (used by typeinf)
    parent::Any                   # nothing (used by typeinf)
    edges::Any
    min_world::UInt64
    max_world::UInt64
    inferred::Bool
    propagate_inbounds::Bool
    has_fcall::Bool
    nospecializeinfer::Bool
    inlining::UInt8
    constprop::UInt8
    purity::UInt16
    inlining_cost::UInt16
end
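A CodeInfo produced by the existing lowering can be inspected from the REPL with code_lowered. The function add_one below is a made-up example; note that the exact set of CodeInfo fields varies between Julia versions, so this sketch only touches code and slotnames.

```julia
add_one(x) = x + 1

# code_lowered returns one CodeInfo per matching method.
ci = only(code_lowered(add_one, (Int,)))

statements = ci.code        # Vector{Any} of IR statements
slots = ci.slotnames        # names of the function itself and its arguments/locals
```

For this function the slots are the implicit #self# argument followed by x.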
In the current Julia runtime, Base.eval() uses jl_toplevel_eval_in which calls jl_toplevel_eval_flex.
jl_toplevel_eval_flex(mod, ex) evaluates certain blessed top level forms directly:
:.
:module
:using
:import
:public
:export
:global
:const
:toplevel
:error
:incomplete
Anything else is expected to be an Expr(:thunk), which can be run via jl_interpret_toplevel_thunk.
Should we lower the above blessed top level forms to Julia runtime calls? Pros: lowering would deal with forms like Expr(:using) itself rather than doing this in the runtime support functions.
In general, we'd be replacing current declarative lowering targets like Expr(:using) with an imperative call to a Core API instead. The call and the setup of its arguments would need to go in a thunk. We've currently got an odd mixture of imperative and declarative lowered code.
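The two styles can be seen side by side in the existing system. In this sketch Foo.Bar is a hypothetical module path which is only parsed, never loaded; Meta.lower is the existing lowering entry point, and the exact printed form of its result differs between Julia versions.

```julia
# Declarative: `using` is represented as plain data describing what should happen.
decl = :(using Foo.Bar)
decl_head = decl.head            # :using
decl_path = decl.args[1]         # Expr(:., :Foo, :Bar)

# Imperative: ordinary code is lowered to a thunk wrapping a CodeInfo.
lowered = Meta.lower(Main, :(1 + 2))
lowered_head = lowered.head      # :thunk
payload = lowered.args[1]        # the CodeInfo to be run
```

Moving using to the imperative style would mean its lowered form looks like the second case: a thunk whose code calls some runtime function with the module path as an argument.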
People look at Racket as an example of a very complete system of hygienic macros. We should learn from them, but keeping in mind that Racket's macro system is inherently more complicated. Racket's current approach to hygiene is described in an accessible talk and in more depth in a paper.
Some differences which make Racket's macro expander different from Julia's: