Open gelisam opened 4 years ago
It should be possible to implement this inside Klister itself, using make-introducer
's add
and flip
. It is not necessary to traverse the Syntax
and apply add
and flip
to every identifier, add
and flip
are already recursive.
As discussed, it is not immediately obvious whether recursive syntax-aware functions could break this strategy. With an auxiliary macro, the output Syntax is flipped and then the auxiliary macro is invoked, whereas with a recursive syntax-aware function, recursive calls would add another Scope before the previous Scope is flipped, and then both Scopes would be flipped on the way out.
I think the resulting Syntax will be the same, because the flips are independent; but perhaps this could cause problems with free-identifier=?
calls inside those recursive calls?
Sorry for an unsolicited comment. As I understand, you are saying that define-for-syntax
in Racket should be hygienic like define-syntax
. Is that correct? If so, I'm not sure why this is preferable.
In Racket macros, I sometimes use helper functions to generate a complex syntax object in one expansion step, in a way that identifiers in the syntax object will be free-identifier=?
in the fully expanded code, across various calls of the helper functions. With this proposal, these identifiers would not be free-identifier=?
to each other, breaking the macro.
And for the case where you truly want hygiene, you can simply change helper functions to helper macros.
So I think the current behavior in Racket offers a complete flexibility to users already.
I see the existence of the repository as a solicitation to comments and other helpful contributions from knowledegable people :-)
Based on a discussion with @gelisam, my understanding is that the bigger problem he's wanting to solve is to be able to represent intermediate states between macros with explicit datatypes rather than having to encode everything in a particular pattern of syntax objects as we Racketeers are used to doing. In other words, one could represent the abstract syntax of a DSL as a datatype, and then write intermediate transformation steps as going from one datatype to another, getting type assistance from the compiler (Klister has a roughly Hindley-Milner type system). But this should be able to happen while leaving the hygiene facilities intact. In Klister, macros have type (-> Syntax (Macro Syntax))
, where Macro
is a monad that has the appropriate effects like free-identifier=?
. This means that recursive macros have to re-encode all their arguments as Syntax
, rather than leaving them in a decoded state.
As I understand, you are saying that
define-for-syntax
in Racket should be hygienic likedefine-syntax
. Is that correct?
I don't necessarily want all helper functions to obey hygiene, but I'd like the ability to define a hygienic helper function if I know I do want hygiene.
Let's look at a concrete example: append-until
, which calls an append-like function on the truthy prefix of its arguments, or defaults to some else
clause if all its arguments are #f
. Here is an example program using that macro:
(define (multiple-of x divisor)
(zero? (modulo x divisor)))
(define (fizzbuzzbang x)
(append-until string-append
(and (multiple-of x 3) "fizz")
(and (multiple-of x 5) "buzz")
(and (multiple-of x 7) "bang")
(and (multiple-of x 11) "boom")
[else x]))
; '(1 "fizz" "fizzbuzz" "fizzbuzzbang" "fizzbuzzbangboom")
(map fizzbuzzbang (list 1 (* 3) (* 3 5) (* 3 5 7) (* 3 5 7 11)))
And here is what I would like the above example to expand to:
(define (fizzbuzzbang x)
(let ([x1 (and (multiple-of x 3) "fizz")])
(if x1
(let ([x2 (and (multiple-of x 5) "buzz")])
(if x2
(let ([x3 (and (multiple-of x 7) "bang")])
(if x3
(let ([x4 (and (multiple-of x 11) "boom")])
(if x3
(string-append x1 x2 x3 x4)
(string-append x1 x2 x3)))
(string-append x1 x2)))
(string-append x1)))
x)))
I picked this example carefully:
(string-append x1 x2 x3 x4)
is referring to variables which are clearly introduced by different calls to the same recursive function-or-macro. Hygiene is thus quite important, otherwise the inner let
s will shadow the outer let
s and the string-append
will append the wrong values.if
has a condition which comes from the beginning of the input, while its else branch comes from the [else x]
clause at the end of the input.Now, since Racket's syntax-case
is so amazingly expressive, it is possible to use the pattern (_ f e1 e2 ... [else else-case])
to extract both the condition and the else clause in a single line. But Klister's syntax-case
doesn't yet support such fancy patterns, so let's stick to examining the clauses from left to right. I would thus like to generate the outermost let
and if
first, and then recur and generate the second-outermost let
and if
, and eventually hit the [else x]
base-case and generate the innermost (string-append x1 x2 x3 x4)
and somehow insert the x
back into the outermost else branch. How? The solution I chose is to use a continuation. When calling the recursive function go
, one must pass a continuation argument indicating what to do with the result and the else clause. When go
hits the base case, it thus calls its continuation with (string-append x1 x2 x3 x4)
and x
. In the induction-step case case, go
makes a recursive call, and must thus provide a continuation argument of its own. That continuation argument receives the recursive result and the else clause, and is thus able to construct a bigger result which may involve the else clause.
(define-syntax (append-until stx)
(syntax-case stx (else)
[(_ _ [else else-case])
#'else-case]
[(_ f e1 case2 ...)
(letrec ([go
(lambda (vars-up-to-1 cases2+ cc)
(syntax-case cases2+ (else)
[([else else-case])
(cc #`(f #,@vars-up-to-1)
#'else-case)]
[(e2 case3 ...)
(go #`(#,@vars-up-to-1 x2)
#'(case3 ...)
(lambda (body else-case)
(cc #`(let ([x2 e2])
(if x2
#,body
(f #,@vars-up-to-1)))
else-case)))]))])
(go #'(x1)
#'(case2 ...)
(lambda (body else-case)
#`(let ([x1 e1])
(if x1
#,body
#,else-case)))))]))
Each induction-step introduces a new x2
variable. Unfortunately, all those steps occur within the same macro invocation, and thus all those variables shadow each other, resulting in the following incorrect code:
(let ([x1 (and (multiple-of x 3) "fizz")])
(if x1
(let ([x2 (and (multiple-of x 5) "buzz")])
(if x2
(let ([x2 (and (multiple-of x 7) "bang")])
(if x2
(let ([x2 (and (multiple-of x 11) "boom")])
(if x2
(string-append x1 x2 x2 x2)
(string-append x1 x2 x2)))
(string-append x1 x2)))
(string-append x1)))
x))
The solution is to generate a unique identifier using (gensym)
:
[(e2 case3 ...)
(let ([x2 (gensym)])
(go #`(#,@vars-up-to-1 #,x2)
#'(case3 ...)
(lambda (body else-case)
(cc #`(let ([#,x2 e2])
(if #,x2
#,body
(f #,@vars-up-to-1)))
else-case))))]))])
But I would prefer if, instead of doing that manually, I could mark go
in some way, perhaps by defining it using hygienic-lambda
instead of lambda
, and that would cause go
to automatically do the right thing somehow:
(letrec ([go
(hygienic-lambda (vars-up-to-1 cases2+ cc)
(syntax-case cases2+ (else)
[([else else-case])
(cc #`(f #,@vars-up-to-1)
#'else-case)]
[(e2 case3 ...)
(go #`(#,@vars-up-to-1 x2)
#'(case3 ...)
(lambda (body else-case)
(cc #`(let ([x2 e2])
(if x2
#,body
(f #,@vars-up-to-1)))
else-case)))]))])
The above is also a good demonstration of why I don't want every helper functions to introduce a new scope: here only go
is using hygienic-lambda
, the continuation is still using a plain old lambda
. That's because I want both the x2
inside and outside that lambda
to be the same identifier, but I don't want them to be the same as the x2
's introduced by the recursive go
call or any x2
introduced outside of the current go
call.
for the case where you truly want hygiene, you can simply change helper functions to helper macros.
In Klister, syntax objects cannot contain functions, so go
really must be a function, it cannot be a macro. In Racket, I can embed a function inside a syntax object and thus I can turn go
into an auxiliary macro append-until-aux
, but it's messy:
(define-syntax (append-until stx)
(syntax-case stx (else)
[(_ _ [else else-case])
#'else-case]
[(_ f e1 case2 ...)
(let* ([cc
(lambda (body else-case)
#`(let ([x1 e1])
(if x1
#,body
#,else-case)))]
[syntax-cc (datum->syntax stx cc)])
#`(append-until-aux f (x1) (case2 ...) #,syntax-cc))]))
(define-syntax (append-until-aux stx)
(syntax-case stx (else)
[(_ f (x1 ...) ([else else-case]) syntax-cc)
(let ([cc (syntax->datum #'syntax-cc)])
(cc #'(f x1 ...)
#'else-case))]
[(_ f (x1 ...) (e2 case3 ...) syntax-old-cc)
(let* ([old-cc (syntax->datum #'syntax-old-cc)]
[new-cc
(lambda (body else-case)
(old-cc #`(let ([x2 e2])
(if x2
#,body
(f x1 ...)))
else-case))]
[syntax-new-cc (datum->syntax stx new-cc)])
#`(append-until-aux f
(x1 ... x2)
(case3 ...)
#,syntax-new-cc))]))
More importantly, it doesn't solve the problem! The generated code this time looks like this:
(let ([x1b (and (multiple-of x 3) "fizz")])
(if x1b
(let ([x2b (and (multiple-of x 5) "buzz")])
(if x2b
(let ([x2b (and (multiple-of x 7) "bang")])
(if x2b
(let ([x2b (and (multiple-of x 11) "boom")])
(if x2b
(string-append x1a x2b x2b x2b)
(string-append x1a x2b x2b)))
(string-append x1a x2b)))
(string-append x1a)))
x))
That is, the x1
in the syntax object which append-util
returns in (append-until-aux f (x1) (case2 ...) #,syntax-cc)
and which gets inserted in the (string-append x1a ...etc...)
calls is different from the x1
in the continuation it passes to append-util-aux
and which gets inserted in the outermost let
and if
. Presumably, that's because Racket traverses that (append-until-aux f (x1) (case2 ...) #,syntax-cc)
output syntax in order to flip the scope of all the identifiers it contains, flips x1
, but doesn't flip anything inside the syntax-cc
because a function is an opaque value which cannot be traversed.
Also, the x2
's shadow each other again. Presumably, that's because they are added all at once in the (cc #'(f x1 ...) #'else-case)
base case, not progressively in successive macro invocations.
Strangely, the x2
in the syntax object which append-util-aux
returns in (append-until-aux f (x1 ... x2) (case3 ...) #,syntax-new-cc)
and which gets inserted in the (string-append x1a x2b ...etc...)
calls is still bound by the x2
in the continuation it passes to the recursive call and which gets inserted in the inner let
's and if
's. I think they do have a different number of scopes on them, but that in one case the binder has more scopes than the use site while in the other it's the use site which has more scopes.
The fact that this didn't work helps to answer the initial question about whether hygienic-lambda
could be implemented by adding a Scope to the input Syntax, running the function, and then flipping that Scope on the output Syntax: no, it wouldn't work in this example, and for the same reason. When applying the Scope to all the Syntax objects in go
's input arguments, we would be unable to apply it to the Syntax literals inside the function's definition, because it's opaque, and when flipping the Scope on the Syntax object in its output, we'd be flipping it on all the x2
's because they are introduced all at once in the base case.
Here's an implementation of append-until
which works by flipping scopes instead by calling (gensym)
. I am adding the scope to the syntax objects which the recursive function is receiving as input, but then instead of flipping the scope on the output of the function, I am flipping it on the syntax objects which are passed to the recursive call and to cc
. That makes sense: that's the boundary at which we leave the lexical scope of the recursive function and enter that of the callee's. The (lambda (body_ else-case_) ...etc...)
continuation is within the lexical scope of the recursive function, but its input comes from outside, so they are also flipped.
(define-syntax (append-until stx)
(syntax-case stx (else)
[(_ _ [else else-case])
#'else-case]
[(_ f e1 case2 ...)
(letrec ([go
(lambda (vars-up-to-1_ cases2+_ cc)
(let* ([flip-scope (make-syntax-introducer)]
[vars-up-to-1 (flip-scope vars-up-to-1_)]
[cases2+ (flip-scope cases2+_)])
(syntax-case cases2+ (else)
[([else else-case])
(cc (flip-scope #`(f #,@vars-up-to-1))
(flip-scope #'else-case))]
[(e2 case3 ...)
(go (flip-scope #`(#,@vars-up-to-1 x2))
(flip-scope #'(case3 ...))
(lambda (body_ else-case_)
(let ([body (flip-scope body_)]
[else-case (flip-scope else-case_)])
(cc (flip-scope
#`(let ([x2 e2])
(if x2
#,body
(f #,@vars-up-to-1))))
(flip-scope
else-case)))))])))])
(go #'(x1)
#'(case2 ...)
(lambda (body else-case)
#`(let ([x1 e1])
(if x1
#,body
#,else-case)))))]))
The next step is to write a hygienic-lambda
macro which somehow generates that implementation.
I think hygienic-lambda
might be a good motivating example for type-aware macros and for #119 ! Here's how I think hygienic-lambda
could be implemented if we had local-expand and #119.
First, local-expand the body's Syntax into a Core AST. Then, replace every (lambda (x) e)
whose x
is a Syntax
with a (lambda (x_) (let [x (flip-scope x_)] e))
, every (lambda (x) e)
whose e
is a Syntax
with (lambda (x) (flip-scope e))
, every function call (f e)
whose e
is a Syntax
with (f (flip-scope e))
, and every (f e)
whose output is a Syntax
with (flip-scope (f e))
. Optionally optimize-away the many self-cancelling (flip-scope (flip-scope e))
's this will generate. Note that (quote x)
is not a function call, it's a Syntax
literal.
For example,
(hygienic-lambda (x)
(pure (pair-list-syntax 'f x x))
would become
(lambda (x_)
(do (i <- (make-introducer))
(let [x (flip-scope x_)]
(pure
(flip-scope
(flip-scope
(pair-list-syntax
(flip-scope 'f)
(flip-scope x)
(flip-scope x))))))))
with the net effect that in the resulting (f ,x)
, f
's scope was flipped 3 times, an odd number and so this f
is unique to this invocation of the hygienic function, while x
's scope was flipped 4 times, an even number so the variables in this x
can refer to variables bound outside of this invocation.
I don't think I've explained the relationship with #119 very well. The idea is that if macros are manipulating raw syntax, then it is difficult to interpret that syntax correctly. For example, we could write a variant of hygienic-lambda
which traverses the body and transforms pieces of syntax of the form (lambda (x ...) e)
, but if e.g. the user has defined a lambda-case
macro, our variant wouldn't know that it should transform that piece too. By having a clearly-specified AST (which happens to be Core, but could easily have been an extended variant of Core with some special annotations around the identifiers which should or shouldn't be treated hygienically) with clear semantics, macros like syntax-case
which produce those AST values can collaborate with macros like hygienic-lambda
which consume those AST values.
Importantly, those consumers can be confident that they have exhaustively covered all the relevant cases, simply by writing a total function on that AST type.
When I define a public-facing macro in terms of an auxiliary helper, I often prefer to define my helper as a local function, this way they can take arguments of type other than
Syntax
, and they can access the variables which are lexically-bound inside the public-facing macros, I don't have to encode them inside the auxiliary macro'sSyntax
argument. Unfortunately, this means that all the recursive iterations of that helper function are within the same macro invocation, and therefore that binders introduced at different iterations may shadow each other. It is possible to bypass the problem usingmake-introducer
to generate a unique identifier,gensym
-style. But how about defining special hygiene-aware functions which go through the dance of adding and flipping a Scope in order to get the same automatic hygiene we already get from macros?To recap, before a macro is invoked, the expander adds a fresh Scope to its input Syntax, runs the macro, and then flips that Scope on the resulting output Syntax. The result is that all the identifiers which were passed by the caller don't have that Scope anymore (it was added and then flipped off), while the identifiers which were introduced by the macro do (they did not have that Scope and then it was flipped on). We could thus define a syntax-aware function as one which emulates that behaviour by adding a Scope to its input Syntax, runs its body, and then flips that Scope on the output Syntax.