gelisam / klister

an implementation of stuck macros
BSD 3-Clause "New" or "Revised" License
129 stars 11 forks source link

hygiene for helper functions #120

Open gelisam opened 4 years ago

gelisam commented 4 years ago

When I define a public-facing macro in terms of an auxiliary helper, I often prefer to define my helper as a local function, this way they can take arguments of type other than Syntax, and they can access the variables which are lexically-bound inside the public-facing macros, I don't have to encode them inside the auxiliary macro's Syntax argument. Unfortunately, this means that all the recursive iterations of that helper function are within the same macro invocation, and therefore that binders introduced at different iterations may shadow each other. It is possible to bypass the problem using make-introducer to generate a unique identifier, gensym-style. But how about defining special hygiene-aware functions which go through the dance of adding and flipping a Scope in order to get the same automatic hygiene we already get from macros?

To recap, before a macro is invoked, the expander adds a fresh Scope to its input Syntax, runs the macro, and then flips that Scope on the resulting output Syntax. The result is that all the identifiers which were passed by the caller don't have that Scope anymore (it was added and then flipped off), while the identifiers which were introduced by the macro do (they did not have that Scope and then it was flipped on). We could thus define a syntax-aware function as one which emulates that behaviour by adding a Scope to its input Syntax, runs its body, and then flips that Scope on the output Syntax.

gelisam commented 4 years ago

It should be possible to implement this inside Klister itself, using make-introducer's add and flip. It is not necessary to traverse the Syntax and apply add and flip to every identifier, add and flip are already recursive.

gelisam commented 4 years ago

As discussed, it is not immediately obvious whether recursive syntax-aware functions could break this strategy. With an auxiliary macro, the output Syntax is flipped and then the auxiliary macro is invoked, whereas with a recursive syntax-aware function, recursive calls would add another Scope before the previous Scope is flipped, and then both Scopes would be flipped on the way out.

I think the resulting Syntax will be the same, because the flips are independent; but perhaps this could cause problems with free-identifier=? calls inside those recursive calls?

sorawee commented 4 years ago

Sorry for an unsolicited comment. As I understand, you are saying that define-for-syntax in Racket should be hygienic like define-syntax. Is that correct? If so, I'm not sure why this is preferable.

In Racket macros, I sometimes use helper functions to generate a complex syntax object in one expansion step, in a way that identifiers in the syntax object will be free-identifier=? in the fully expanded code, across various calls of the helper functions. With this proposal, these identifiers would not be free-identifier=? to each other, breaking the macro.

And for the case where you truly want hygiene, you can simply change helper functions to helper macros.

So I think the current behavior in Racket offers a complete flexibility to users already.

david-christiansen commented 4 years ago

I see the existence of the repository as a solicitation to comments and other helpful contributions from knowledegable people :-)

Based on a discussion with @gelisam, my understanding is that the bigger problem he's wanting to solve is to be able to represent intermediate states between macros with explicit datatypes rather than having to encode everything in a particular pattern of syntax objects as we Racketeers are used to doing. In other words, one could represent the abstract syntax of a DSL as a datatype, and then write intermediate transformation steps as going from one datatype to another, getting type assistance from the compiler (Klister has a roughly Hindley-Milner type system). But this should be able to happen while leaving the hygiene facilities intact. In Klister, macros have type (-> Syntax (Macro Syntax)), where Macro is a monad that has the appropriate effects like free-identifier=?. This means that recursive macros have to re-encode all their arguments as Syntax, rather than leaving them in a decoded state.

gelisam commented 4 years ago

As I understand, you are saying that define-for-syntax in Racket should be hygienic like define-syntax. Is that correct?

I don't necessarily want all helper functions to obey hygiene, but I'd like the ability to define a hygienic helper function if I know I do want hygiene.

Let's look at a concrete example: append-until, which calls an append-like function on the truthy prefix of its arguments, or defaults to some else clause if all its arguments are #f. Here is an example program using that macro:

(define (multiple-of x divisor)
  (zero? (modulo x divisor)))

(define (fizzbuzzbang x)
  (append-until string-append
    (and (multiple-of x 3) "fizz")
    (and (multiple-of x 5) "buzz")
    (and (multiple-of x 7) "bang")
    (and (multiple-of x 11) "boom")
    [else x]))

; '(1 "fizz" "fizzbuzz" "fizzbuzzbang" "fizzbuzzbangboom")
(map fizzbuzzbang (list 1 (* 3) (* 3 5) (* 3 5 7) (* 3 5 7 11)))

And here is what I would like the above example to expand to:

(define (fizzbuzzbang x)
  (let ([x1 (and (multiple-of x 3) "fizz")])
    (if x1
        (let ([x2 (and (multiple-of x 5) "buzz")])
          (if x2
              (let ([x3 (and (multiple-of x 7) "bang")])
                (if x3
                    (let ([x4 (and (multiple-of x 11) "boom")])
                      (if x3
                          (string-append x1 x2 x3 x4)
                          (string-append x1 x2 x3)))
                    (string-append x1 x2)))
              (string-append x1)))
        x)))

I picked this example carefully:

  1. the innermost (string-append x1 x2 x3 x4) is referring to variables which are clearly introduced by different calls to the same recursive function-or-macro. Hygiene is thus quite important, otherwise the inner lets will shadow the outer lets and the string-append will append the wrong values.
  2. the outermost if has a condition which comes from the beginning of the input, while its else branch comes from the [else x] clause at the end of the input.

Now, since Racket's syntax-case is so amazingly expressive, it is possible to use the pattern (_ f e1 e2 ... [else else-case]) to extract both the condition and the else clause in a single line. But Klister's syntax-case doesn't yet support such fancy patterns, so let's stick to examining the clauses from left to right. I would thus like to generate the outermost let and if first, and then recur and generate the second-outermost let and if, and eventually hit the [else x] base-case and generate the innermost (string-append x1 x2 x3 x4) and somehow insert the x back into the outermost else branch. How? The solution I chose is to use a continuation. When calling the recursive function go, one must pass a continuation argument indicating what to do with the result and the else clause. When go hits the base case, it thus calls its continuation with (string-append x1 x2 x3 x4) and x. In the induction-step case case, go makes a recursive call, and must thus provide a continuation argument of its own. That continuation argument receives the recursive result and the else clause, and is thus able to construct a bigger result which may involve the else clause.

(define-syntax (append-until stx)
  (syntax-case stx (else)
    [(_ _ [else else-case])
     #'else-case]
    [(_ f e1 case2 ...)
     (letrec ([go
               (lambda (vars-up-to-1 cases2+ cc)
                 (syntax-case cases2+ (else)
                   [([else else-case])
                    (cc #`(f #,@vars-up-to-1)
                        #'else-case)]
                   [(e2 case3 ...)
                    (go #`(#,@vars-up-to-1 x2)
                        #'(case3 ...)
                        (lambda (body else-case)
                          (cc #`(let ([x2 e2])
                                  (if x2
                                      #,body
                                      (f #,@vars-up-to-1)))
                              else-case)))]))])
       (go #'(x1)
           #'(case2 ...)
           (lambda (body else-case)
             #`(let ([x1 e1])
                 (if x1
                     #,body
                     #,else-case)))))]))

Each induction-step introduces a new x2 variable. Unfortunately, all those steps occur within the same macro invocation, and thus all those variables shadow each other, resulting in the following incorrect code:

(let ([x1 (and (multiple-of x 3) "fizz")])
  (if x1
      (let ([x2 (and (multiple-of x 5) "buzz")])
        (if x2
            (let ([x2 (and (multiple-of x 7) "bang")])
              (if x2
                  (let ([x2 (and (multiple-of x 11) "boom")])
                    (if x2
                        (string-append x1 x2 x2 x2)
                        (string-append x1 x2 x2)))
                  (string-append x1 x2)))
            (string-append x1)))
      x))

The solution is to generate a unique identifier using (gensym):

                   [(e2 case3 ...)
                    (let ([x2 (gensym)])
                      (go #`(#,@vars-up-to-1 #,x2)
                          #'(case3 ...)
                          (lambda (body else-case)
                            (cc #`(let ([#,x2 e2])
                                    (if #,x2
                                        #,body
                                        (f #,@vars-up-to-1)))
                                else-case))))]))])

But I would prefer if, instead of doing that manually, I could mark go in some way, perhaps by defining it using hygienic-lambda instead of lambda, and that would cause go to automatically do the right thing somehow:

     (letrec ([go
               (hygienic-lambda (vars-up-to-1 cases2+ cc)
                 (syntax-case cases2+ (else)
                   [([else else-case])
                    (cc #`(f #,@vars-up-to-1)
                        #'else-case)]
                   [(e2 case3 ...)
                    (go #`(#,@vars-up-to-1 x2)
                        #'(case3 ...)
                        (lambda (body else-case)
                          (cc #`(let ([x2 e2])
                                  (if x2
                                      #,body
                                      (f #,@vars-up-to-1)))
                              else-case)))]))])

The above is also a good demonstration of why I don't want every helper functions to introduce a new scope: here only go is using hygienic-lambda, the continuation is still using a plain old lambda. That's because I want both the x2 inside and outside that lambda to be the same identifier, but I don't want them to be the same as the x2's introduced by the recursive go call or any x2 introduced outside of the current go call.

gelisam commented 4 years ago

for the case where you truly want hygiene, you can simply change helper functions to helper macros.

In Klister, syntax objects cannot contain functions, so go really must be a function, it cannot be a macro. In Racket, I can embed a function inside a syntax object and thus I can turn go into an auxiliary macro append-until-aux, but it's messy:

(define-syntax (append-until stx)
  (syntax-case stx (else)
    [(_ _ [else else-case])
     #'else-case]
    [(_ f e1 case2 ...)
     (let* ([cc
             (lambda (body else-case)
               #`(let ([x1 e1])
                   (if x1
                       #,body
                       #,else-case)))]
            [syntax-cc (datum->syntax stx cc)])
       #`(append-until-aux f (x1) (case2 ...) #,syntax-cc))]))

(define-syntax (append-until-aux stx)
  (syntax-case stx (else)
    [(_ f (x1 ...) ([else else-case]) syntax-cc)
     (let ([cc (syntax->datum #'syntax-cc)])
       (cc #'(f x1 ...)
           #'else-case))]
    [(_ f (x1 ...) (e2 case3 ...) syntax-old-cc)
     (let* ([old-cc (syntax->datum #'syntax-old-cc)]
            [new-cc
             (lambda (body else-case)
               (old-cc #`(let ([x2 e2])
                           (if x2
                               #,body
                               (f x1 ...)))
                       else-case))]
            [syntax-new-cc (datum->syntax stx new-cc)])
       #`(append-until-aux f
                           (x1 ... x2)
                           (case3 ...)
                           #,syntax-new-cc))]))

More importantly, it doesn't solve the problem! The generated code this time looks like this:

(let ([x1b (and (multiple-of x 3) "fizz")])
  (if x1b
      (let ([x2b (and (multiple-of x 5) "buzz")])
        (if x2b
            (let ([x2b (and (multiple-of x 7) "bang")])
              (if x2b
                  (let ([x2b (and (multiple-of x 11) "boom")])
                    (if x2b
                        (string-append x1a x2b x2b x2b)
                        (string-append x1a x2b x2b)))
                  (string-append x1a x2b)))
            (string-append x1a)))
      x))

That is, the x1 in the syntax object which append-util returns in (append-until-aux f (x1) (case2 ...) #,syntax-cc) and which gets inserted in the (string-append x1a ...etc...) calls is different from the x1 in the continuation it passes to append-util-aux and which gets inserted in the outermost let and if. Presumably, that's because Racket traverses that (append-until-aux f (x1) (case2 ...) #,syntax-cc) output syntax in order to flip the scope of all the identifiers it contains, flips x1, but doesn't flip anything inside the syntax-cc because a function is an opaque value which cannot be traversed.

Also, the x2's shadow each other again. Presumably, that's because they are added all at once in the (cc #'(f x1 ...) #'else-case) base case, not progressively in successive macro invocations.

Strangely, the x2 in the syntax object which append-util-aux returns in (append-until-aux f (x1 ... x2) (case3 ...) #,syntax-new-cc) and which gets inserted in the (string-append x1a x2b ...etc...) calls is still bound by the x2 in the continuation it passes to the recursive call and which gets inserted in the inner let's and if's. I think they do have a different number of scopes on them, but that in one case the binder has more scopes than the use site while in the other it's the use site which has more scopes.

gelisam commented 4 years ago

The fact that this didn't work helps to answer the initial question about whether hygienic-lambda could be implemented by adding a Scope to the input Syntax, running the function, and then flipping that Scope on the output Syntax: no, it wouldn't work in this example, and for the same reason. When applying the Scope to all the Syntax objects in go's input arguments, we would be unable to apply it to the Syntax literals inside the function's definition, because it's opaque, and when flipping the Scope on the Syntax object in its output, we'd be flipping it on all the x2's because they are introduced all at once in the base case.

gelisam commented 4 years ago

Here's an implementation of append-until which works by flipping scopes instead by calling (gensym). I am adding the scope to the syntax objects which the recursive function is receiving as input, but then instead of flipping the scope on the output of the function, I am flipping it on the syntax objects which are passed to the recursive call and to cc. That makes sense: that's the boundary at which we leave the lexical scope of the recursive function and enter that of the callee's. The (lambda (body_ else-case_) ...etc...) continuation is within the lexical scope of the recursive function, but its input comes from outside, so they are also flipped.

(define-syntax (append-until stx)
  (syntax-case stx (else)
    [(_ _ [else else-case])
     #'else-case]
    [(_ f e1 case2 ...)
     (letrec ([go
               (lambda (vars-up-to-1_ cases2+_ cc)
                 (let* ([flip-scope (make-syntax-introducer)]
                        [vars-up-to-1 (flip-scope vars-up-to-1_)]
                        [cases2+ (flip-scope cases2+_)])
                   (syntax-case cases2+ (else)
                     [([else else-case])
                      (cc (flip-scope #`(f #,@vars-up-to-1))
                          (flip-scope #'else-case))]
                     [(e2 case3 ...)
                      (go (flip-scope #`(#,@vars-up-to-1 x2))
                          (flip-scope #'(case3 ...))
                          (lambda (body_ else-case_)
                            (let ([body (flip-scope body_)]
                                  [else-case (flip-scope else-case_)])
                              (cc (flip-scope
                                    #`(let ([x2 e2])
                                        (if x2
                                            #,body
                                            (f #,@vars-up-to-1))))
                                  (flip-scope
                                    else-case)))))])))])
       (go #'(x1)
           #'(case2 ...)
           (lambda (body else-case)
             #`(let ([x1 e1])
                 (if x1
                     #,body
                     #,else-case)))))]))

The next step is to write a hygienic-lambda macro which somehow generates that implementation.

gelisam commented 4 years ago

I think hygienic-lambda might be a good motivating example for type-aware macros and for #119 ! Here's how I think hygienic-lambda could be implemented if we had local-expand and #119.

First, local-expand the body's Syntax into a Core AST. Then, replace every (lambda (x) e) whose x is a Syntax with a (lambda (x_) (let [x (flip-scope x_)] e)), every (lambda (x) e) whose e is a Syntax with (lambda (x) (flip-scope e)), every function call (f e) whose e is a Syntax with (f (flip-scope e)), and every (f e) whose output is a Syntax with (flip-scope (f e)). Optionally optimize-away the many self-cancelling (flip-scope (flip-scope e))'s this will generate. Note that (quote x) is not a function call, it's a Syntax literal.

For example,

(hygienic-lambda (x)
  (pure (pair-list-syntax 'f x x))

would become

(lambda (x_)
  (do (i <- (make-introducer))
      (let [x (flip-scope x_)]
        (pure
          (flip-scope
            (flip-scope
              (pair-list-syntax
                (flip-scope 'f)
                (flip-scope x)
                (flip-scope x))))))))

with the net effect that in the resulting (f ,x), f's scope was flipped 3 times, an odd number and so this f is unique to this invocation of the hygienic function, while x's scope was flipped 4 times, an even number so the variables in this x can refer to variables bound outside of this invocation.

gelisam commented 4 years ago

I don't think I've explained the relationship with #119 very well. The idea is that if macros are manipulating raw syntax, then it is difficult to interpret that syntax correctly. For example, we could write a variant of hygienic-lambda which traverses the body and transforms pieces of syntax of the form (lambda (x ...) e), but if e.g. the user has defined a lambda-case macro, our variant wouldn't know that it should transform that piece too. By having a clearly-specified AST (which happens to be Core, but could easily have been an extended variant of Core with some special annotations around the identifiers which should or shouldn't be treated hygienically) with clear semantics, macros like syntax-case which produce those AST values can collaborate with macros like hygienic-lambda which consume those AST values.

Importantly, those consumers can be confident that they have exhaustively covered all the relevant cases, simply by writing a total function on that AST type.