drym-org / qi

An embeddable flow-oriented language.

First-class macro extensibility #16

Closed countvajhula closed 2 years ago

countvajhula commented 2 years ago

Adds first class macros using binding spaces, so that Qi macros and macros for any other hosted or host language can coexist and not interfere (see #14 for full context).

Release Plan

Notes

This feature will only be available in recent (>= 8.3) versions of Racket.

Older versions of Racket will be moved to a distinct development branch prior to this being merged in, and will continue to be supported and receive critical bug fixes, but probably will not receive new features.

Public Domain Dedication

countvajhula commented 2 years ago

@michaelballantyne This PR is just about ready. Mind taking a look whenever you have a moment? I'm mainly looking for confirmation that:

But any other feedback is welcome, too, of course 😄

michaelballantyne commented 2 years ago

"first class macro extensibility"

Sure, that's less likely to be misleading I think.

On Wed, Feb 9, 2022 at 9:47 PM Siddhartha Kasivajhula < @.***> wrote:

@.**** commented on this pull request.

In qi-doc/scribblings/macros.scrbl https://github.com/countvajhula/qi/pull/16#discussion_r803257862:

+@(define eval-for-docs
+  (parameterize ([sandbox-output 'string]
+                 [sandbox-error-output 'string]
+                 [sandbox-memory-limit #f])
+    (make-evaluator 'racket/base
+                    '(require qi
+                              (only-in racket/list range first rest)
+                              (for-syntax syntax/parse racket/base)
+                              racket/string
+                              relation)
+                    '(define (sqr x)
+                       (* x x)))))
+
+@.***{Qi Macros}
+
+Qi may be extended in much the same way as Racket -- using @tech/reference{macros}. Qi macros are "first class," meaning that they are indistinguishable from built-in Qi forms during the macro expansion phase, just as user-defined Racket macros are indistinguishable from macros that are part of the Racket language. This allows us to have the same syntactic freedom with Qi as we are used to with Racket.

True, I did think that it might be interpreted in this sense but I wasn't sure if the term was actually used in this way. What do you think about using the term "first class macro extensibility" but not "first class macros"?

And nice, TIL about FEXPRs and the possibility of simultaneous expansion + evaluation!


countvajhula commented 2 years ago

Btw, while trying to get a sense of DSLs in the Racket community I ended up playing with deta, and realized that it uses the threading macro but cannot (naively) be used with Qi. I kind of assumed they were completely interchangeable, but I realized that's not quite the case since the threading macro is a purely syntactic transformation whereas Qi additionally expects the components to be function-valued. This also means that Sawzall -- a data science DSL that was introduced in the same RacketCon session as Qi -- isn't usable with Qi directly as I'd assumed. I've been writing a lot of docs today explaining how Qi can be used in combination with other such DSLs which are purely syntactic forms rather than functions, and cases where it could make sense for such macro-embedded DSLs to be written as Qi dialects. Would love your thoughts on these as well, will be committing them soon.

countvajhula commented 2 years ago

@michaelballantyne Ok I've committed the new docs re: interoperating with other DSLs, and also addressed your earlier comments -- see what you think. It may be easiest to git fetch this branch locally and then make build-docs && make docs to avoid reading the Git diff in detail.

michaelballantyne commented 2 years ago

I think you could provide a facility that would make interoperation with macros designed for the threading syntax easier. Essentially:

(define-syntax-rule
  (define-qi-threadable-syntaxes name ...)
  (begin
    (define-qi-syntax-rule (name arg (... ...))
      ;; (... ...) escapes the ellipsis so that it belongs to the generated rule
      (esc (lambda (v) (name v arg (... ...)))))
    ...))

But a better implementation would share the transformer procedure to avoid generating a ton of code:

(begin-for-syntax
  (define qi-threadable-syntax-transformer
    (qi-macro
     (syntax-parser
      [(name form ...)
       #'(esc (lambda (v) (name v form ...)))]))))

(define-syntax define-qi-threadable-syntaxes
  (syntax-parser
    [(_ form-name ...)
     #:with (spaced-form-name ...) (map (make-interned-syntax-introducer 'qi) (attribute form-name))
     #'(begin
         (define-syntax spaced-form-name qi-threadable-syntax-transformer)
         ...)]))

Then implementing the bridge for deta would be pretty concise:

(define-qi-threadable-syntaxes
  group-by join limit offset order-by project-onto project-virtual-fields returning
  select select-for-schema union update where or-where delete)

However, I'm also curious to see what examples of mixing these DSL flows with Qi look like, and what alternative designs you've considered. Some other possibilities that come to mind:

countvajhula commented 2 years ago

I really like those ideas. I want to make sure I understand your suggestion here:

Detect these with syntax-local-value and procedure?

(Let's call this Option Q for reference)

Does this mean we can do something like

#:when (procedure? (syntax-local-value #'form))

... to detect -- within the flow macro -- whether the form to be expanded is a macro rather than a function? That would be very convenient indeed.

One consideration here is that the behavior of individual forms under threading is a function both of the form itself (are there any manually indicated (via _) argument positions?) as well as the threading form being employed (i.e. left or right threading, ~> or ~>>).

In light of this, it seems like define-qi-threadable-syntaxes (or for that matter, users writing out the macro bridge manually) would have to at least encode an assumption about the threading direction here -- for instance that it will always be left-threading ~>. On the other hand, if I understand Option Q, then since we can detect that a form is a macro while within the expansion context of the flow macro, and since we also know the threading direction within that context (it's set as the syntax property threading-side), then we should be able to handle both left and right threading here. Furthermore, there are also cases like this one:

(~> ("Jack" "Jill") (my-string-concat-macro "hi " _ "and " _))

where, while we cannot know the number of runtime values at compile time, in this case, the user has indicated that they expect there to be two of them. It sounds like we should be able to handle this case, too, with Option Q (or I guess even with define-qi-threadable-syntaxes), by writing this as a two-argument lambda, and this would also exhibit identical behavior to partial application of functions (currently done via delegation to fancy-app). Finally, we can also handle __:

(~> ("Jack" "Jill") (my-string-concat-macro "hello " __ "!"))

as well as this:

(~> ("Jack" "Jill") (my-string-concat-macro ", hello!"))

by writing them as a variadic lambda (lambda args ...), and using something like ,@args.

... so that Qi's threading form would now handle macros too, and in fact, support multiple arguments the same way as for functions! I guess not being able to know at compile-time what the number of arguments provided to the macros will be, isn't really a problem after all, as I'd thought initially.

Does that sound right to you? If so, I do believe we have a winner 😄

countvajhula commented 2 years ago

Ok I think the threading form can't actually handle implicit multiple arguments but can handle explicit multiple arguments.

Implicit cases would need to assume a single argument:

(~> (mac arg ...))
(~>> (mac arg ...))

Explicit cases place the arguments at the indicated positions:

(~> (mac a _ b _ c))

Left- and right-threading are equivalent in the explicit cases (just like the usual threading macro -- although hypothetically if there were 5 arguments and only 2 were explicitly indicated, arguably ~> and ~>> could place the remainder at the beginning or end, respectively -- this isn't how it currently works, though).

The following case would not be possible to handle:

(~> (mac a __ b))

... since ,@args which I suggested earlier would not make sense in the lambda body as it isn't expanding when the lambda is called -- maybe a case for "simultaneous expansion + evaluation" 😄

So there are only 2 cases to handle -- (1) the implicit / single argument case for left and right threading, and (2) the explicit case for multiple arguments.

I've added (1) already. I don't have a clear idea of how to implement (2) yet. Seems like we'd need to identify the index positions of _ in the input syntax as well as the number of them. And then we'd need to construct the output syntax so that the lambda accepts as many arguments as the number of _'s, and then places those arguments at the right index positions in the body of the lambda. If I get too tangled I might just create this as a followup feature.
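One way the explicit case (2) could be sketched, purely as an illustration: count the `_` positions, generate one fresh parameter per position, and rebuild the body with those parameters substituted in order. The name template->lambda is hypothetical and not part of Qi, and the `_` is matched by name (as ~datum would), which is an assumption about how the real rule would recognize it.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; Hypothetical helper, not part of Qi: rewrites (mac a _ b _ c) into a
;; lambda with one parameter per `_`, filled in from left to right.
(define-syntax (template->lambda stx)
  (syntax-parse stx
    [(_ (mac arg ...))
     (define args (syntax->list #'(arg ...)))
     ;; a blank is the bare identifier `_`, compared by name like ~datum
     (define (blank? a)
       (and (identifier? a) (eq? (syntax-e a) '_)))
     ;; one fresh parameter per blank
     (define params (generate-temporaries (filter blank? args)))
     ;; walk the arguments, swapping each blank for the next parameter
     (define-values (body-rev unused)
       (for/fold ([acc '()] [ps params]) ([a (in-list args)])
         (if (blank? a)
             (values (cons (car ps) acc) (cdr ps))
             (values (cons a acc) ps))))
     (with-syntax ([(param ...) params]
                   [(body-arg ...) (reverse body-rev)])
       #'(lambda (param ...) (mac body-arg ...)))]))

((template->lambda (string-append "hi " _ " and " _)) "Jack" "Jill")
;=> "hi Jack and Jill"
((template->lambda (if _ 'yes 'no)) #f) ;=> 'no
```

Because the head is placed back in operator position of an ordinary expression, the same rewrite works whether mac is a function or a macro.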

michaelballantyne commented 2 years ago

All sounds right to me.

Seems like we'd need to identify the index positions of _ in the input syntax as well as the number of them.

I presume the threading macro has to do this already; you could look to see how they do it? It might also be worth checking to see what happens if someone tries to nest threading like this:

(~> x
    (m _ (~> y
             (m _ 5))))

With the threading macro I hope that would raise a syntax error as it only supports a single value. In Qi it could behave surprisingly as you might treat it as a flow that expects two arguments. Perhaps the outer threading instance could tag the syntax it substitutes with a property, and the inner threading instance could raise a syntax error indicating the ambiguity.


countvajhula commented 2 years ago

Both the usual threading macro as well as Qi's version only unwrap one level of the component forms, so they both succeed in that example:

(require threading)
(~> "a"
    (string-append _ (~> "b"
                         (string-append _ "c")))) ;=> "abc"
(require qi)
(~> ("a")
    (string-append _ (~> ("b")
                         (string-append _ "c")))) ;=> "abc"
(~> ("a" "d")
    (string-append _ (~> ("b")
                         (string-append _ "c")) _)) ;=> "abcd"

Btw this is somewhat related: #5

michaelballantyne commented 2 years ago

Ah, so this works:

(~> #t (if _ 5 6))

but this doesn't:

(~> #t (cond [_ 5] [else 6]))

as it gets turned into

(cond #t [_ 5] [else 6])

I guess that's okay but it's a little dissatisfying.


countvajhula commented 2 years ago

That's a good example of a case where you would expect the argument to be passed at an inner nesting level rather than the top level of the component form. It may be that in such cases it would always be a specific nesting level that would make sense (and not any nesting level). Not sure if there would be a way to know in general when this case applies. It would only affect foreign macros though, for Qi (e.g. switch already does what we expect here).

countvajhula commented 2 years ago

So here's an odd thing, the simple threading case that I added support for was failing initially. Turns out it's because, for some reason, apply was returning true for the macro check: #:when (procedure? (syntax-local-value #'mac (λ () #f))), causing any forms using apply to fail. I initially thought it might have something to do with the fact that I'm using fancy-app in the flow module, but this happens even without it.

I added this to the macro check (not (free-identifier=? #'apply #'mac)) to get around it and that fixes most cases. But looks like sort is another case that returns true for the macro check, so I'd need to exclude this in the same manner as apply in order to get all of the existing tests to pass. Any idea what the pattern here is? Would be great to catch that whole pattern and not be surprised by another one of these cases in the wild...

michaelballantyne commented 2 years ago

Ahh... here be dragons.

Support for procedures with keyword arguments is implemented by macros, not the core language. Functions that take keyword arguments are actually defined as syntax, and #%app and apply in racket/base cooperate with the associated compile-time information to optimize direct calls. See racket/private/kw

In general, anything that looks like a procedure might actually be a macro that has an identifier-macro case for when the name is used in a first-class way.

What breaks when you do the syntactic transformation for these cases?
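To see the effect in isolation, here is a small probe, offered as a sketch only. The macro name syntax-bound? and the two test functions are made up for illustration; the check itself is the one from the #:when clause quoted above.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical probe: expands to #t when `id` has a compile-time binding
;; whose value is a procedure (the same check as the #:when clause).
(define-syntax (syntax-bound? stx)
  (syntax-case stx ()
    [(_ id)
     (identifier? #'id)
     (if (procedure? (syntax-local-value #'id (lambda () #f)))
         #'#t
         #'#f)]))

(define (plain x) x)
(define (keyed x #:scale [k 1]) (* k x))

(syntax-bound? when)  ; an ordinary macro
(syntax-bound? plain) ; an ordinary function
(syntax-bound? keyed) ; a function that accepts a keyword argument
```

The first two results are #t and #f. The third is the interesting one: going by the behavior reported in this thread for apply and the test-local sort, it would be expected to come out #t as well, though that depends on internals of racket/private/kw rather than on any documented guarantee.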


countvajhula commented 2 years ago

That makes total sense. It also explains why the free-identifier=? check was working for apply but for sort I had to use (eq? 'sort (syntax->datum #'mac)), since the sort being used here was a private function defined in a test rather than "the" sort. This private function was also being seen as a macro though because it also happened to accept keyword arguments.

Re: what breaks, here's one example with apply:

((☯ (apply > _)) (list 1 2 3)) ;=> #<procedure:.../fancy-app/main.rkt:28:19>

Which is because it expands to:

(apply (list 1 2 3) > _)

which fancy-app treats as a template application.

For sort, there is this private test function:

(define (sort less-than? #:key key . vs)
  (b:sort (map key vs) less-than?))

If I patch the apply case, I still get this lone test failure:

--------------------
result of predicate expression > switch > conditionals > flow tests > qi tests > Unnamed test
ERROR
name:       check-equal?
location:   flow.rkt:762:6

result arity mismatch;
 expected number of values not received
  expected: 1
  received: 3
--------------------

I was able to verify that it did think sort was a macro in this case, but the funny thing is that this test only fails when I run it via raco test -- if I manually run the test in the REPL it works correctly. So it is a bit of a heisenbug... I'll continue digging.

countvajhula commented 2 years ago

The error message in the test makes me think that the test for some reason is running the flow with the built-in sort which expects a single argument and produces a similar error when called with more:

> ((☯ (~>> (b:sort < #:key identity))) 1 2 3)

; .../qi/qi-lib/flow.rkt:488:20: arity mismatch;
;  the expected number of arguments does not match the given number
;   expected: 1
;   given: 3

I recently noticed something that may be at play here: if you have a macro that employs some identifier in the template that is in the lexical scope of the macro definition, then if you use that macro in a REPL where you have a different definition for that same identifier, it will use the version at the definition site. This seems straightforward. But if you have a macro that employs an identifier that is undefined in the macro definition site lexical scope, then it will at first complain when you try to use the macro at use-site (e.g. REPL). But if you then define that identifier in the dynamic scope of the REPL, it now works(!). I guess it falls back to the dynamic scope in some cases?

I'm able to reproduce the test failure with this simplified test (only when run via raco):

(check-equal? ((☯ (~>> (sort < #:key identity)))
                  2 1 3)
                 (list 1 2 3))

I guess if there is an identifier available in both the lexical scope of the macro definition as well as the lexical scope of the macro use site, then it would favor the former? That could explain why the test fails initially as it's using the built-in sort which expects one list argument. And then I suppose it passes when I run it manually in the REPL because then it is using the dynamic scope of the REPL in which sort refers to the private function. But in this case, sort is defined at macro definition site, so not sure why it would now favor the dynamic scope where formerly it did not in this case. Also not really sure why the test was ever passing then, if it was always using the built-in sort from definition-site 🤷

michaelballantyne commented 2 years ago

But if you then define that identifier in the dynamic scope of the REPL, it now works(!). I guess it falls back to the dynamic scope in some cases?

The top-level / REPL is weird because it wants to support mutual recursion between separately-evaluated definitions, along with redefinition of functions and other value bindings. So when a reference in the top level that appears to be unbound is compiled, it is instead compiled as a reference to a top-level binding that can be filled in later.

The REPL in DrRacket is in a namespace created by module->namespace from the module in the definitions pane so that you can interact with the definitions within the module whether or not they are exported. The result is that the REPL works in effectively the same lexical context as the module. So the macro is expanding to a reference in the definition-site... it's just that you've extended the bindings available in the definition site by adding a definition in the REPL.

Still not sure what's going on with your test that works differently in the module and the REPL though.

michaelballantyne commented 2 years ago

Re: what breaks, here's one example with apply:

Ah, so it seems like things will work fine if you implement support for _ for macros, right?

Indeed, it seems like the only case you need the function rather than syntax behavior is for __. Perhaps when you see a __, assume the form must be a function application rather than syntax even if it appears to be syntax based on syntax-local-value?

countvajhula commented 2 years ago

Ah, so it seems like things will work fine if you implement support for _ for macros, right?

That's a good point.

Indeed, it seems like the only case you need the function rather than syntax behavior is for __. Perhaps when you see a __, assume the form must be a function application rather than syntax even if it appears to be syntax based on syntax-local-value?

True, I could do that by just prioritizing the __ rule above the macro rule.

If these pan out, then it's just the sort issue that remains to be sorted out. Maybe I'll get this other stuff working first and then return to sort.
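As a rough picture of what that rule ordering means, here is a self-contained toy, not Qi's actual flow macro: mini-flow is a hypothetical name, it only handles left threading, and it leaves out `_` templates entirely. The point is just that the `__` clause comes first, so a syntax-bound head such as sort is still treated as a function whenever `__` appears.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; Toy stand-in for the rule ordering under discussion (hypothetical).
(define-syntax (mini-flow stx)
  (syntax-parse stx
    ;; 1. `__` present: always a function application, whatever the head is
    [(_ (f pre ... (~datum __) post ...))
     #'(lambda args (apply f pre ... (append args (list post ...))))]
    ;; 2. head has a procedure-valued compile-time binding: thread one
    ;;    value syntactically into the first position
    [(_ (mac:id arg ...))
     #:when (procedure? (syntax-local-value #'mac (lambda () #f)))
     #'(lambda (v) (mac v arg ...))]
    ;; 3. otherwise an ordinary function: any number of inputs on the left
    [(_ (f arg ...))
     #'(lambda vs (apply f (append vs (list arg ...))))]))

((mini-flow (or 'default)) #f)       ;=> 'default  (clause 2, `or` is a macro)
((mini-flow (+ 1 __ 10)) 2 3)        ;=> 16        (clause 1)
((mini-flow (list 3)) 1 2)           ;=> '(1 2 3)  (clause 3)
((mini-flow (sort __ <)) '(3 1 2))   ;=> '(1 2 3)  (clause 1, despite keyword args)
```

Swapping clauses 1 and 2 would send any `__` form with a syntax-bound head down the single-argument macro path instead.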

countvajhula commented 2 years ago

@michaelballantyne thanks for all of your help so far!

I'm thinking it might make things simpler if we could somehow recognize functions that accept keyword arguments, to exclude them from being treated as macros. Do you happen to know of a way to distinguish these? More detailed explanations follow:

There are two broad issues at the moment: (A) a hygiene issue and (B) the sort issue.

(A) As we last talked about, I added the functionality to render an "application template" into a lambda taking an appropriate number of arguments and then placing those arguments at the right spots in the body of the lambda, in order to support something like (mac a _ b _ c), in addition to the (mac arg ...) that was already supported. Most tests were passing, but there were some failures on tests that were formerly passing and which seemed unconnected to the new "foreign macro" additions.

Specifically, all tests for the parameterized separation prism were failing with an error resembling:

; vs: undefined;
;  cannot reference an identifier before its definition

The implicated expansion rule is:

[(_ ((~or (~datum △) (~datum sep)) onex:clause))
   #'(λ (v . vs)
       ((flow (~> △ (>< (apply (flow onex) _ vs)))) v))]

(btw, "onex" is a historical name at this point that originally referred to "on expression," since the flow language was originally expressed via the on macro rather than flow which was simply a helper at that point. I may eventually rename these to flo to be more clear.)

The little that I knew about hygiene seemed inadequate to explain what was happening here so I watched this talk by Matthew. From what I now understand, I think that what may be happening here is:

(1) It is treating apply as a macro (due to the keyword argument thing) and expanding it via the rule for "foreign macros."

(2) The foreign macros rule matches the input syntax to the pattern (mac pre-arg ... (~datum _) post-arg ...), deconstructs it, and then recomposes a lambda with the template arguments filled in.

(3) Due to the "flip scopes" behavior of binding scope set expansion (32:45 - 34:35 at that talk link), any component of the input syntax that is present in the output syntax would retain the original, unmodified, set of scopes from the use site. But anything else is considered to be fresh syntax produced by the macro and so gets a distinct scope unconnected with the original one. In (apply (flow onex) _ vs) (and this part I'm not entirely sure about), since vs matches the pattern anonymously -- i.e. not directly as arg but via the catch-all post-arg ... pattern -- the vs produced by the macro is not treated as being the same as the vs in the input syntax but as a distinct identifier in the macro-local scope.

I don't know, is this right? I'm increasingly starting to doubt that matching by pattern vs matching exactly could have anything to do with it. But it's my best guess at the moment.

If that is at least approximately what's going on here, then that would imply:

(1) If foreign macros employ use-site bindings syntactically, they will not be recognized if the macros are picked up by the general foreign macros rule since their expansions will treat those identifiers as distinct from the use site bindings.

(2) If the above explanation is correct, then for such cases, users could still write custom Qi macros for each such macro that they'd like to use in this way, since such custom macros can match the expected identifiers exactly rather than via a catch-all pattern like pat ...

I will continue investigating. For now, I've simply excluded apply from the foreign macros rule, and this causes apply to be treated as a function and the above tests to pass. But depending on what we find is actually going on here, of course, this may or may not be a real solution.

In the meantime, the other issue:

(B) I suspect that the test failures involving the sort function defined in the tests are caused by:

(1) the application of sort being expanded using the foreign macros rule for the same reason as apply (viz. keyword args)

(2) when the sort form is expanded via:

[(_ (mac arg ...))
   #:when (and (procedure? (syntax-local-value #'mac (λ () #f)))
               (not (eq? 'apply (syntax->datum #'mac))))
   #:do [(define threading-side (syntax-property this-syntax 'threading-side))]
   (if (and threading-side (eq? threading-side 'right))
       #'(λ (v) (mac arg ... v))
       #'(λ (v) (mac v arg ...)))]

..., for some reason, mac gets bound in the macro definition scope rather than use site scope. As a result, even though sort at use site referred to the private function in the tests, the expanded version here binds sort to the built-in sort that is in scope at the definition site, causing the tests to fail since this function has a different signature. Yet, I don't see why mac in the expansion being matched to sort doesn't just inherit the scope sets from the use-site sort that it matched.

In light of these, I'm thinking we could sidestep some of these issues if we could only distinguish functions accepting keyword arguments from true macros. Another possibility is to break hygiene in the foreign macro rules. Anyway, I'll continue looking into some of the unknowns identified above. Just thought I'd give you an update in case you had any thoughts.

michaelballantyne commented 2 years ago

See my PR re: (A).

I think (B) is more of a design problem than a bug: (~>> (m a b c)) for a macro assumes that there is one argument position to be threaded on the right. So when your sort is treated as a macro because it uses keywords, the generated flow only expects one argument.

In contrast for procedures, any number of values may be threaded as arguments on the right. There isn't a way to do this syntactically. If you want this behavior to work for keyword functions we'll need to detect them as you suggest. There isn't a public / stable way of doing this, but if you're willing to break into the internals and risk instability you might be able to use kw-expander? provided by racket/private/kw.

However, there may be other instances of things that look like procedures but are really syntaxes. For example:

(define/contract f (-> integer? integer? integer? integer?)
  (lambda (x y z) x))

((flow (~>> (f 1)))
 2 3)

Right now Qi works okay here because it's checking procedure? on the syntax-local-value; this one uses a set!-transformer instead. I guess perhaps the fact that it's a set!-transformer is a useful heuristic that it should be treated as procedure-like... but I'm not sure if that will work all the time.

Anyway, others might come up that I haven't thought of yet. It seems like it's a bit of a dangerous game to have these two different behaviors distinguished by checking binding information. But it does produce the nicest looking syntax!

countvajhula commented 2 years ago

Those are some great points. We need a way to detect macros that does not accidentally detect functions. In that same talk by Matthew (which evidently I've hardly begun to understand), he mentions a "compile time environment" mapping bindings to meanings. Would it be possible to do a lookup in this environment to see whether a particular binding will be a function at runtime? If this is possible, then it could be the first check to perform in all macro-related checks, and if true it would always reject this expression so that it's handled in the normal course as a function. If it is not true, then we could additionally check for syntax-local procedure? and any other checks (e.g. some other case like set!-transformer that wouldn't answer to procedure? but which should be handled as a macro) for foreign macro handling.

Re: the dangerous game, yeah I definitely agree. I would say that if a foreign macro doesn't work as expected, that is less of a concern than if a runtime function doesn't work as expected. If the former case has any oddities these could be documented, to the effect, "consider writing your DSL in Qi instead, to avoid these macro issues and derive the full syntactic range of the language." But of course, ideally we could get away without needing to make compromises here.

michaelballantyne commented 2 years ago

he mentions a "compile time environment" mapping bindings to meanings. Would it be possible to do a lookup in this environment to see whether a particular binding will be a function at runtime?

The "compile-time environment" is exactly what we're accessing with syntax-local-value. The trick is that the bindings we're running into really aren't procedures at runtime: they're syntaxes that redirect uses to another thing that will be a procedure. So a keyword function definition for f defines a syntax f and a function f' where expansions of f expand to uses of f'.
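To make the point above concrete, here is a minimal sketch that uses only `syntax-local-value` from `racket/base`. The helper name `has-transformer-binding?` is hypothetical and is not part of Qi; it simply reports whether an identifier has a compile-time binding.

```racket
#lang racket/base
(require (for-syntax racket/base))

(define (plain x) x)           ; ordinary function: a plain variable binding
(define (kw x #:y [y #f]) y)   ; keyword function: the name is bound as syntax

;; Hypothetical helper (not part of Qi): expands to #t when `id` has a
;; transformer binding in the compile-time environment, and to #f otherwise.
(define-syntax (has-transformer-binding? stx)
  (syntax-case stx ()
    [(_ id)
     (if (syntax-local-value #'id (lambda () #f))
         #'#t
         #'#f)]))

(has-transformer-binding? plain) ; #f
(has-transformer-binding? kw)    ; #t
```

So a check based purely on "does this identifier have a transformer binding" would classify `kw` as a macro, which is the ambiguity being discussed here.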

countvajhula commented 2 years ago

Ah okay, I see. Yeah that is indeed tricky, if the "function" truly is syntax in its heart of hearts. I can think of a few options:

A. Enumerate the different conceptual cases where we know that something that is secretly a macro is really intended to be function-like (e.g. keyword argument handling is one, and set!-transformer I guess could be another, though that does fail the current foreign macro check). Then identify a reliable test for each such case and add it as a guard clause in the foreign macro rule.

B. Live with "functions" being treated as macros and being limited to one implicit argument.

C. Abandon foreign macro detection from within the flow macro, and fall back to another option like define-qi-threadable-syntaxes so that foreign macros are supported on a "registration-based" basis.

(A) is the best of both worlds. (B) and (C) represent compromises on either side of the function/macro divide.

For (A), as you pointed out earlier, one issue is that there may be other cases we don't know about or which don't exist yet and may appear at some point in the future. On the other hand, anything other than the ones we know about are likely to be edge cases in practice, and if they come up, it may not be so bad to add another guard clause to catch each newly reported case.

The other issue is that there may not be reliable tests available for these cases. For this, one option is to submit an issue on the Racket repo requesting this as a feature, or e.g. that the existing "tells" that we find be made stable / supported. Of course, there are no guarantees that this would pan out, but if it did, then this could make (A) compelling.

For (B), a major issue I see is that the functions that are treated as macros aren't actually macros from the user's perspective, and they would have no reason to expect that the "function" would not work with more than one argument. We could document this behavior, but this seems like an extra warning sticker that users would need to worry about, and which they may not know about until after they waste a lot of time being confused.

For (C) I may have been mistaken earlier when I said that define-qi-threadable-macros would have to encode either right- or left-threading. I think these macros would have access to the same syntactic information as the flow macro does internally, so they should be able to check the syntax property here just as well as the flow macro does. This should make this option on par with foreign-macro detection as far as the end result for specific foreign macros. The drawback of course being that it is a little more manual and less magical.

Of these, I think my preference is between (A) and (C).

(A) would obviously be the most magical (in a good way), but also the one with the most unknowns. It seems to entail at least (1) incorporating the kw-expander? check, and (2) requesting that it be formally supported in Racket going forward. If both of these pan out, the balance of risk and reward would arguably, tentatively, be swung in favor of (A).

Between (B) and (C), if we had to choose between syntactic convenience in all cases and focusing on functional programming, then I think the latter choice would make more sense for Qi (and the former, for the threading library), i.e. option (C) over (B). For cases where there is a large interface between foreign macros and Qi, where define-qi-threadable-syntaxes may seem like a lot of manual work, arguably the right integration is for the DSL (presumably) to be rebased onto Qi. It could even be that supporting this case in a convenient way via (B) would reduce the incentive to explore what could be a better solution (i.e. rebasing as a Qi DSL).

WDYT?

(This makes me wish for actual "first class" macros for no reason I can put my finger on. There was this talk someone shared a little while back about "first-class environments" -- I wonder if that could make some of these things more straightforward).

countvajhula commented 2 years ago

Btw I noticed this:

(require qi)
(require relation) ; raco pkg install relation
(require racket/function)

(~> (1 1 3 1) (= #:key identity))

That = from the relation library is variadic, accepting any number of arguments, and it also accepts a #:key keyword argument. It seems like it should be detected by the foreign macro check and therefore should not support multiple arguments, but it looks like it in fact fails the macro check and is handled as a function. I also tried it with and without passing in the keyword argument, and with __. It still works in all of these cases.

countvajhula commented 2 years ago

I tried this experiment:

(require racket/private/kw)
(define-syntax-parse-rule (mac id)
  #:with result (kw-expander? #'id)
  (quote result))

Then:

> (mac apply)
#f
> (mac sort)
#f
> (mac =)
#f

I also tried it with #:with result (kw-expander? (syntax-local-value #'id)) and got the same results.

michaelballantyne commented 2 years ago

I expect apply to be a macro but a special one, not a keyword expander. The other cases look like they don't match as keyword expanders because they're wrapped by contract-out. I expect this is why = in your previous message works.

A locally defined keyword procedure does seem to match:

#lang racket/base

(require (for-syntax syntax/parse
                     racket/base)
         racket/private/kw
         syntax/parse/define)

(define-syntax-parse-rule (mac id)
  #:with result (kw-expander? (syntax-local-value #'id))
  (quote result))

(define (f x #:y [y #f])
  y)

(mac f)

countvajhula commented 2 years ago

@michaelballantyne I think this PR is just about ready, mind taking a look whenever you can? I'm going to prepare an announcement for the new feature introduced here, probably as a blog post on extensible DSLs or something, so that may take some time - maybe a week - and consequently there is no rush on reviewing this 😸

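The main changes from the last time we talked:

  • renamed define-qi-threadable-syntaxes to "foreign" syntaxes since it isn't specific to threading
  • docs for the macro registration process
  • support for foreign macros being used in identifier form i.e. (~> (5) double-me)
  • show a helpful error when a foreign macro is invoked with the catch-all __ template
  • document a special case where esc can be omitted, which I accidentally discovered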

Also, I noticed something a bit unusual and wanted to run it by you to see if it might be a bug or if I'm just doing something wrong, or if it's intended behavior.

After registering foreign macros using (define-qi-foreign-syntaxes ...), in order to use them in another module, if we provide them like so: (provide (for-space qi mac)), then it gives the following error:


; /Users/siddhartha/work/lisp/racket/sandbox/modules/binding-spaces/7.rkt:6:8: double-me: unbound identifier
;   in: double-me
;   context...:
;    #(5905997 local) #(5905998 intdef) #(5906005 local) #(5906006 intdef)
;    [common scopes]
;   other binding...:
;    #(double-me.1 #<module-path-index:"6.rkt" + '|7|[5349457]> 0)
;    #(-93270 interned qi) [common scopes]
;   common scopes...:
;    #(5905968 module) #(5905975 module 7)
; Context (plain; to see better errortrace context, re-run with C-u prefix):
;   /Users/siddhartha/.emacs.d/straight/build/racket-mode/racket/syntax.rkt:66:0

But if we additionally provide the bindings also in the default space, (provide (for-space qi mac) mac), then it works fine. Is this related to this note in the docs:

"By convention, when an identifier is bound in a space, a corresponding identifier also should be bound in the default binding space; that convention helps avoid mismatches between imports or mismatches due to local bindings that shadow only in some spaces."

That doesn't say anything explicitly about providing for space but maybe it's related? I'm actually not sure I understand what this note is saying -- is it saying that we need to create a default space binding for any identifier bound in another space, if one doesn't already exist, something like (define mac (void))? With Qi macros defined just with define-qi-syntax-rule, I don't recall running into this issue, so maybe it has something to do with define-qi-foreign-syntaxes specifically. Could the difference be that it's because in the former case I didn't create default space bindings? For now, I've just documented this behavior -- that foreign macros may need to be provided in both spaces.

Here are test modules that reproduce the issue:

1.rkt

#lang racket

(provide (for-space qi double-me)
         double-me)

(require qi)

(define-syntax-rule (double-me x) (* 2 x))

(define-qi-foreign-syntaxes double-me)

(module+ main
  (~> (5) double-me))

2.rkt

#lang racket

(require qi
         "1.rkt")

(~> (5) double-me)

michaelballantyne commented 2 years ago

I expect I'll finally get a chance to investigate and review on Wednesday. Sorry to have disappeared for a bit!

countvajhula commented 2 years ago

Totally fine @michaelballantyne ! I haven't had a chance to work on that blog post anyway.

On my end, I briefly looked at the source for deta and sawzall just to get an idea of whether these languages could easily be ported to Qi. deta uses analogues of define-qi-syntax-rule and define-qi-syntax-parser, so that seems like it would be straightforward. But I noticed that sawzall uses syntax-parse directly rather than one of the macro definition forms. Do you think Qi ought to provide parse-qi-syntax and qi-syntax-parser (analogous to syntax-parse and syntax-parser) to neatly map to these cases? My initial attempts at implementing these ran into some errors, so I thought I'd get your thoughts on whether it's needed before trying again.

I think providing these would amount to just exposing the qi-macro compile-time datatype, which would otherwise not be available via just syntax-parse and syntax-parser, and it could potentially minimize duplication in the Qi macro definition forms in qi/macro. Yet, even if this works (I think I was running into issues with the qi-macro type being defined for-syntax but being needed at a different phase when syntax-parser is used in define-qi-syntax-parser, but I'm not sure), it would still rely on the user defining the resulting binding in the Qi binding space (which is what forms like define-qi-syntax-rule do). I suppose that's fine. Anyway, curious what you think.

Also, I've updated Qi's probe debugger to use the new macros -- this simplifies what was already a pretty simple interface, so, that's great! I did run into this weird error building docs, though:

raco setup: error: during building docs for <pkgs>/qi-doc/scribblings/qi.scrbl
raco setup:   examples: exception raised in example
raco setup:     error: "eval:2:0: ?: access disallowed by code inspector to unexported transformer\n  from module: \"/home/runner/work/qi/qi/qi-probe/probe.rkt\"\n  at: readout\n  in: readout.1"

... where readout is a Qi macro defined in qi-probe/probe.rkt and provided for-space as usual. I tried defining a dummy (define readout (void)) and providing that in the default space just to see if that would make a difference, but it didn't.

This error goes away if I use @racketblock[...] instead of @examples[...] in the docs. The latter would be preferable obviously but it's not a huge deal to use the former. Would be nice to understand why this happens though. Any ideas here?

michaelballantyne commented 2 years ago

Also, I noticed something a bit unusual and wanted to run it by you to see if it might be a bug or if I'm just doing something wrong, or if it's intended behavior.

Here are test modules that reproduce the issue:

I wasn't able to reproduce the problem with either Racket 8.3 or a more recent build from git. What version of Racket are you using? Anything special I have to do beyond running those files?

michaelballantyne commented 2 years ago

I wasn't able to reproduce the problem...

Whoops, sorry. Didn't read carefully enough. I do see the problem when I remove the provide of double-me outside the for-space.

michaelballantyne commented 2 years ago

This looks like the relevant line of the qi-foreign-syntax-transformer:

https://github.com/countvajhula/qi/blob/b192a8c823c0bb94fead97dc9669c9ee508c5822/qi-lib/macro.rkt#L84

Note that the name pattern variable here contains the identifier for the occurrence of double-me in 2.rkt. Only the qi-macro binding of double-me in the qi space is imported in that module! So when the identifier is used in (name v) in the expansion, it can't be resolved to the original macro.

The right solution here is to capture a copy of the occurrence of double-me in (define-qi-foreign-syntaxes double-me) from 1.rkt. That identifier is in scope of the original macro definition, so that's what we want to use in the expansion.

You could turn qi-foreign-syntax-transformer into:

(define (make-qi-foreign-syntax-transformer original-macro-id)
    (define/syntax-parse original-macro original-macro-id)
    ...
       #'(esc (lambda (v) (original-macro v)))
    ...)

(changing the expansions using name in other clauses as well)

And in define-qi-foreign-syntaxes, pass along the original macro id:

(define-syntax spaced-form-name (make-qi-foreign-syntax-transformer #'form-name))
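The same principle can be sketched in plain Racket, without Qi or binding spaces. In this sketch the names `make-forwarding-transformer` and `double-me*` are hypothetical and only illustrate the idea: the transformer closes over an identifier captured at the definition site, so the expansion still resolves even though the original macro is never exported.

```racket
#lang racket/base

(module defs racket/base
  (require (for-syntax racket/base))
  (provide double-me*)          ; note: double-me itself is NOT provided

  (define-syntax-rule (double-me x) (* 2 x))

  (begin-for-syntax
    ;; Hypothetical helper: builds a transformer that forwards to the
    ;; identifier captured where the transformer was created, rather than
    ;; to the identifier that appears at the use site.
    (define (make-forwarding-transformer original-id)
      (lambda (stx)
        (syntax-case stx ()
          [(_ v) #`(#,original-id v)]))))

  ;; #'double-me is captured here, in scope of the original definition.
  (define-syntax double-me* (make-forwarding-transformer #'double-me)))

(require 'defs)

(double-me* 5) ; 10
```

Had the expansion reused the name written at the use site, it would only resolve if the requiring module also imported the original macro, which matches the unbound identifier error reported above.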

countvajhula commented 2 years ago

Ah, nice work finding that! I'll take a closer look at your solution soon. I'll have some time today but more time tomorrow/Friday.

countvajhula commented 2 years ago

@michaelballantyne works like a charm! Added the fix and some tests to catch it.

michaelballantyne commented 2 years ago

I did run into this weird error building docs, though

I think this is a bug in the interaction of code inspectors and binding spaces. I opened a ticket: https://github.com/racket/racket/issues/4186

michaelballantyne commented 2 years ago

Do you think Qi ought to provide parse-qi-syntax and qi-syntax-parser (analogous to syntax-parse and syntax-parser) to neatly map to these cases?

Simply providing qi-macro (just the constructor, not exposing accessors, etc) would provide the most flexibility.

Note also that because of the slightly funky way binding spaces work, Qi macros don't have to be defined in the Qi binding space! If a name is unbound in the Qi space but bound to a qi-macro in the default space, this line will still find it:

#:when (qi-macro? (syntax-local-value space-m (λ () #f)))

It certainly makes the most sense to use the binding space for macros that use the same name as Racket forms, such as and, and for Qi versions of foreign syntaxes. But in other cases it isn't necessary.
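A small sketch of that fallback, assuming Racket 8.3 or later. The struct `my-macro` and the space name `my-space` are hypothetical stand-ins for `qi-macro` and the `qi` space; this is not Qi's actual implementation.

```racket
#lang racket/base
(require (for-syntax racket/base))

(begin-for-syntax
  ;; Hypothetical stand-in for the qi-macro compile-time datatype.
  (struct my-macro (transformer))
  ;; Adds or removes the scope of the (hypothetical) my-space binding space.
  (define in-space (make-interned-syntax-introducer 'my-space)))

;; Bound in the DEFAULT space only; nothing is bound in my-space.
(define-syntax foo (my-macro (lambda (stx) #'1)))

;; Looks the identifier up with the space scope added, the way a
;; space-aware check would, and reports whether a my-macro was found.
(define-syntax (is-my-macro? stx)
  (syntax-case stx ()
    [(_ id)
     (let ([v (syntax-local-value (in-space #'id 'add) (lambda () #f))])
       (if (my-macro? v) #'#t #'#f))]))

(is-my-macro? foo) ; #t
```

The lookup succeeds because adding the space's scope only grows the identifier's scope set, so the default-space binding remains a candidate when no binding exists in the space itself.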

countvajhula commented 2 years ago

I did run into this weird error building docs, though

I think this is a bug in the interaction of code inspectors and binding spaces. I opened a ticket: racket/racket#4186

Thank you for finding that! I'll keep it as racketblock instead of examples in the docs for now.

countvajhula commented 2 years ago

Do you think Qi ought to provide parse-qi-syntax and qi-syntax-parser (analogous to syntax-parse and syntax-parser) to neatly map to these cases?

Simply providing qi-macro (just the constructor, not exposing accessors, etc) would provide the most flexibility.

Note also that because of the slightly funky way binding spaces work, Qi macros don't have to be defined in the Qi binding space! If a name is unbound in the Qi space but bound to a qi-macro in the default space, this line will still find it:

#:when (qi-macro? (syntax-local-value space-m (λ () #f)))

It certainly makes the most sense to use the binding space for macros that use the same name as Racket forms, such as and, and for Qi versions of foreign syntaxes. But in other cases it isn't necessary.

That certainly makes things a lot easier. I've exposed qi-macro (only the constructor) and also added a define-qi-syntax form to define the binding in the qi binding space specifically.

countvajhula commented 2 years ago

@michaelballantyne I've made the changes you suggested re: the monad example, and reporting syntax errors at compile time. I also added more docs about writing languages in Qi. It includes more stuff from your paper re: the difference between hosted and embedded languages and which may be more appropriate in different cases. Please let me know if I got anything wrong there or if you feel there's more that could be added that would be useful. Anyhow, this is ready for another review pass 😊

countvajhula commented 2 years ago

Ok, made the change 👍

Thank you for all of your help on this @michaelballantyne, this major improvement to Qi would not have been possible without you 🎖️ 🏆 🙏. I'm very psyched about this. I will begin release preparations for this soon!