Nonterminal blues - Githubissues

countvajhula commented 11 months ago

Summary of Changes

This began as an investigation of the long-functional-pipeline benchmark, which it turned out wasn't getting deforested. That seems to have something to do with the fact that that benchmark is rewritten by both of our current optimization passes, normalization and deforestation, and most likely, the latter doesn't see the nonterminal syntax property ~for reasons that are somewhat mysterious~ because we weren't explicitly propagating it, and the syntax property being "nonpreserved" meant that our tests were not correctly handling this property across phases (see comments below).

In the course of investigating I ended up refactoring a lot of tests, and they actually revealed a lot of other smaller issues that are mostly fixed. ~But the main nonterminal property issue remains a mystery~. Summary below:

The tests pass when using racket, but fail when using racket -y. In the latter case, the nonterminal property is not present after expansion.
We have a failing test in compiler/rules.rkt that passes if we manually introduce the nonterminal property between passes viatag-form-syntax
Yet, adding the property to the compiler.rkt module itself, at the conclusion of normalize-pass, does not make a difference to the test result. Adding the property at the find-and-map/qi level (after excluding non-syntax and expressions tagged #%host-expression -- which it didn't like) doesn't make a difference to the test result either.

Other changes:

Use on-demand expansion pervasively in the tests instead of hand-writing core expressions. This is much better and revealed some bugs in normalization
Added some macros that make the tests much easier to read and write (and this revealed that many tests were actually identical (copypasta?) even though their descriptions explained what they were supposed to test - fixed)
Added a failing test to reveal the new multi-pass bug with propagating the nonterminal property
Rename a util module to (compiler) pass. There are too many modules named "util" and obviously we should avoid such a generic name.

Public Domain Dedication

[x] In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

michaelballantyne commented 11 months ago

The problem occurs because of an interaction between the way that the tests are written and the fact that we made the nonterminal syntax property non-preserved.

When applying a syntax property with syntax-property, there's an optional third argument that indicates whether the property should be preserved in serialized syntax, i.e. when a compiled file is written out by raco make or racket -y.

If a syntax property is really only needed temporarily for compilation, making it non-preserved can avoid increasing the size of compiled files. So as of right now we made the nonterminal property non-preserved.

In the tests as written, phase0-expand-flow expands the initial syntax at expansion time of the test module, but the remainder of the test (eg. calling deforest-pass and deforested?) happens at run-time. That is, this test:

(deforested? (syntax->datum
              (deforest-pass
                (phase0-expand-flow
                 #'(>< (~>> (filter odd?) (map sqr)))))))

Expands to something like:

(deforested? (syntax->datum
              (deforest-pass
                #'(amp
                   (thread
                    (#%blanket-template
                     ((#%host-expression filter)
                      (#%host-expression odd?)
                      __))
                    (#%blanket-template
                     ((#%host-expression map)
                      (#%host-expression sqr)
                      __)))))))

and then that runs at runtime. The problem is that because the nonterminal property is not preserved, when running with racket -y the property is lost after expansion time and is thus missing when running deforest-pass at runtime.

I suggest instead running the entire compiler pipeline at expansion time, and only returning a boolean to runtime. This makes sure that the compiler runs in the same way it will in a real program---at phase 1, and without any step of serialization between expansion and compilation. Here's a test that works file with both racket and racket -y:

#lang racket/base

(require
  (for-syntax racket/base racket/string)
  qi/flow/core/compiler
  qi/flow/core/deforest
  ;; necessary to recognize and expand core forms correctly
  qi/flow/extended/expander
  ;; necessary to correctly expand the right-threading form
  qi/flow/extended/forms      
  (submod qi/flow/extended/expander invoke)
  syntax/macro-testing
  rackunit)

(begin-for-syntax
  (define (deforested? exp)
    (string-contains? (format "~a" exp) "cstream")))

(check-true
 (phase1-eval
  (deforested? (syntax->datum
                (deforest-pass
                  (expand-flow
                   #'(>< (~>> (filter odd?) (map sqr)))))))))

countvajhula commented 11 months ago

@michaelballantyne Ok, thanks for the explanation! That definitely helps to understand what's going on. I'll have to think about how to refactor the tests to do the checks in phase 1, and I think if I can get this test to pass, that should help reveal how to address the original problem with the long-functional-pipeline benchmark.

countvajhula commented 11 months ago

I've fixed that specific test to run in phase 1, verified that it is capable of detecting the failure to propagate nonterminal across normalization → deforestation, and then added the fix, which was to simply attach the property to the top level flow expression that is being compiled.

I'm assuming that:

Since it is a toplevel expression, it is safe to just reattach the nonterminal property after normalization, without first checking whether the expansion had that property attached to begin with.
Since find-and-map constructs transformed syntax using datum->syntax, passing the input syntax for the srcloc and prop arguments (thanks @benknoble for reminding me of this!), that every component syntax object that is modified already propagates the nonterminal property, so that it's only the toplevel expression that a priori doesn't propagate the property (which it now does).

Does that sound right?

One general impression is that although the addition of the syntax property appeared a simple solution at first, it does make things a bit harder to reason about. We should revisit this at some point (after the release 😄)

Now I need to push the remaining tests into phase 1, and I think the PR should be ready at that point.

One "Interesting" thing @dzoep : long-functional-pipeline now does get deforested (yay!), but it's only about 1.5x faster than Racket, when I think we are expecting about length-of-the-pipeline times faster, in this case 6-7, depending on whether values is normalized away by Racket as we do it).

benknoble commented 11 months ago

Since find-and-map constructs transformed syntax using datum->syntax, passing the input syntax for the srcloc and prop arguments (thanks @benknoble for reminding me of this!), that every component syntax object that is modified already propagates the nonterminal property, so that it's only the toplevel expression that a priori doesn't propagate the property (which it now does).

Isn't find-and-map pretty crucial to the compiler or expander? I'm actually very shocked it needs to use datum->syntax this way, then, and think that suggests a possible [architectural? design? implementation?] flaw (because I would expect Racket macros and other compile-time code to not need that kind of low-level tool in most cases).

I'll have to really study the code [again] to figure out why it's needed though, so I'm mostly saying this now as a way of "please publicly hold me accountable to investigate after the release."

benknoble commented 11 months ago

I'm also not seeing any changes to find-and-map in this PR; was that in a previous PR or am I missing something?

countvajhula commented 11 months ago

Well, I think find-and-map was a simple way to do a syntax tree traversal that happened to suffice for our needs (until we hit this particular issue!). My understanding is that the "right way" would be to do a proper core grammar aware tree-traversal, using something like syntax-parse (which I believe this TODO notes, and which I think you're getting at), but if we did that, we would essentially be duplicating the entire core language grammar in the compiler, which we've already notated in the expander. I think we're hoping that Syntax Spec would be able to infer such traversal utilities that could be used down the line (correct me if I'm wrong Michael) from the core grammar specification, to avoid this kind of duplication. And in that case, it would be unnecessary effort to implement such a Qi-specific traversal now since we would get it "for free" later anyway in generic form.

And yes, I should have clarified that this PR doesn't modify find-and-map, just normalize-pass which uses it.

I'll gladly hold you accountable to implement the traversal if Syntax Spec doesn't get around to it quickly enough 😝

michaelballantyne commented 11 months ago

I think a generic syntax traversal using datum->syntax is a totally reasonable thing to do here, but it should probably ultimately live in the syntax-spec library rather than in the Qi repo.

benknoble commented 11 months ago

In case I happen to have time to pole around: does syntax-spec expose the grammar anywhere in a way that I can compute over it? Or where in its internals should I look?

michaelballantyne commented 11 months ago

In case I happen to have time to pole around: does syntax-spec expose the grammar anywhere in a way that I can compute over it? Or where in its internals should I look?

It does not expose it, and currently the only thing it generates from the grammar is the macro expander so it also does not have a super clean internal representation of the grammar either. The syntax classes for matching the grammar declarations are here:

https://github.com/michaelballantyne/syntax-spec/blob/15d8dd1c4999c43547671a3ae877b2fe7a74a9d9/private/syntax/syntax-classes.rkt#L103

And the generation of the expander from the grammar is here:

https://github.com/michaelballantyne/syntax-spec/blob/15d8dd1c4999c43547671a3ae877b2fe7a74a9d9/private/syntax/compile/nonterminal-expander.rkt#L25

countvajhula commented 11 months ago

Yesterday we noticed that CI was failing, specifically on BC versions 8.5 through 8.8. We generally agreed that supporting BC is not a priority, so I've just updated the test matrix to reflect the actual current compatibility -- that is, BC/CS 8.9+, and CS all the way back to 8.5 (i.e. unchanged from before).

These are the specific errors, for reference, in case anyone thinks they might be cause for concern:

BC 8.5-8.7:

    write: cannot marshal value that is embedded in compiled code
      value: (srcloc #<path:/home/runner/work/qi/qi/qi-lib/flow/core/normalize.rkt> 74 7 2297 22)
      compilation context...:
       /home/runner/work/qi/qi/qi-test/tests/qi.rkt
      context...:
       /usr/share/racket/collects/compiler/private/cm-minimal.rkt:808:8

BC 8.8:

    mask-accessor: contract violation
    raco setup: 1 making: <pkgs>/rackunit-abbrevs/scribblings
      expected: mask?
      given: #f
      compilation context...:
       /home/runner/.local/share/racket/8.8-bc/pkgs/rackunit-abbrevs/private/test-typed-rackunit-abbrevs.rkt
      context...:
       /usr/share/racket/pkgs/typed-racket-lib/typed-racket/types/overlap.rkt:48:0: overlap?

I think this PR is ready to go! Any review appreciated. I'll aim to merge it soon if there are no further comments.

michaelballantyne commented 11 months ago

The first of those two errors seems like something we could potentially fix; it’s complaining that you are embedding a srcloc structure value in compiled code. We could probably avoid doing that easily.

The error message for the second suggests it’s a problem in the rackunit-abbrevs dependency, so less straightforward for us to change.

On Sat, Dec 30, 2023 at 2:33 PM Siddhartha Kasivajhula < @.***> wrote:

Yesterday we noticed that CI was failing, specifically on BC versions 8.5 through 8.8. We generally agreed that supporting BC is not a priority, so I've just updated the test matrix to reflect the actual current compatibility -- that is, BC/CS 8.9+, and CS all the way back to 8.5 (i.e. unchanged from before).

These are the specific errors, for reference, in case anyone thinks they might be cause for concern:

BC 8.5-8.7:

write: cannot marshal value that is embedded in compiled code value: (srcloc #<path:/home/runner/work/qi/qi/qi-lib/flow/core/normalize.rkt> 74 7 2297 22) compilation context...: /home/runner/work/qi/qi/qi-test/tests/qi.rkt context...: /usr/share/racket/collects/compiler/private/cm-minimal.rkt:808:8

BC 8.8:

mask-accessor: contract violation raco setup: 1 making: /rackunit-abbrevs/scribblings expected: mask? given: #f compilation context...: /home/runner/.local/share/racket/8.8-bc/pkgs/rackunit-abbrevs/private/test-typed-rackunit-abbrevs.rkt context...: /usr/share/racket/pkgs/typed-racket-lib/typed-racket/types/overlap.rkt:48:0: overlap?

I think this PR is ready to go! Any review appreciated. I'll aim to merge it soon if there are no further comments.

— Reply to this email directly, view it on GitHub https://github.com/drym-org/qi/pull/143#issuecomment-1872611088, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAK46U6EBUGMGXMQODFLL5TYMCCEJAVCNFSM6AAAAABBF6AQFWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZSGYYTCMBYHA . You are receiving this because you were mentioned.Message ID: @.***>

countvajhula commented 11 months ago

Yeah, if it's an easy fix we might as well do it. From googling the error, it sounds like it might be a case of "3d syntax" which I gather is best avoided as it loses the separate compilation guarantee, which I've come to really appreciate as the codebase gets larger!

countvajhula commented 10 months ago

I haven't looked into the srcloc issue yet, but so as not to hold up the other fixes in this PR, I've just modified the test workflow to run tests on BC 8.5 but not block other tests from running if it fails, so we see that the issue remains but it doesn't break CI. We can fix the remaining issue in a separate PR, so I'll merge this now!

drym-org / qi

Nonterminal blues #143

Summary of Changes

Public Domain Dedication