drym-org / qi

An embeddable flow-oriented language.
59 stars 12 forks source link

Cover compile time errors in tests #86

Closed countvajhula closed 10 months ago

countvajhula commented 1 year ago

Summary of Changes

Qi has never quite managed to reach 100% test coverage because there are some parts of the code that are only hit at compile time (e.g. syntax errors), which I didn't know how to write unit tests for. Michael recently mentioned convert-compile-time-error which converts a compile-time error into a runtime error, allowing us to write unit tests for it. Now with this available, we should be able to get to 100% test coverage.

Public Domain Dedication

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

countvajhula commented 10 months ago

This is ready for review. We are now almost at 100% coverage! There are just a few lines missing in deforest.rkt. For some of them, it looks like they aren't being hit right now when we try expressions like (~>> (1 2 3 4) (range _) (filter odd?) (map sqr)) in the REPL. I'm not sure if that is intentional.

In increasing coverage, it turned out that a lot of lines that weren't covered were indicative of a real problem, so this PR includes some of those fixes as well, and it also uncovered a potential problem:

It looks like we aren't deforesting nested positions, so something like (☯ (>< (~>> (filter odd?) (map sqr)))) is not being deforested. I've added a failing test that shows this. My impression is that since we are using find-and-map/qi in deforest-pass, it should match and transform nested positions. The unit tests for find-and-map/qi in qi-test/tests/compiler/util.rkt seem to confirm this.

You can get a coverage report locally by running make cover btw.

Any input appreciated!

countvajhula commented 10 months ago

So, I ran into the issue that @dzoep had a little while ago suspected might be present, which is, premature termination of compiler passes upon receiving, or not receiving, a false return value.

The commit message fixing the issue explains it well:

Both `fix` and `find-and-map` apply the same type of function (i.e. a
compiler rewrite rule) to syntax, but they have incompatible
expectations about the return value. Specifically, `fix` terminates on
a false return value, while `find-and-map` continues. This reconciles
them so that they both terminate upon receiving false, and both
continue if the transformed syntax is identical to the original.

Unfortunately, there is now a weird failing test, and I don't understand what is going on with it. I added some logs to normalize-rewrite to see what sequence of expressions it is receiving, and weirdly enough, for this input expression:

(thread tee collect)

... somewhere in the course of traversing the expression using find-and-map/qi and applying normalization-rewrite (even if we remove fix), it seems to pass this expression to normalization-rewrite:

(tee collect)

This doesn't look right since it isn't a true syntax node in the input expression but more of a "sublist." In any case, once it receives this syntax, it matches this normalization rule:

    ;; trivial tee junction
    [(tee f)
     #'f]

... which results in the containing expression becoming:

(thread . collect)

... resulting in the error:

; .../qi/qi-test/tests/flow.rkt:1562:3: thread: bad syntax
;   in: (thread . collect)

The debug logs I added show this sequence of syntax objects passed to normalize-rewrite:

input syntax to normalization is #<syntax:flow.rkt:1562:3 (thread tee collect)>
#<syntax:flow.rkt:1562:3 (thread tee collect)>
#<syntax:/Users/siddhartha/work/qi/qi-lib/flow/extended/expander.rkt:43:5 thread>
#<syntax:.../compile/syntax-spec.rkt:101:11 (tee collect)>
#<syntax:flow.rkt:1562:10 collect>
; /Users/siddhartha/work/lisp/racket/qi/qi-test/tests/flow.rkt:1562:3: thread: bad syntax
;   in: (thread . collect)

For comparison, when I manually evaluate all the necessary functions and then invoke them like so:

> (find-and-map/qi normalize-rewrite
                   #'(thread tee collect))
#<syntax:flow.rkt:1688:29 (thread tee collect)>
#<syntax:flow.rkt:1688:30 thread>
#<syntax:flow.rkt:1688:37 tee>
#<syntax:flow.rkt:1688:41 collect>
#<syntax:flow.rkt:1688:29 (thread tee collect)>

... the output looks sensible. In particular, it never passes (tee collect) to normalize-rewrite.

One unusual thing in the traced output is that the offending (tee collect) syntax seems to come from .../compile/syntax-spec.rkt. I'm not sure I understand how this module could be involved here...

In case you're back from vacay @michaelballantyne , would love your thoughts! ⛷️ 😼

benknoble commented 10 months ago

I seem to recall a (cdr (syntax->list stx)) in one of the syntax parsers; that could be why it’s getting the sublist? So maybe that function needs to always return a syntax list (so the tee would return #'(f)), or else that function should not do the cdr madness?

benknoble commented 10 months ago

Related: isn’t (~> -< collect) a bit of a syntax error? I don’t think I’ve ever seen a tee without child flows.

countvajhula commented 10 months ago

I seem to recall a (cdr (syntax->list stx)) in one of the syntax parsers; that could be why it’s getting the sublist? So maybe that function needs to always return a syntax list (so the tee would return #'(f)), or else that function should not do the cdr madness?

Good idea. Do you mean this one? That is happening at the code generation stage (i.e. qi0->racket), whereas I believe this error is happening during normalization, i.e. in the optimize-flow stage, so this seems unlikely to be the cause.

I'm also suspicious of this part of find-and-map, where we apply a transformation to the syntax list during tree traversal.

Related: isn’t (~> -< collect) a bit of a syntax error? I don’t think I’ve ever seen a tee without child flows.

Yeah I didn't recognize it at first either 😆 . But then I remembered that we'd added support for -< to treat its first input as a "control input" specifying the number of tines to create when used in identifier form. This was to allow it to serve as a core form that fanout could compile to, IIRC.

benknoble commented 10 months ago

I'm also suspicious of this part of find-and-map, where we apply a transformation to the syntax list during tree traversal.

That's almost definitely the problem, but I don't see an easy fix.

Does everything still work if the cons match pattern is gone? It seems to basically say "if the input is a list, run find-and-map on the head and the tail." Why/where do we use that power? I assume it's to traverse deep trees or something… but in this case we need to traverse the tree while maintaining some context? Or perhaps it would be better to write [(? list? xs) (map (curry find-and-map f) xs)] so that we run the function on each child node, rather than on the head & tail (since the tail doesn't seem to make much sense?)?

countvajhula commented 10 months ago

Thanks for the detailed review!

Looking at find-and-map again, it actually does look right to me. Even though we're getting the cdr of the syntax list and calling find-and-map recursively on it, the pattern matching ensures that it will only apply the actual transforming function to syntax, which should only ever be the car position (the cdr should keep recursing structurally) (i.e. (f stx) on this line).

What's weird is that this test for normalization of the core language passes (normalization is mostly all datum rather than literal matching), and adding a debug log at the top of normalize-rewrite to print the input syntax shows the correct sequence of syntaxes being passed to it (cf. my earlier comment about this bug). But when we run effectively the same test using the surface language (i.e. the currently failing test), it improperly passes the sublist to normalize-rewrite and breaks.

That could imply that (tee collect) is answering affirmatively to syntax? in find-and-map. I'll see if I can verify that. I'll also try modifying/removing the cons pattern and see what happens.

countvajhula commented 10 months ago

I'll continue looking into (1) private submodule, (2) pattern jumbling, (3) find-and-map, (4) testing ideas. But I'll go ahead and merge this PR now in case @dzoep wants to experiment with things.

Thanks!