Open gelisam opened 2 years ago
@david-christiansen mentioned the following relevant literature:
We later determined that those resources are more relevant for #164 than for recursion-schemes. Or are they? We were discussing recursion-schemes when the topic of those two resources came up, so the two topics must be linked somehow, I just don't see how.
I think that levitation is not relevant to Klister. Dagand's thesis is relevant in the sense that it treats the elaboration of pattern matching to encodings of datatypes, which seems to be what you're proposing to do here, and there may be some notations or technical devices to take from it.
Right! And the other reference you mentioned, which elaborates pattern-matching on multiple arguments to nested pattern-matchings on a single argument, is Wadler's section on Case-Tree compilation, chapter 5 in SPJ's "The Implementation of Functional Programming Languages" book. The key, I believe, is to keep both a success continuation and a failure continuation so that we know what to do when a pattern matches and when it doesn't, as nested patterns and multiple arguments makes it more complicated than just trying the next pattern in a list.
We discussed the following implementation plan.
Define a my-datatype
macro which defines both a datatype Foo
and its base functor FooT
, and uses #168 to associate the two. Similarly associate each constructor MkFoo
of FooT
to the corresponding MkFooT
constructor in FooT
.
Define a my-defun
macro which checks for a few well-known recursion patterns and rewrites the clauses, using the association to convert between Foo
and FooT
as appropriate.
As the maintainer of recursion-schemes, I believe that the main reason recursion-schemes aren't used more often is simply because people are more comfortable with the unbounded-recursion style. As a result, they have to spend more time and mental energy when reading and writing code written in the recursion-schemes style than when it is written in the unbounded-recursion style. This is a cost which has to be paid on every single line of every single function, so of course it doesn't seem worth the cost when the benefits only appear on some lines of some functions.
What if it was possible to mix the two styles within the same function, so that the less familiar syntax is only used when it brings benefits?
Let's look at a concrete example of what this could look like. Consider the following program written in the unbounded-recursion style.
The code is clear enough, but it has two issues:
Lam
constructors, becauseusedVars1
is linear and is called once perLam
constructor.The following recursion-schemes version fixes both problems:
The way in which recursion-schemes fixes the two issues are:
TermF
datatype has generatedFunctor
,Foldable
, andTraversable
instances for manipulating the results of the recursive calls in a uniform way. This makes it possible to combine all the boring repetitive lines into a single, more abstract line.zygo
computes bothusedVars
anduseConstWhenPossible
at the same time. It does so in a way which makes the partial results ofusedVars
, that is, the vars which are used by each sub-term, available touseConstWhenPossibleF
. Thus, there is no need to make a linear-time call in order to obtain the information, it is already available for each sub-term.However, there are many parts of the recursion-schemes style code which looks needlessly-complicated, especially to those who are unfamiliar with the details:
zygo
,cata
, andembed
callsTermF (Set String, Term) -> Term
andTermF (Set String) -> Set String
typesI thus propose a
#lang
which achieves something like this (but using a parenthesized syntax of course):Note how we sometimes pattern-match on
Lam
, and other times onLamF
. The first indicates that we want to use the usual unbounded-recursion syntax, with explicit recursive calls, while the second indicates that we want the recursive positions to already contain the results of the recursive calls. The combined-but-more-abstract clauses, which don't match on a constructor, are assumed to expect aTermF r
, not aTerm
, because this form is more commonly-used in the recursion-schemes style.When using the
Lam
form, the#lang
implementation will look for recursive calls and rewrites them away, by converting the right-hand side from theLam
form to theLamF
form. If the recursive calls are in an unusual position in the code, the rewrite fails, just like a termination-checker who doesn't recognize that the recursive calls are made on smaller inputs. As a result of the rewrite,usedVarsF
anduseConstWhenPossibleF
are generated in addition tousedVars3
anduseConstWhenPossible3
.More ambitiously, the
usedVars3
call which leads to quadratic performance should also be rewritten away. To do this, the#lang
must remember thatusedVars3
has ausedVarsF
form which can be used in azygo
. The#lang
silently rewrites all the other clauses which don't useusedVars3
to expect aTermF r
rather than aTerm (Set String, r)
, thus relieving the user from having to explicitly pattern-match on(_, r)
or to usefmap snd
in the combined-but-more-abstract clause.Finally, something like #156 could be used to automatically insert missing calls to
embed
. In particular, theuseConstWhenPossible3 rewrittenSubTerms = rewrittenSubTerms
clause is producing aTermF Term
instead of aTerm
, and is thus missing a call toembed
, but in the recursion-schemes style,embed
is such a common and information-less transformation that it is basically line noise, and so it seems clearer to allow it to be omitted than to require it to be written explicitly.