Open AnthonyJacob opened 7 years ago
Actually, Scott encodings are pretty much as efficient as the usual data structures given a good implementation of closures (reference). I'm currently researching sharing-graph-based implementations, which could open the door to massively parallel evaluators which, I hope, would make Moon and similar functional languages much faster.
Oh, I think you're talking about something different - about translating Nat to actual machine ints? I think that is only really possible with types, and even so it is a bit tricky. That is the very reason Moon has Nat as a primitive (and why I claim that type systems would make that unnecessary)...
Yep, I hoped that we could somehow devise an algorithm that would map lambda numbers to closures with integers inside. Then we would not need a Number primitive. It would also compress other kinds of expressions that repeatedly apply the same function.
Basically, Moon itself would be only:
data MoonTerm
= App MoonTerm MoonTerm
| Lam MoonTerm
| Var Number
| Ref String
| Let String MoonTerm MoonTerm
| Fix String MoonTerm
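As a concrete illustration of the "map lambda numbers to integers" idea, here is a hypothetical Haskell sketch (not Moon's actual code): it builds Scott-encoded naturals as MoonTerm values and recovers the machine integer by matching the numeral's shape, which is the kind of detection an optimization pass would need. It assumes Number = Int and 0-based de Bruijn indices (Var 0 is the innermost binder).

```haskell
-- Hypothetical sketch: Scott-encoded naturals as MoonTerm values.
-- Assumes Number = Int and 0-based de Bruijn indices.
data MoonTerm
  = App MoonTerm MoonTerm
  | Lam MoonTerm
  | Var Int
  | Ref String
  | Let String MoonTerm MoonTerm
  | Fix String MoonTerm
  deriving (Eq, Show)

-- zero = \s z -> z
zero :: MoonTerm
zero = Lam (Lam (Var 0))

-- suc n = \s z -> s n  (a Scott numeral carries its predecessor)
suc :: MoonTerm -> MoonTerm
suc n = Lam (Lam (App (Var 1) n))

-- Recover the machine integer by matching the numeral's shape
toInt :: MoonTerm -> Maybe Int
toInt (Lam (Lam (Var 0)))         = Just 0
toInt (Lam (Lam (App (Var 1) n))) = fmap (+ 1) (toInt n)
toInt _                           = Nothing
```

For instance, toInt (suc (suc zero)) yields Just 2, while any non-numeral term yields Nothing.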
then there would be a second phase that optimizes it further into a lower-level language:
data Term
= App Term Term
| Lam Term
| Var Number
| Ref String
| Let String Term Term
| Fix String Term
| Pri String [Term]
| Num Number
| Str String
| Flo Double
| etc
I think this is more or less my idealized version of Moon/Morte:
data Term
= App Term Term
| Lam Term Term
| All Term Term
| Let Term Term
| Fix Term
| Box Term
| Var Nat
| Set Nat
I think all of those constructors are necessary, but no more. App and Lam give us abstraction (linear functions). All is Pi (i.e., ∀) and Set is the type of types; together they give us an expressive type system. Let and Box give us the power to duplicate things, which IMO is clearly a separate concept from that of functions, so it deserves to be separated. Fix allows us to express recursive types naturally, without making the resulting system inconsistent (because consistency should probably follow from linear logic)! Var is for substitution. All of that is de Bruijn-indexed. IMO that system seems to have it all:
Simple (more constructors than Morte, but "simpler to reason about"; ex: eta-reduction is trivial!)
Has Scott encodings (a.k.a. the main problem with using Morte right now, IMO)
Can probably express induction, large elims, prove 1 != 0, etc.
Linear types (no garbage collection, also compiling array-ish code to perform in-place writes!)
Consistent (is total, plus a term of type T is an undeniable proof of its proposition)
Var names and comments could be annotated with (a => b => b) "name" term
Could easily compile Nats, Floats, etc. into a low-level language as you described
Can be reduced with sharing graphs without an oracle, for theoretical parallel reducers
Compact binary encoding (exactly 3 bits per constructor!)
It doesn't feel arbitrary. Intuitively, each constructor does one thing that is obviously needed
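The "exactly 3 bits per constructor" point follows from the idealized Term having exactly 8 constructors. A hedged Haskell sketch of what a tagger could look like (the particular tag assignment is my own arbitrary choice, not a specified encoding; Nat is approximated as Int):

```haskell
import Data.Word (Word8)

-- The idealized Term above, with Nat approximated as Int for brevity
data Term
  = App Term Term
  | Lam Term Term
  | All Term Term
  | Let Term Term
  | Fix Term
  | Box Term
  | Var Int
  | Set Int

-- 8 constructors fit exactly in 3 bits (values 0..7).
-- This assignment is illustrative, not canonical.
tag :: Term -> Word8
tag (App _ _) = 0
tag (Lam _ _) = 1
tag (All _ _) = 2
tag (Let _ _) = 3
tag (Fix _)   = 4
tag (Box _)   = 5
tag (Var _)   = 6
tag (Set _)   = 7
```

A real serializer would then emit the tag followed by the encodings of the children (and a variable-length index for Var/Set).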
The only reason I didn't start with it is that I don't have time to investigate whether all of the above actually holds. Intuitively I think it does, but intuition doesn't mean much, I guess. I'd also need to see if it would actually be usable in practice (linear types impose a lot of limitations, but I think with proper tooling that'd be fine: remember the language is simpler to reason about, so tools could do much more!). Plus it'd take a lot of time to do all that, and I have some immediate problems to solve that need a working Moon in the short term.
@Gabriel439 thoughts?
When you say "linear logic" and "linear types", are you referring to elementary-affine-typeable or true linear logic? The reason I ask is that I don't see any type-level constructor for explicitly marking inputs shareable (i.e. a "Bang")
Also, how do you ensure that this is strongly normalizing?
@Gabriel439 sorry for my bad naming; Box was supposed to be Bang, so I mean EAL. On normalization, I never looked into it deeply, but I assume it is pretty easy to prove this is strongly normalizing given that each duplication consumes a !, right? My point (and where you could disagree, I guess) is that this isn't nearly as restrictive as it seems, as it gives us pretty much bounded for-loops on steroids. In fact, it is probably much easier to work with than Morte, given that we're actually able to pattern-match now...
Bounding the number of !s is not by itself enough to ensure normalization. For example, the following lambda expression infinitely loops with only two duplications:
(\x -> x x) (\x -> x x)
But there is no way to decorate that term with ! (given that a bang'd term can't have affine variables, and each exponential variable must have exactly 1 enclosing !). Ex:
(\x. let !y=x in !(y y)) (\x. let !y=x in !(y y))
let !y=(\x. let !y=x in !(y y)) in !(y y)
Normal form, because (\x. let !y=x in !(y y)) has no !. Another attempt:
(\x. let !y=x in !(y y)) !(\x. let !y=x in !(y y))
let !y=!(\x. let !y=x in !(y y)) in !(y y)
!((\x. let !y=x in !(y y)) (\x. let !y=x in !(y y)))
!(let !y=(\x. let !y=x in !(y y)) in !(y y))
Reduces further, but still ends in a normal form, for the same reason. Another one:
(\x. let !y=x in (y !y)) !(\x. let !y=x in (y !y))
(let !y=!(\x. let !y=x in (y !y)) in (y !y))
(\x. let !y=x in (y !y)) !(\x. let !y=x in (y !y))
Whoops! Infinite loop, but the initial term won't type-check, because the first y on (y !y) has 0 enclosing !s.
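The "enclosing !" bookkeeping in the attempts above can be mechanized. Below is a minimal Haskell sketch of the stratification check (my own formulation, not from any Moon code): an affine (lambda-bound) variable must be used under zero extra !s, and an exponential (let-!-bound) variable under exactly one. It deliberately ignores the affine use-once restriction and checks only the depth condition.

```haskell
-- Hypothetical mini-language mirroring the examples above:
-- V x, lambda L, application A, !t as B t, and "let !x = t in u" as LB x t u.
data T
  = V String
  | L String T
  | A T T
  | B T
  | LB String T T

-- stratified env depth term: env records, for each bound variable, whether it
-- is exponential (let-!-bound) and the !-depth at its binding site.
stratified :: [(String, (Bool, Int))] -> Int -> T -> Bool
stratified env d (V x) = case lookup x env of
  Just (isExp, d0) -> if isExp then d == d0 + 1 else d == d0
  Nothing          -> False
stratified env d (L x t)    = stratified ((x, (False, d)) : env) d t
stratified env d (A t u)    = stratified env d t && stratified env d u
stratified env d (B t)      = stratified env (d + 1) t
stratified env d (LB x t u) = stratified env d t
                           && stratified ((x, (True, d)) : env) d u

-- \x. let !y = x in !(y y)  -- stratifies: both y's sit under one !
ok :: T
ok = L "x" (LB "y" (V "x") (B (A (V "y") (V "y"))))

-- \x. let !y = x in (y !y)  -- rejected: the first y has 0 enclosing !s
bad :: T
bad = L "x" (LB "y" (V "x") (A (V "y") (B (V "y"))))
```

Running stratified [] 0 on the two terms reproduces the verdicts above: the first passes, the second is rejected.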
From what I've heard, it is probably easy to prove this system consistent; the real question is whether it makes for a "practical programming language for humans" (which I think isn't a good objection - we want a great internal/universal format; with good tooling, even Brainfuck can be made practical).
So I know that in a system without Fix, an expression that is EAL-typeable is strongly normalizing (since EAL-typeable is a restricted case of Hindley-Milner, and Hindley-Milner-typeable expressions are strongly normalizing).
However, I'm still not aware of a proof that if you add unrestricted Fix, an EAL-typeable term is still strongly normalizing. Also, I'm not sure how you would even implement an algorithm to verify that a term is EAL-typeable once you add Fix.
Perhaps I was thinking it'd be simpler than it actually is. What I had in mind is to just treat Fix like exponential terms, i.e., you can't substitute inside them. That'd be sufficient to avoid any way to encode non-termination that I can think of, given that those need to apply something to itself. TBH I don't quite see how that could be harmful; it is just something that can be pattern-matched (queried) infinitely, but that's all. Can you think of a non-terminating term like that? Anyway, I could try figuring that out more precisely... but do you feel like this is a path worth exploring?
I mean, even the untyped fragment of EAL is strongly normalizing (for stratified terms), which I think is the key difference here, as the untyped language of Hindley-Milner isn't. Without being able to substitute inside Fix, it is just a runtime constant. As such, type-level recursion can't make the language not strongly normalizing (because it is even without types!).
I'd need to see a more rigorous proof. I suspect that it's not strongly normalizing without types, and even with types I'm still not convinced that Fix can be both safe and useful.
But why do you think Fix is not useful? It allows us to express Scott encodings with fix self . (Nat : *) -> (self -> Nat) -> Nat -> Nat, which is exactly what we need, no?
Here is a proof of its normalization and complexity bounds (proposition 7, page 19).
So that convinces me that this is possible, but there is still the issue of how to infer the types of expressions. It's still not clear to me that, if you skip the type-checking step, the expression is strongly normalizing.
Alternatively, you could skip type inference and just do type-checking of user-supplied type signatures (which I think should be more straightforward). I think that would be a reasonable approach
Oh, sure, the idea is to store the code with the full types and just type-check it, which I guess is simple. Same as CoC, but making sure you don't duplicate affine variables and don't use them inside boxes. Fast-forward to when we get used to writing terms with boxes and get the overall feel of it; then we'd probably be able to write tools to auto-annotate Morte terms with them.
Since positive numbers are encoded as a repeated function application, it is trivial to optimize those expressions on current hardware. I think Idris does something similar for Nat = S Nat | Z.
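A hedged Haskell sketch of that optimization over the lower-level Term from earlier (trimmed to the relevant constructors; the S/Z names mirror the Idris-style Nat declaration and are assumptions for illustration, not actual Moon or Idris internals):

```haskell
-- Trimmed fragment of the lower-level Term, plus the Num literal it targets
data Term
  = App Term Term
  | Ref String
  | Num Int
  deriving (Eq, Show)

-- Collapse a chain S (S (... Z)) of repeated applications into a machine
-- integer, similar in spirit to how Idris specializes Nat = S Nat | Z.
toNum :: Term -> Term
toNum (Ref "Z")         = Num 0
toNum (App (Ref "S") t) = case toNum t of
  Num n -> Num (n + 1)
  t'    -> App (Ref "S") t'
toNum t = t
```

For example, toNum applied to S (S Z) produces Num 2, while an open chain like S x is left alone.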