ethereum / moon-lang

Minimal code-interchange format
MIT License

Can we optimize lambda (Scott) encoding to work well with today's hardware? #14

Open AnthonyJacob opened 7 years ago

AnthonyJacob commented 7 years ago

Since natural numbers are encoded as repeated function application:

one f x = f x
two f x = f ( f x )
three f x = f ( f ( f x ) )

it is trivial to optimize those expressions on current hardware as

nTimes = builtin_vonneumann_apply_n_times
one f x = nTimes 1 f x
two = nTimes 2
three = nTimes 3

I think Idris does something similar for Nat = S Nat | Z.
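
In plain Haskell, the idea can be sketched like this (a sketch only; `Church`, `nTimes`, `toInt` and `fromInt` are hypothetical names for illustration, not Moon builtins):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church numeral n is "apply f n times": three f x = f (f (f x)).
newtype Church = Church (forall a. (a -> a) -> a -> a)

three :: Church
three = Church (\f x -> f (f (f x)))

-- The hypothetical builtin: iterate a function using a machine integer
-- instead of n nested applications.
nTimes :: Int -> (a -> a) -> a -> a
nTimes n f x
  | n <= 0    = x
  | otherwise = nTimes (n - 1) f (f x)

-- Converting between the two representations:
toInt :: Church -> Int
toInt (Church n) = n (+ 1) 0

fromInt :: Int -> Church
fromInt n = Church (nTimes n)
```

The point of the optimization is that `fromInt k` is a constant-size closure carrying a machine integer, while the literal encoding of `k` is a term of size proportional to `k`.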

VictorTaelin commented 7 years ago

Actually, Scott encodings are pretty much as efficient as usual data structures given a good implementation of closures (reference). I'm currently researching sharing-graph-based implementations, which could open the doors to massively parallel evaluators which, I hope, would make Moon and similar functional languages much faster.

VictorTaelin commented 7 years ago

Oh, I think you're talking about something different - about translating Nat to actual machine ints? I think that is only really possible with types, and even so it is a bit tricky. That is the very reason Moon has Nat as a primitive (and why I claim that type systems would make that unnecessary)...

AnthonyJacob commented 7 years ago

Yep, I hoped that we could somehow devise an algorithm that would map lambda-encoded numbers to closures with machine integers inside. Then we would not need a Number primitive. It would also compress other kinds of expressions that repeatedly apply the same function.

Basically moon itself would be only

data MoonTerm
   = App MoonTerm MoonTerm
   | Lam MoonTerm
   | Var Number
   | Ref String
   | Let String MoonTerm MoonTerm
   | Fix String MoonTerm

Then there would be a second phase that optimizes it further into a lower-level language:

data Term
   = App Term Term
   | Lam Term
   | Var Number
   | Ref String
   | Let String Term Term
   | Fix String Term
   | Pri String [Term]
   | Num Number
   | Str String
   | Flo Double
   | etc
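
As a sketch of that second phase, here is a hypothetical lowering pass over a cut-down MoonTerm (only App/Lam/Var, renamed to avoid clashes, for brevity) that recognizes de-Bruijn-indexed Church-style numerals like the ones above and replaces them with a machine-integer Num node:

```haskell
-- Cut-down source and target languages, hypothetical for this sketch.
data MoonTerm = MApp MoonTerm MoonTerm | MLam MoonTerm | MVar Int
  deriving (Eq, Show)

data Term = App Term Term | Lam Term | Var Int | Num Int
  deriving (Eq, Show)

-- A Church numeral in de Bruijn form is \f.\x. f (f (... x)), i.e.
-- MLam (MLam body) where body is a chain of MApp (MVar 1) ending in MVar 0.
asNumeral :: MoonTerm -> Maybe Int
asNumeral (MLam (MLam body)) = go body
  where
    go (MVar 0)             = Just 0
    go (MApp (MVar 1) rest) = succ <$> go rest
    go _                    = Nothing
asNumeral _ = Nothing

-- The lowering pass: replace recognized numerals, recurse otherwise.
lower :: MoonTerm -> Term
lower t = case asNumeral t of
  Just n  -> Num n
  Nothing -> case t of
    MApp a b -> App (lower a) (lower b)
    MLam b   -> Lam (lower b)
    MVar i   -> Var i
```

A real pass would also have to recognize numerals that only appear after reduction, which is where types (or Moon's Nat primitive) come in.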

VictorTaelin commented 7 years ago

I think this is more or less my idealized version of Moon/Morte:

data Term
   = App Term Term
   | Lam Term Term
   | All Term Term
   | Let Term Term
   | Fix Term
   | Box Term
   | Var Nat
   | Set Nat

I think all of those constructors are necessary, but no more. App and Lam give us abstraction (linear functions). All is Pi (i.e., the dependent function type), Set is the type of types; both together give us an expressive type system. Let and Box give us the power to duplicate things, which IMO is clearly a separate concept from that of functions, so it deserves to be separated. Fix allows us to express recursive types naturally, without making the resulting system inconsistent (because consistency should probably follow from linear logic)! Var is for substitution. All of that is de Bruijn-indexed. IMO that system seems to have it all:

  1. Simple (more constructors than Morte, but "simpler to reason about"; ex: eta-reduction is trivial!)

  2. Has Scott-encodings (a.k.a. the main problem with using Morte right now, IMO)

  3. Can probably express induction, large elims, prove 1 != 0, etc.

  4. Linear types (no garbage collection, also compiling arrayish code to perform in-place writes!)

  5. Consistent (is total, plus a term of type T is an undeniable proof of its proposition)

  6. Var names and comments could be annotated with (a => b => b) "name" term

  7. Could easily compile Nats, Floats, etc. into a low level language as you described

  8. Can be reduced with sharing graphs without an oracle, for theoretical parallel reducers

  9. Compact binary encoding (exactly 3 bits per constructor!)

  10. It doesn't feel arbitrary. Intuitively, each constructor does one thing that is obviously needed
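
Point 9 can be made concrete: with exactly eight constructors, every node tag fits in 3 bits. The serializer below is a hypothetical sketch (the unary coding of Var/Set indices is just to keep it self-contained; a real format would pick something denser):

```haskell
-- The eight constructors of the idealized Term, with Nat as Int here.
data Term
  = App Term Term | Lam Term Term | All Term Term | Let Term Term
  | Fix Term | Box Term | Var Int | Set Int

-- 3-bit big-endian tag for n in 0..7 (avoiding a Data.Bits import).
bits3 :: Int -> [Bool]
bits3 n = map (\i -> odd (n `div` i)) [4, 2, 1]

encode :: Term -> [Bool]
encode t = case t of
  App a b -> bits3 0 ++ encode a ++ encode b
  Lam a b -> bits3 1 ++ encode a ++ encode b
  All a b -> bits3 2 ++ encode a ++ encode b
  Let a b -> bits3 3 ++ encode a ++ encode b
  Fix a   -> bits3 4 ++ encode a
  Box a   -> bits3 5 ++ encode a
  Var n   -> bits3 6 ++ unary n
  Set n   -> bits3 7 ++ unary n
  where
    -- Hypothetical index coding: n ones then a zero.
    unary n = replicate n True ++ [False]
```

No length prefixes or delimiters are needed: the tag of each node determines its arity, so the stream decodes unambiguously.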

The only reason I didn't start with it is that I don't have time to investigate whether all of the above actually holds. Intuitively I think it does, but intuition doesn't mean much, I guess. I'd also need to see if it would actually be usable in practice (linear types impose a lot of limitations, but I think with proper tooling that'd be fine: remember the language is simpler to reason about, so tools could do much more!). Plus it'd take a lot of time to do all that, and I have some immediate problems to solve that need a working Moon in the short term.

@Gabriel439 thoughts?

Gabriella439 commented 7 years ago

When you say "linear logic" and "linear types", are you referring to elementary-affine-typeable or true linear logic? The reason I ask is that I don't see any type-level constructors for explicitly marking inputs shareable (i.e. "Bang", for example)

Also, how do you ensure that this is strongly normalizing?

VictorTaelin commented 7 years ago

@Gabriel439 sorry for my bad naming: Box was supposed to be Bang, so I mean EAL. On normalization, I never looked into it deeply, but I assume it is pretty easy to prove this is strongly normalizing given that each duplication consumes a !, right? My point (and where you could disagree, I guess) is that this isn't nearly as restrictive as it seems, as it gives us pretty much bounded for-loops on steroids. In fact, it is probably much easier to work with than Morte, given that we're actually able to pattern-match now...

Gabriella439 commented 7 years ago

Bounding the number of !s is not by itself enough to guarantee normalization. For example, the following lambda expression loops forever using only two duplications:

(\x -> x x) (\x -> x x)
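
That loop can be observed mechanically with a tiny de-Bruijn-indexed reducer (a sketch, unrelated to Moon's implementation): a single beta step on the term yields the term itself.

```haskell
-- Untyped lambda terms with de Bruijn indices.
data T = V Int | L T | A T T deriving (Eq, Show)

-- Shift free variables at or above cutoff c by d.
shift :: Int -> Int -> T -> T
shift d c (V k)   = V (if k >= c then k + d else k)
shift d c (L b)   = L (shift d (c + 1) b)
shift d c (A f a) = A (shift d c f) (shift d c a)

-- Substitute s for variable j.
subst :: Int -> T -> T -> T
subst j s (V k)   = if k == j then s else V k
subst j s (L b)   = L (subst (j + 1) (shift 1 0 s) b)
subst j s (A f a) = A (subst j s f) (subst j s a)

-- One head beta step, if any redex is at the top.
step :: T -> Maybe T
step (A (L b) a) = Just (shift (-1) 0 (subst 0 (shift 1 0 a) b))
step _           = Nothing

-- omega = (\x -> x x) (\x -> x x); stepping it reproduces it exactly.
omega :: T
omega = A w w where w = L (A (V 0) (V 0))
```
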

VictorTaelin commented 7 years ago

But there is no way to decorate that term with ! (given that a bang'd term can't have affine variables, and each exponential variable must have exactly 1 enclosing !). Ex:

(\x. let !y=x in !(y y)) (\x. let !y=x in !(y y))
let !y=(\x. let !y=x in !(y y)) in !(y y)

Normal form, because (\x. let !y=x in !(y y)) has no !. Another attempt:

(\x. let !y=x in !(y y)) !(\x. let !y=x in !(y y))
let !y=!(\x. let !y=x in !(y y)) in !(y y)
!((\x. let !y=x in !(y y)) (\x. let !y=x in !(y y)))
!(let !y=(\x. let !y=x in !(y y)) in !(y y))

Reduced further, but still in normal form, for the same reason. Another one:

(\x. let !y=x in (y !y)) !(\x. let !y=x in (y !y))
(let !y=!(\x. let !y=x in (y !y)) in (y !y))
(\x. let !y=x in (y !y)) !(\x. let !y=x in (y !y))

Whoops! An infinite loop, but the initial term won't type-check, because the first y in (y !y) has 0 enclosing !s.

From what I've heard, it is probably easy to prove this system consistent; the real question is whether this makes for a "practical programming language for humans" (which I think isn't the right question - we want a great internal/universal format; with good tooling, even Brainfuck can be made practical).

Gabriella439 commented 7 years ago

So I know that in a system without Fix an expression that is EAL-typeable is strongly normalizing (since EAL-typeable is a restricted case of Hindley-Milner, and Hindley-Milner-typeable expressions are strongly normalizing).

However, I'm still not aware of a proof that an EAL-typeable term remains strongly normalizing if you add unrestricted Fix. Also, I'm not sure how you would even implement an algorithm to verify that a term is EAL-typeable once you add Fix.

VictorTaelin commented 7 years ago

Perhaps I was thinking it'd be simpler than it actually is. What I had in mind is to just treat Fix like exponential terms, i.e., you can't substitute inside them. That'd be sufficient to avoid any way to encode non-termination that I can think of, given that those all need to apply something to itself. TBH I don't quite see how that could be harmful; it is just something that can be pattern-matched (queried) infinitely, but that's all. Can you think of a non-terminating term like that? Anyway, I could try figuring that out more precisely... but do you feel like this is a path worth exploring?

VictorTaelin commented 7 years ago

I mean, even the untyped fragment of EAL is strongly normalizing (for stratified terms), which I think is the key difference here, as the untyped language of Hindley-Milner isn't. Without being able to substitute inside Fix, it is just a runtime constant. As such, type-level recursion can't make the language fail to be strongly normalizing (because it is strongly normalizing even without types!).

Gabriella439 commented 7 years ago

I'd need to see a more rigorous proof. I suspect that it's not strongly normalizing without types, and even with types I'm still not convinced that Fix can be both safe and useful.

VictorTaelin commented 7 years ago

But why do you think Fix isn't useful? It allows us to express Scott encodings with fix self . (Nat : *) -> (self -> Nat) -> Nat -> Nat, which is exactly what we needed, no?

Here is a proof of its normalization and complexity bounds (proposition 7, page 19).

Gabriella439 commented 7 years ago

So that convinces me that this is possible, but there is still the issue of how to infer the types of expressions. It's still not clear to me that the expression is strongly normalizing if you skip the type-checking step.

Alternatively, you could skip type inference and just do type-checking of user-supplied type signatures (which I think should be more straightforward). I think that would be a reasonable approach.

VictorTaelin commented 7 years ago

Oh, sure, the idea is to store the code with the full types and just type-check it, which I guess is simple. Same as CoC, but making sure you don't duplicate affine variables and don't use them inside boxes. Fast-forward to when we're used to writing terms with boxes and have the overall feel of it; then we'd probably be able to write tools to auto-annotate Morte terms with them.