Closed Gabriella439 closed 8 years ago
OK, rewriting that - I'm not exactly against this approach to solving this problem, but I'm somewhat against the need to solve it. Out of curiosity, why do you want Morte to produce efficient executables today? Lambda-calculus structures are only a constant factor (3~100x) away from Haskell's performance anyway.
I've always seen Morte as the project where we would just forget about immediate real-world efficiency and write the algorithms in our platonic language, in the most "natural" way. Higher performance would be achieved by other means, such as different computer architectures and better runtimes... which would, in turn, be possible because of the very simplicity of the core language.
Do you have benchmarks for Caramel? I'm mostly interested in the cost of concatenating and normalizing strings of bytes because that's the one data type and operation I could not get good constant factors on no matter what encoding I used. The benchmark I've been using is concatenating two 1 kB byte strings and then normalizing the result.
Give me a second.
@Gabriel439 I'm not sure I understand what benchmark you mean. Just concatenating two strings with 1000 bytes? I've made this type for bytestrings:
-- Creates a byte string of n identical bytes
n_byte_string n = (replicate n sample_byte)
sample_byte = (B0 (B1 (B0 (B0 (B1 (B1 (B0 (B0 BEnd))))))))
B0 = (x a b c -> (a (x a b c)))
B1 = (x a b c -> (b (x a b c)))
BEnd = (a b c -> c)
I also used concat:
concat a b cons nil = (a cons (b cons nil))
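For reference, the same Church-encoded list and concat transcribe directly into Haskell. This is my own transcription (not code from the thread); a newtype wrapper is added so it compiles without impredicative types:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church-encoded list is its own fold, waiting for a cons and a nil.
newtype ChurchList a = ChurchList (forall r. (a -> r -> r) -> r -> r)

nil :: ChurchList a
nil = ChurchList (\_cons z -> z)

cons :: a -> ChurchList a -> ChurchList a
cons x (ChurchList xs) = ChurchList (\c z -> c x (xs c z))

-- Same shape as the Caramel definition above:
--   concat a b cons nil = (a cons (b cons nil))
cat :: ChurchList a -> ChurchList a -> ChurchList a
cat (ChurchList a) (ChurchList b) = ChurchList (\c z -> a c (b c z))

toList :: ChurchList a -> [a]
toList (ChurchList xs) = xs (:) []

main :: IO ()
main = print (toList (cat (foldr cons nil [1, 2]) (foldr cons nil [3 :: Int])))
```

Note that `cat` never inspects its arguments; it just composes two folds, which is why concatenation by itself costs almost nothing.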
So, this program:
main = (concat (n_byte_string 1000) (n_byte_string 1000))
Runs instantly on the Caramel interpreter. Benchmarking it in further detail, I used this Optlam script:
optlam = require("./optlam.js");
lambda = require("./lambda_calculus.js");
var A = lambda.App, L = lambda.Lam, V = lambda.Var;
// Church numeral builder: n(x) = λs. λz. s^x z
function n(x){ var t = V(0); for (var i = 0; i < x; ++i) t = A(V(1), t); return L(L(t)); }
var n_byte_string = L(L(L(A(A(V(2),A(V(1),L(L(L(A(V(2),A(V(1),A(V(2),A(V(2),A(V(1),A(V(1),A(V(2),A(V(2),V(0)))))))))))))),V(0)))));
var concat = L(L(L(L(A(A(V(3),V(1)),A(A(V(2),V(1)),V(0)))))));
var n1000 = A(n(3), n(10));
var byte_string_1kb = A(n_byte_string, n1000);
var main = A(A(concat, byte_string_1kb), byte_string_1kb);
var norm = optlam.reduce(main);
console.log(lambda.pretty(norm));
console.log(JSON.stringify(optlam.stats));
And I got this result:
{"iterations":1504,"applications":224,"betas":86,"used_memory":1818}
224 applications means the normal form was reached with merely 224 graph rewrites. That is really low. Did I understand your benchmark wrong?
Edit: in fact, even concatenating two googolbyte strings to normal form takes only 10k applications and runs instantly, lol. See this program.
Morte's minimalistic syntax tree is one of its key advantages. I am against cluttering it with non-extensible primitive types.
Did you consider adding them via Embed?
data Lit
    = TextLit Text       -- A text literal
    | Text               -- The type of text literals
    | NaturalLit Natural -- A natural number literal
    | Natural            -- The type of natural numbers
nordom :: Expr Lit -> Expr Lit
nordom = ... -- This part is where we interpret foreign functions
interpret = normalize . nordom
You can have both Lit and Path using Either.
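To make the suggestion concrete, here is a minimal self-contained sketch of the idea. All names are illustrative (this is not morte's actual API): literals travel through the tree via Embed, a nordom pass folds applications of one example foreign function, and everything it doesn't recognize is left for ordinary normalization.

```haskell
-- Minimal sketch of the Embed idea (illustrative names only).
data Lit = TextLit String | TextT | NaturalLit Integer | NaturalT
  deriving (Eq, Show)

data Expr a
  = Var String            -- free variables, including foreign-function names
  | App (Expr a) (Expr a)
  | Embed a               -- the extension point carrying literals
  deriving (Eq, Show)

-- "This part is where we interpret foreign functions": fold saturated
-- applications of one example builtin; leave the rest untouched.
nordom :: Expr Lit -> Expr Lit
nordom (App f x) =
  case (nordom f, nordom x) of
    (App (Var "Natural/+") (Embed (NaturalLit a)), Embed (NaturalLit b))
      -> Embed (NaturalLit (a + b))
    (f', x') -> App f' x'
nordom e = e

main :: IO ()
main = print (nordom (App (App (Var "Natural/+") (Embed (NaturalLit 2)))
                          (Embed (NaturalLit 3))))
```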
@MaiaVictor The main reason for the performance speedup there is that the graph rewriting algorithm is exploiting sharing (which is still neat!). However, we can't reasonably exploit sharing in real-world strings that we concatenate. I would like to see concatenation of two random 1 kB strings.
@int-index Yes, using Embed is also an option. That's a nice idea!
Also, I'm not completely sure that optlam.js is a correct implementation of Lamping's algorithm (or of the lambda calculus). It's missing the control nodes.
Concatenating two completely random 1 kB strings (see it raw) actually gives better results:
{"iterations":213742,"applications":18,"betas":18,"used_memory":414666}
Just 18 graph rewrites and you get the normal form. Yes, Optlam implements only the abstract part of Lamping's algorithm. I say that is enough for anything that matters, but let's not get into that point. You'll get similar performance on Anton's actually correct implementations, and Haskell itself did it instantly too. Even compiling to JavaScript I got the normal form instantly, actually. Concatenating in general is really lightweight. In what scenario did it perform badly, exactly?
I think the number of beta-reductions is a misleading metric. The time the code takes to run is more important.
When I run the code it takes approximately 250 ms. I benchmarked your program and also benchmarked an empty program so I could subtract the overhead of running node:
$ git clone https://github.com/MaiaVictor/optlam.git
$ cd optlam
$ # create your example concat.js program
$ bench 'node concat.js' # `bench` is from here: https://github.com/Gabriel439/bench
benchmarking node concat.js
time 289.1 ms (273.0 ms .. 307.9 ms)
0.996 R² (0.978 R² .. 1.000 R²)
mean 274.1 ms (250.6 ms .. 283.4 ms)
std dev 16.67 ms (2.112 ms .. 21.39 ms)
variance introduced by outliers: 17% (moderately inflated)
$ touch empty.js
$ bench 'node empty.js'
benchmarking node empty.js
time 34.85 ms (33.01 ms .. 36.87 ms)
0.978 R² (0.938 R² .. 0.997 R²)
mean 34.18 ms (33.20 ms .. 35.89 ms)
std dev 2.669 ms (1.456 ms .. 4.201 ms)
variance introduced by outliers: 30% (moderately inflated)
250 ms is about 10,000x-100,000x slower than using primitive operations from Haskell for this purpose. (My original estimate of 10,000,000x-100,000,000x was wrong: I forgot to scale by the number of characters.)
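As a rough sanity check on that range, here is my own back-of-the-envelope arithmetic (not from the thread). The native throughput figure is an assumed order of magnitude, not a measurement:

```haskell
-- Back-of-the-envelope for the 10,000x-100,000x figure.
-- Assumed inputs: ~2000 bytes processed in ~0.24 s of net Optlam time
-- (250 ms minus node startup), versus roughly 1 GB/s assumed for native
-- ByteString concatenation.
main :: IO ()
main = do
  let bytes      = 2000 :: Double
      optlamSecs = 0.24
      optlamRate = bytes / optlamSecs       -- ~8,300 bytes/s
      nativeRate = 1.0e9                    -- assumed ~1 GB/s
      slowdown   = nativeRate / optlamRate  -- lands inside the quoted range
  print (round slowdown :: Integer)
```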
Counting beta reductions would be a misleading metric, but I'm not doing that: I'm counting all graph rewrite rules. The time the algorithm takes to run is proportional to the total number of graph rewrites and nothing else. Optlam is optimal but terribly inefficient; there is no point in measuring its run time. It is a JavaScript interpreter and it runs through the graph sequentially looking for active pairs. In that case, it spends most of the time searching for active pairs by walking through the nodes. That is entirely unnecessary: you could just keep a pointer to them when you allocate the memory. Moreover, the readback to λ-calculus is almost always the bottleneck (as it needs to undo all the sharing that takes place!). A specialized interaction-net machine wouldn't need such a readback.
But this isn't about Optlam at all. Let's just compile to Haskell and do the benchmarks. How does that concat program compare to the same thing using native data structures? That's what we should be measuring. I'm not able to replicate it being much slower than native data structures. I'd like to see a benchmark showing the opposite, though, so I could investigate it further.
So the conclusion I came to was that it's better to keep morte as is. For high-efficiency primitives, I think it's wiser to fork off a separate language and compiler using morte as a base template than to reuse morte directly. I'll explain my reasoning why.
I spent a lot of time investigating the possibility of extending morte's syntax tree via the Embed constructor. The issue that you run into is that you can't easily model foreign types and terms that are polymorphic when you Embed them this way. For example, there is no way to Embed a Vector type constructor this way. You would have to change Embed to this shape:
data Expr f = ... | Embed (Expr (f (Expr f)))
... which I would like to avoid.
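A small illustration of the problem, using made-up stand-in names rather than morte's real constructors:

```haskell
-- Sketch of why a plain Embed can't carry polymorphic foreign types.
-- Embed stores a value of one fixed type parameter `a`:
data Expr a = Var Int | App (Expr a) (Expr a) | Embed a
  deriving Show

-- Monomorphic literals are fine:
data Lit = NaturalLit Integer
  deriving Show

okay :: Expr Lit
okay = Embed (NaturalLit 1)

-- But a foreign Vector needs element types that are themselves
-- expressions, i.e. a payload mentioning Expr itself. The parameter `a`
-- can't refer back to Expr without the knot-tying shape
-- `Embed (Expr (f (Expr f)))` shown above.
main :: IO ()
main = print okay
```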
Also, even if you don't want to model a polymorphic FFI and you keep Embed the way it is, you still have to generalize every morte operation (i.e. parsing/normalization/type-checking/imports) to handle user extensions. I went through the exercise of generalizing morte like this, and it greatly complicates the API and ruins morte's goal of being a clean and simple lambda calculus implementation.

I also agree with @int-index's sentiment that there are too many primitive types worth supporting to warrant extending morte's syntax tree.
So I'll keep morte true to the original spirit of being a minimalist lambda calculus, but I also still plan on spinning off nordom as a separate compiler that forks morte's logic, because I have a specific application in mind for nordom that requires high-efficiency primitives. Also, all the derived compilers that I have in mind will be supersets of morte's syntax, so that any code that doesn't use foreign operations can still be shared with pure morte code.
Seems like using a Combinatory Categorial Grammar (CCG) would be a good fit with morte / nordom, and in particular might turn this issue into a matter of which terms are included in the lexicon. @Gabriel439 and all, thoughts?
http://yoavartzi.com/tutorial/ https://github.com/vlastachu/haskell-ccg
I'm so thankful for that choice. I honestly hope we will eventually figure out natural ways to evaluate λ-calculus programs, exploiting their informational simplicity in a way that we don't yet. Having those information-dense primitives might hinder our ability to fully exploit it.
Alright, so I finally have a concrete roadmap for how to handle high-efficiency operations in morte.

tl;dr: The main change I'm proposing is to extend morte's syntax tree to store primitive literals like Natural and Text.
First, I'll explain how primitive support will be implemented and then I'll explain the other alternatives I considered and why I rejected them.
The goal
I plan to provide a command-line nordom tool and library that is almost identical to the morte command-line compiler. nordom the command-line tool will be just like the morte command-line tool plus built-in support for a large standard library of high-efficiency operations. nordom the library will provide a way to efficiently convert between morte values and their equivalent Haskell values so that morte can interoperate with Haskell without a separate compile step.

Keeping with the Planescape: Torment references, this tool will be named nordom because it is "close to the machine" (and Nordom was a character from the plane of Mechanus, albeit "gone rogue" and not truly aligned with the plane).

I've already written a prototype of this nordom tool with a small standard library and I'm now in the process of cleaning it up, extending the standard library, and making the minimal changes to morte necessary to support the tool.

For example, using this tool you will be able to write:
... and it will run efficiently. On my machine the above program takes less than 4 milliseconds end-to-end, and most of that is just the overhead of starting up the Haskell runtime.
Changes to morte

I've identified the minimal changes necessary to make to morte in order to support high-efficiency foreign operations on Natural numbers and Text.

The core syntax tree must be extended with two new constructors for each new primitive data type. For example, if we want to add primitive support for Text and Natural numbers, we would modify the syntax tree like this:

... and the lexer/parser/pretty-printer will decode/encode these built-in primitives as human-friendly string literals and numbers.
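A sketch of what that two-constructors-per-type extension could look like. The existing morte constructors are reduced to stand-in stubs here; only the four new constructors are the point:

```haskell
import Data.Text (Text)          -- from the text package (ships with GHC)
import Numeric.Natural (Natural)

data Expr
  = Var String                   -- stand-ins for the existing constructors
  | App Expr Expr
  | TextLit Text                 -- new: a text literal
  | Text                         -- new: the type of text literals
  | NaturalLit Natural           -- new: a natural number literal
  | Natural                      -- new: the type of natural numbers
  deriving (Eq, Show)

main :: IO ()
main = print (NaturalLit 42)
```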
morte's load function would be modified to take a Context argument:

Before:

After:

This Context would be used when type-checking imports.

Those would be the only two changes to morte necessary to implement nordom. Note that we will not be adding primitive functions to morte's syntax tree.

Note that there might be more than two primitive types to add to morte's syntax tree. For example, Haskell's Vector/ByteString/Integer/Scientific/Char/Word8 types are all pretty strong contenders for primitive types worth supporting.

How nordom will work

Probably the easiest way to show how nordom is designed to work is to paste some code excerpts from my nordom prototype.

First, nordom creates a Context that has the names and types of all supported operations:

This context is used to type-check all nordom code.

Then nordom defines an interpret function which is identical to morte's normalize function except with additional cases for each supported function:

... and all nordom does is type-check code, run it through the interpreter, and then pretty-print it.

Alternative 1 - Encode everything in lambda calculus
I pursued this alternative for the longest time and played a lot of tricks with encodings but at the end of the day it's not realistic.
Text/ByteString were the worst-behaved of all the primitive types to encode because they were expensive to type-check, normalize, and store in memory.

The benchmark I used for how realistic this approach would be was how fast I could concatenate two strings of size N. I could never go faster than 1000 characters per second using pure lambda calculus, which is less than a terminal window's worth of text.
Alternative 2 - Transmit primitive literals out-of-band
One approach I considered was to provide a separate data payload containing all primitive literals. I wasn't very happy with this approach because I wanted to avoid out-of-band information in general.
morte was partly a reaction to compilers with very complex architectures and distribution models, so I preferred to keep morte expressions as atomic and self-contained as possible.

The more moving parts morte code has, the more difficult it is to embed morte in unusual places. For example, I would like it to be easy to embed morte expressions as fields in configuration files or query parameters in API requests, and keeping literals compact and in-band greatly simplifies the distribution model.

Alternative 3 - Explicitly introduce foreign operations via lambda abstraction
... as opposed to providing them via a global default context.
There are a couple of ways to do this, but the front-runner was encoding foreign operations as a syntax tree encoded in lambda calculus using the Boehm-Berarducci encoding.
The thing that killed the viability of the Boehm-Berarducci encoding for this purpose was the inability to model polymorphic functions (like map) in the embedded foreign language.

A secondary issue was that the performance of this encoding degrades for either large syntax trees or a large number of foreign functions. Explicitly passing all the placeholders for foreign functions around is expensive, for almost exactly the same reason that encoding string operations in lambda calculus is expensive. If you squint, the Boehm-Berarducci encoding of a syntax tree with a large number of foreign functions is very similar to the encoding of a string with a large number of possible characters.
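One way to see that analogy in types (my own sketch, not code from the thread): a Boehm-Berarducci-encoded term threads one handler per foreign operation through every consumer, just as a Church-encoded string threads one handler per character.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Boehm-Berarducci term over N foreign operations is a function taking
-- one handler per operation - structurally like a Church string over an
-- alphabet of N characters. Two-operation example:
type Foreign = forall r. (Integer -> r)   -- handler for literals
                      -> (r -> r -> r)    -- handler for "plus"
                      -> r

ex :: Foreign
ex = \lit plus -> plus (lit 1) (lit 2)

-- One interpretation: evaluate to an Integer.
evalF :: Foreign -> Integer
evalF t = t id (+)

main :: IO ()
main = print (evalF ex)
```

Growing the set of foreign operations grows the handler tuple that every single term must accept and pass along, which is exactly the cost profile described above.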
Passing around a global context is much easier and more efficient by comparison, and requires a very mild extension to the morte API: we just generalize the type of the load function. This also generates much cleaner-looking raw morte code, since you no longer need to introduce foreign functions via abstraction.

Alternative 4 - Add foreign functions to the syntax tree, too
I didn't do this because there was no point. There is no efficiency benefit to adding them to the syntax tree, and it needlessly increases the surface area of the morte API. The more complex the morte implementation, the more difficult it is to fork or port to other languages and platforms.

It's also actively harmful to include foreign functions in the syntax tree because we don't want to standardize the set of foreign functions. We want to be able to support multiple backends, and each backend may want to support a different set of foreign functions.
Discussion
I'll give a two week discussion period for this in case anybody objects to the above changes or wishes to propose an alternative approach (or revive an approach which I rejected).
For people who approve of this approach, it's also worth discussing which primitive types should and should not be supported. My only guideline is that we should not add types to the syntax tree if they can be reasonably encoded in lambda calculus (like Maybe). I picked ByteString/Scientific/Char/Word8/Vector as candidates for primitive support because they are expensive to encode otherwise.