VictorTaelin / optlam

An optimal function evaluator written in JavaScript.

Question about optlam #1

Open chorasimilarity opened 8 years ago

chorasimilarity commented 8 years ago

It is not clear to me whether the following reduction falls within your framework. Using only the rewrites from your description, I don't think you can duplicate the S combinator. Here is an old attempt to duplicate S using only the rewrites you use. This is relevant if you have a term like (\x.xx)S.

So, the question is: can you reduce, for example, S (K (S I I)) (S (S (K S) K) (K (S I I))) to Y? I can do it in chemlambda, but I use more rewrites than the ones on your list (and an algorithm which does not involve walking a tree).

VictorTaelin commented 8 years ago

Hey, hello again! Thanks for reaching out!

You are correct. Those notes are wrong. I forgot the node labels. It should actually be this:

data Cell = Con Int Port Port Port | Root

Please reinterpret the image so that each old Con node becomes this new Con node (all with label 0), and each old Dup node becomes a Con node (with a different label for each). Then the image is correct. Eventually I need to make a better presentation than this terrible, misleading sketch. I'm really sorry.
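If it helps, this is how I picture the relabeled cell in plain JS (an illustration only, not optlam's actual memory layout): the integer label is what decides which rewrite fires when two cells meet at their principal ports - equal labels annihilate (which is what beta becomes), different labels commute, i.e. duplicate through each other (which is what copying becomes).

// Illustration only - plain objects instead of optlam's flat memory array.
// A cell has an integer label and three ports; port 0 is the principal port.
// A port is a reference {cell, slot} to the other endpoint of its edge.
function Cell(label) { return { label: label, ports: [null, null, null] }; }

function link(a, i, b, j) {            // connect port i of a to port j of b
  a.ports[i] = { cell: b, slot: j };
  b.ports[j] = { cell: a, slot: i };
}

function isActive(a) {                 // two cells facing each other on port 0
  var p = a.ports[0];
  return p !== null && p.slot === 0;
}

function ruleFor(a, b) {               // the label decides which rewrite fires
  return a.label === b.label ? "annihilate (beta-like)" : "commute (duplicate)";
}

// Example: a label-0 cell (an old Con) meeting a label-1 cell (an old Dup).
var app = Cell(0), dup = Cell(1);
link(app, 0, dup, 0);
console.log(isActive(app), ruleFor(app, dup)); // true 'commute (duplicate)'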

Nevertheless, I've tested your terms. It seems you are able to duplicate S in Optlam. Both your terms work fine. This is how I tested it:

var optlam = require("./optlam.js");
var lambda = require("./lambda_calculus.js");
var A  = lambda.App,
    L  = lambda.Lam,
    V  = lambda.Var,
    n  = lambda.nat;

// SKI combinators
var S = L(L(L(A(A(V(2),V(0)),A(V(1),V(0))))));
var K = L(L(V(1)));
var I = L(V(0));

// Your SKI-generated Y-combinator
var Y = A(A(S,A(K,A(A(S,I),I))),A(A(S,A(A(S,A(K,S)),K)),A(K,A(A(S,I),I))));

// To test your Y, this function receives the Y combinator and a Scott-encoded
// list of Church-encoded nats, and returns their sum.
var SUMY = L(L(A(A(V(1),L(L(A(A(V(0),L(L(L(L(A(A(V(3),V(1)),A(A(A(V(5),V(2)),V(1)),V(0)))))))),L(L(V(0))))))),V(0))));

// The Scott-encoded list [1, 2, 3]: 
// (λ c n . (c 1 (λ c n . c 2 (λ c n . c 3 (λ c n . n)))))
var L123 = L(L(A(A(V(1),L(L(A(V(1),V(0))))),L(L(A(A(V(1),L(L(A(V(1),A(V(1),V(0)))))),L(L(A(A(V(1),L(L(A(V(1),A(V(1),A(V(1),V(0))))))),L(L(V(0))))))))))));

function test(term){
    var nf = optlam.reduce(term);
    console.log(lambda.pretty(nf));
    console.log("\t"+JSON.stringify(optlam.stats));
};

// (\x.xx) S
test(A(L(A(V(0),V(0))), S));

// Summing the list [1, 2, 3] with your Y.
test(A(A(SUMY, Y), L123));

This program will output

λλλ((1 0) ((2 1) 0))
    {"iterations":61,"applications":15,"betas":7,"used_memory":216}
λλ(1 (1 (1 (1 (1 (1 0))))))
    {"iterations":578,"applications":268,"betas":121,"used_memory":2079}

As expected.

Let me know if this helps you!

VictorTaelin commented 8 years ago

As a side note, do you have any insight into how to implement those nets on actual von Neumann machines? I'm having a hard time dealing with one single issue: cache efficiency. It seems to be key for getting actually good performance out of those algorithms, but arbitrary, unpredictable allocation wrecks the heap...

chorasimilarity commented 8 years ago

Thank you for the answer, I look forward to seeing a more detailed presentation. As for the second question, I have some ideas (expressed in words).

It may be because you use De Bruijn notation, which makes beta non-local (there is no a priori bound on the size of the graph where you change labels).

It may be because of the labels on the nodes. The trick of chemlambda is that it gets by with only 5 trivalent nodes, without labels; it works for untyped lambda calculus with beta (and more), without any brackets. The arrows are not labeled as you do; there is an arbitrary naming (for example you may use a counter), and the only constraint is that, when the graph is written as the list of nodes with the names on the (ordered: left, right, middle) ports, any arrow name appears at most twice (yes, and free variables are allowed).

Then, imagine that you want to use two von Neumann machines. Easy: split the list of nodes into two parts, freeze the names of the arrows (port labels) which connect nodes from different parts, and give one part to machine 1 and the other to machine 2. Reduce. Reunite the lists and glue them back together. Repeat. (Of course, what you need is a clear algorithm for the communication between the machines.)
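To fix the idea, here is a toy sketch of that split/reduce/glue loop in your JS style (only an illustration, not my actual scripts; reducePart stands for whatever local reducer each machine runs, and I assume a mol-style convention where nodes are records [type, p1, p2, p3] and two ports are connected when they share a name):

// Split a mol-style node list in two, recording which port names cross the cut.
function split(nodes) {
  var half = Math.ceil(nodes.length / 2);
  var part1 = nodes.slice(0, half), part2 = nodes.slice(half);
  var names1 = {}, frozen = {};
  part1.forEach(function (node) {
    node.slice(1).forEach(function (p) { names1[p] = true; });
  });
  part2.forEach(function (node) {
    node.slice(1).forEach(function (p) { if (names1[p]) frozen[p] = true; });
  });
  return { part1: part1, part2: part2, frozen: Object.keys(frozen) };
}

// One round of the loop: each machine reduces its own part but must leave the
// frozen (cross-boundary) port names untouched; gluing back is just
// concatenation, since shared names reconnect the two halves.
function step(nodes, reducePart) {
  var s = split(nodes);
  return reducePart(s.part1, s.frozen).concat(reducePart(s.part2, s.frozen));
}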

As there are many such possible algorithms, there is plenty of room for experimentation.

Finally, there is another thing that could speed up the computation. I mentioned the nodes of chemlambda for a reason: I'm going to argue that among the 5 trivalent nodes, the nodes A(pplication) and L(ambda) are the less important ones, not the more important ones. The other 3 nodes (a FO fanout, a different FOE fanout, and a FI fanin) are the important ones. Then pick the language of your choice and express its commands as lambda terms, with or without free variables. Those without free variables replicate in one way and those with free variables in another, both known. Forget about lambda calculus and use the commands themselves as nodes: add the replication rewrites for them (i.e. how they interact with FO and FOE), and add their internal rewrites (i.e. what corresponds to the beta rewrite for the "command" nodes A and L).
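To make the shape of this concrete (only a sketch of the idea in your JS style, not chemlambda's actual rule set; all rule bodies are left abstract, and the command node names below are made up):

// An extensible rewrite table keyed by the pair of node types that meet at
// their principal ports. Adding a new "command" node means adding its internal
// rewrite (the beta-analogue) plus its replication rewrites against FO / FOE.
var rules = {};
function addRule(typeA, typeB, rewrite) { rules[typeA + "~" + typeB] = rewrite; }
function ruleFor(a, b) { return rules[a + "~" + b] || rules[b + "~" + a]; }

addRule("A",   "L",   function (app, lam) { /* beta-like rewiring       */ });
addRule("FO",  "L",   function (fan, lam) { /* replicate a lambda node  */ });
addRule("FOE", "A",   function (fan, app) { /* replicate an application */ });
// A hypothetical command node "INC" needs only the same two kinds of entries:
addRule("INC", "ARG", function (inc, arg) { /* its internal rewrite     */ });
addRule("FO",  "INC", function (fan, inc) { /* how INC replicates       */ });

console.log(typeof ruleFor("L", "A")); // 'function' - the lookup is symmetric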

Probably not the best example, but I did this with busy beavers. More nodes appear: one for every internal state, one for every tape symbol, one for "move head", and that's all.

So maybe the answer lies in something analogous to what you did with LJSON, with my secret sauce of mol files and the FI, FO, FOE nodes and rewrites.

I would be glad to take part in experiments with reducing stuff across the net.

VictorTaelin commented 8 years ago

Please help me understand what you said.

It may be because you use De Bruijn notation, which makes beta non-local (there is no a priori bound on the size of the graph where you change labels).

What do you mean? The source language (λ-calculus) uses De Bruijn notation, but the target nets aren't aware it exists - that's just the system in the sketch. And you said it makes beta non-local - that's important, but why would that be the case?

The arrows are not labeled as you do

  1. By arrow, do you mean the edge between two nodes? That is, the tuple (Target_Node, Target_Port) that I call a "Link" in optlam?
  2. In what sense are my arrows labeled? I suppose nodes are labeled, not arrows, right?

Then, imagine that you want to use two von Neumann machines. Easy: split the list of nodes into two parts, freeze the names of the arrows (port labels) which connect nodes from different parts, and give one part to machine 1 and the other to machine 2. Reduce. Reunite the lists and glue them back together. Repeat. (Of course, what you need is a clear algorithm for the communication between the machines.)

That looks cool and similar to what I thought once, but how does that scale? What happens when you have thousands of cores? If you just split the graph in several pieces, won't many cores receive small pieces that have no active pairs (i.e., you'll not be using them)? Also, what if you have more cores than nodes?

I'm going to argue that among the 5 trivalent nodes, the nodes A(pplication) and L(ambda) are the less important ones, not the more important ones. The other 3 nodes (a FO fanout, a different FOE fanout, and a FI fanin) are the important ones.

Well, I wish I understood chemlambda well enough to grasp what you said. I know you did something amazing, because it works and it doesn't need an infinite alphabet of labels like Optlam. But I simply have no idea what your 5 nodes do and how they do what they do. I've read a lot of your material back then, but most of it read as "well, and this... works, because magic". Could you explain it by making parallels to my system?

I'd love to see the stats on the number of reduction rules necessary to reduce some terms (such as 4 4 4 I I) on chemlambda. Why don't you test a few and make a table? :)

chorasimilarity commented 8 years ago
  1. Suppose you take a lambda term T, write it in De Bruijn notation, perform one beta reduction, and then write the result in De Bruijn notation. Let N(T) be the number of variables which change names during this procedure. Then there is no N such that N(T) <= N for any T.
  2. I see in optlam.js that you use De Bruijn notation when you encode and decode the lambda term, while the reductions are only at the graph level; therefore the non-locality from point 1 does not apply here.
  3. By arrow I mean the oriented edge between two ports, at the graph level. Please tell me if I understand correctly: you keep the graph in the memory array; each node and each port has a unique id; a memory record keeps the id of a node, its type, the ids of its ports, and the ids of the ports which connect to the node's ports. The rewrites always involve a pair of nodes, so you have a supplementary field for the name of a rewrite which has this node as its first node. I do the same, but a bit differently. My mol file is a list of nodes, each record being a list where the first item is the type of the node and the next 3 items (in the case of a trivalent node) are port "names", in a specific order according to the type of the node. Two ports are connected if they have the same port name. In my awk script I take a mol file and produce something equivalent to your memory field. Then I reduce at the graph level (i.e. by using these fields), but in the first scripts, and also theoretically, all of this can be done at the level of the mol file notation (a minimal sketch of this convention follows the list).
  4. When you encode a lambda term in De Bruijn notation, think about the arrows of the graph as having the variables as labels. It is a bit more complicated than that, because when one adds a dup node (a fanout) there are two new arrows in the graph which bear the same label as the input arrow of the dup. This can be turned into a system of giving unique labels to the arrows. These are the arrow labels I'm talking about. You don't have them because you produce the graph from the lambda term, reduce the graph, and finally decode the graph into a lambda term. (Should you stop arbitrarily in the reduction process, can you decode the graph at that point? I guess not; it works only when the graph is reduced to a normal form, and when the initial graph is one obtained from encoding a lambda term. But at the level of graph reduction there is no need to start with a graph obtained from a lambda term, nor to stop at one which corresponds to a lambda term. Unless you travel in the graph, as you do, along the syntactic tree embedded in it. In my algorithm I check the arrows randomly.)
  5. The list of the rewrites is here (http://chorasimilarity.github.io/chemlambda-gui/dynamic/moves.html) or here (http://chorasimilarity.github.io/chemlambda-gui/dynamic/molecular.html), along with the mol file conventions. The reduction algorithm is the dumbest possible, explained there as well.
  6. Maybe it is not clear what the awk script does, because you ask for a table of the numbers of rewrites. The output of the awk script is much more than just a list of rewrite counts; it is a recording of the reduction itself, as a d3.js script. So when you look at the output in a browser, you can see what happened. The d3.js file is not the reduction, but the record of the reduction done by the awk script.
  7. So, it works like this: the awk script takes a mol file as input, converts it into some vectors equivalent to your memory one, then reduces the graph (called a molecule) randomly, and for each reduction it performs it writes a record into the d3.js output.
  8. If I want to see how a lambda term reduces, I transform it manually into a mol file (not a good thing, sorry, I wish I had a parser): first drawing the graph (according to the algorithm from here, section 3, which is for graphic lambda calculus but applies to chemlambda as well), then giving unique labels to the arrows, then producing the mol file.
  9. I wish I had a theoretical comparison of the efficiency of the reduction, but I don't have enough knowledge for that, and the real point is that I am not interested in reducing lambda terms efficiently. In particular cases, like the Ackermann function for small values, it seems that chemlambda does it faster, in a sense. However, this is not interesting compared with the following. In chemlambda (and in any graph rewriting system which does not use supplementary decorations on the graph), terms and variables appear at the graph level only as decorations of the arrows (that's what you do when you decode the graph: you produce a decoration of the arrows until you arrive at the decoration of the arrow connected to the root node). So the picture that the reduction is basically made of betas (rewiring) and evaluations (x:=T) is simply not accurate: there are only rewirings. For example, when you reduce (\x.xx)A in the usual description, A is a lambda term; you do a beta and get (xx)[x:=A], then by some rewrite which is not part of the pure lambda formalism you transform this into AA (and hence all the discussion about evaluation strategies). At the graph level, A is a graph, \x.xx is a 2-node graph connected to an arrow of A, the beta is just a rewiring, and the equivalent of transforming (xx)[x:=A] into AA is what happens to the graph A when a particular arrow is connected to a fanout node. At the graph level there is no separation between beta-like reduction (from (\x.xx)A to (xx)[x:=A]) and evaluation strategy (from (xx)[x:=A] to AA, in whatever precise sense). When you reduce a graph, say of a lambda term, you always have, in the middle of the reduction, graphs which are not encodings of lambda terms, and the two parts, "pure reduction" and "evaluation", mix. Therefore comparing with reduction algorithms which are not at the graph level is like comparing apples with oranges.
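Here is the minimal sketch of the mol file convention promised in point 3 (an illustration in your JS style, not the awk scripts; the port ordering in the example is arbitrary):

// A mol-style graph is a list of records [type, port1, port2, port3]; two
// ports are connected iff they carry the same name, and a well-formed mol has
// every port name appearing at most twice.
function portIndex(mol) {
  var where = {};                        // port name -> list of [nodeIdx, slot]
  mol.forEach(function (node, i) {
    node.slice(1).forEach(function (name, slot) {
      (where[name] = where[name] || []).push([i, slot + 1]);
    });
  });
  return where;
}

function wellFormed(mol) {
  var where = portIndex(mol);
  return Object.keys(where).every(function (name) {
    return where[name].length <= 2;
  });
}

// Example: an A node and an L node connected through the shared port name "c".
var mol = [["A", "a", "b", "c"], ["L", "c", "d", "e"]];
console.log(wellFormed(mol));      // true
console.log(portIndex(mol)["c"]);  // [ [ 0, 3 ], [ 1, 1 ] ]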

chorasimilarity commented 8 years ago

Short, concrete things to confirm or refute: decode(encode(T)) = T for any lambda term T. What about P(M) = encode(decode(M))?

Let's call reduce1 a modification of your reduce which takes a memory vector, does one rewrite, and outputs the modified memory vector. Call reduceR a hypothetical reduce function which takes a random active pair of nodes and does that reduction, with memory vectors as inputs and outputs (a sketch of reduceR follows the list below). Take a lambda term T and let M_0 = encode(T), M_{i+1} = reduce1(M_i), and M^0 = encode(T), M^{i+1} = reduceR(M^i).

  1. Then, in general, there is no reason why encode(decode(M)) stops, because decode(M) may not stop for arbitrary M
  2. reduce1(M) does not stop for every M, but reduceR(M) does stop for every M
  3. In general, P(M_i) either might not stop, or it may stop but not be equal to M_i up to renaming of ids
  4. The same for P(M^i) and M^i
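The sketch of reduceR I promised above (hypothetical, in your JS style; activePairs and applyRewrite stand in for the internals of whatever memory representation is used):

// reduceR: find the active pairs (two nodes facing each other on their
// principal ports), pick one at random, apply the matching rewrite exactly
// once, and return the modified memory. If there are no active pairs it stops.
function reduceR(memory, activePairs, applyRewrite) {
  var pairs = activePairs(memory);
  if (pairs.length === 0) return memory;
  var pick = pairs[Math.floor(Math.random() * pairs.length)];
  return applyRewrite(memory, pick);
}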

Related to the Y from S,K,I example:

  1. What does your reduction give if you use Church-encoded, not Scott-encoded numbers? I ask because Scott numbers don't have fanouts; they behave differently than Church numbers

VictorTaelin commented 8 years ago
  1. Suppose you take a lambda term T, write it in De Bruijn notation, perform one beta reduction, and then write the result in De Bruijn notation. Let N(T) be the number of variables which change names during this procedure. Then there is no N such that N(T) <= N for any T.

Sorry, I don't understand what you are saying. Why are you comparing a function (N) to a number (N(T))?

  1. I see in optlam.js that you use De Bruijn notation when you encode and decode the lambda term, while the reductions are only at the graph level; therefore the non-locality from point 1 does not apply here.

Yes, that is what I meant: De Bruijn is used for convenience and has no effect on the resulting graph.

Each node and each port has a unique id;

A port has no id; it just has the id of the target node and the number of the target port.

I do the same, but a bit differently. My mol file is a list of nodes, each record being a list where the first item is the type of the node and the next 3 items (in the case of a trivalent node) are port "names", in a specific order according to the type of the node. Two ports are connected if they have the same port name. In my awk script I take a mol file and produce something equivalent to your memory field. Then I reduce at the graph level (i.e. by using these fields), but in the first scripts, and also theoretically, all of this can be done at the level of the mol file notation.

I think I understand what you mean. I prefer your way of doing it. Very interesting.

When you encode a lambda term in De Bruijn notation, think about the arrows of the graph as having the variables as labels. It is a bit more complicated than that, because when one adds a dup node (a fanout) there are two new arrows in the graph which bear the same label as the input arrow of the dup. This can be turned into a system of giving unique labels to the arrows. These are the arrow labels I'm talking about. You don't have them because you produce the graph from the lambda term, reduce the graph, and finally decode the graph into a lambda term. (Should you stop arbitrarily in the reduction process, can you decode the graph at that point? I guess not; it works only when the graph is reduced to a normal form, and when the initial graph is one obtained from encoding a lambda term. But at the level of graph reduction there is no need to start with a graph obtained from a lambda term, nor to stop at one which corresponds to a lambda term. Unless you travel in the graph, as you do, along the syntactic tree embedded in it. In my algorithm I check the arrows randomly.)

I read this half a dozen times, but I'm having a lot of trouble understanding this part. I don't make the connection between arrows and variables. Just to double-check you understand optlam (you probably do): the reason I travel through the graph is to skip unreached branches. I could reduce random active ports and the result would be the same. My system is exactly the same as Symmetric Interaction Combinators, except that I have infinitely many trivalent node types, not just 2. The graph-reducing algorithm isn't aware of the lambda calculus at all; the lambda calculus is merely a convenient syntax for writing/understanding some graphs, and there are interesting graphs which can't be represented in it.

The list of the rewrites is here (http://chorasimilarity.github.io/chemlambda-gui/dynamic/moves.html) or here (http://chorasimilarity.github.io/chemlambda-gui/dynamic/molecular.html), along with the mol file conventions. The reduction algorithm is the dumbest possible, explained there as well.

I am buried in work, but it is at the top of my list of priorities to study your work until I finally understand it satisfactorily.

Maybe it is not clear what the awk script does, because you ask for a table of the numbers of rewrites. The output of the awk script is much more than just a list of rewrite counts; it is a recording of the reduction itself, as a d3.js script. So when you look at the output in a browser, you can see what happened. The d3.js file is not the reduction, but the record of the reduction done by the awk script.

I wasn't even aware of the script! Again, it is high on my priority list to study your work better, but at this point I have only spent a few hours reading your papers and texts (and that was a while ago); I haven't tested it yet.

So, it works like this: the awk script takes a mol file as input, converts it into some vectors equivalent to your memory one, then reduces the graph (called a molecule) randomly, and for each reduction it performs it writes a record into the d3.js output.

This makes perfect sense. I think I do understand how your algorithm operates mechanically (the part I don't understand yet is why the rules are what they are, why they can compute lambda terms, etc).

I wish I had a theoretical comparison of the efficiency of the reduction, but I don't have enough knowledge for that, and the real point is that I am not interested in reducing lambda terms efficiently. In particular cases, like the Ackermann function for small values, it seems that chemlambda does it faster, in a sense. However, this is not interesting compared with the following. In chemlambda (and in any graph rewriting system which does not use supplementary decorations on the graph), terms and variables appear at the graph level only as decorations of the arrows (that's what you do when you decode the graph: you produce a decoration of the arrows until you arrive at the decoration of the arrow connected to the root node). So the picture that the reduction is basically made of betas (rewiring) and evaluations (x:=T) is simply not accurate: there are only rewirings. For example, when you reduce (\x.xx)A in the usual description, A is a lambda term; you do a beta and get (xx)[x:=A], then by some rewrite which is not part of the pure lambda formalism you transform this into AA (and hence all the discussion about evaluation strategies). At the graph level, A is a graph, \x.xx is a 2-node graph connected to an arrow of A, the beta is just a rewiring, and the equivalent of transforming (xx)[x:=A] into AA is what happens to the graph A when a particular arrow is connected to a fanout node. At the graph level there is no separation between beta-like reduction (from (\x.xx)A to (xx)[x:=A]) and evaluation strategy (from (xx)[x:=A] to AA, in whatever precise sense). When you reduce a graph, say of a lambda term, you always have, in the middle of the reduction, graphs which are not encodings of lambda terms, and the two parts, "pure reduction" and "evaluation", mix. Therefore comparing with reduction algorithms which are not at the graph level is like comparing apples with oranges.

I know all that :) I didn't ask for a comparison with conventional algorithms, I asked for a comparison with interaction-net-based algorithms. For example, at the end of the "YALE: yet another lambda evaluator based on interaction nets" paper, there is a table (http://i.imgur.com/JJmefmu.png) comparing the number of rule applications necessary to reduce some lambda terms (at the graph level). This is, IMO, a very solid way to compare the efficiency of those kinds of systems. Your system is different, though, in that λ-calculus is not a main concern. But some kind of comparison of how many rewrites it takes to evaluate similar algorithms (in that case, exponentiation) in chemlambda would be very cool.

Then, in general, there is no reason why encode(decode(M)) stops, because decode(M) may not stop for arbitrary M

That sounds right: decode(M) may not stop for an arbitrary M, as you can craft a vicious cycle in the resulting graph, on which the reducer will keep looping without ever reaching an active pair (I think).

  1. reduce1(M) does not stop for every M, but reduceR(M) does stop for every M

That looks correct.

  1. In general, P(M_i) either might not stop, or it may stop but not be equal to M_i up to renaming of ids
  2. The same for P(M^i) and M^i

If you mean P(M_i) can be different from M_i, that is right. A simple example: if you reduce 2 2, it will produce a compact representation of the Church numeral 4. If you decode and encode it again, you get a bigger representation of it. Not always, though: if you encode 2, decode it, and encode it again, you get the same graph.
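For example, reusing the same helpers as in the test script above:

// Reducing (2 2) with optlam normalizes to the Church numeral 4; whether
// re-encoding the printed term gives back a graph of the same size is exactly
// the point discussed above.
var optlam = require("./optlam.js");
var lambda = require("./lambda_calculus.js");
var A = lambda.App, n = lambda.nat;

var nf = optlam.reduce(A(n(2), n(2)));
console.log(lambda.pretty(nf));             // λλ(1 (1 (1 (1 0))))
console.log(JSON.stringify(optlam.stats));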

What does your reduction give if you use Church-encoded, not Scott-encoded numbers? I ask because Scott numbers don't have fanouts; they behave differently than Church numbers

I used Church-encoded numbers. The Scott encoding I used was for the list containing those numbers. I did so because you don't even need the Y combinator to encode sum if your list is Church-encoded. The Y combinator (or similar strategies for producing non-strongly-normalizing terms) is only required to convert data from the Scott to the Church encoding.
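To make the distinction concrete with the same helpers (just an illustration):

var lambda = require("./lambda_calculus.js");
var A = lambda.App, L = lambda.Lam, V = lambda.Var;

// Church 2 uses its bound variable f twice, so its graph needs a fanout node;
// Scott numerals use each bound variable at most once, so they don't.
var CHURCH_2 = L(L(A(V(1), A(V(1), V(0)))));   // λf.λx. f (f x)
var SCOTT_0  = L(L(V(0)));                     // λs.λz. z
var SCOTT_1  = L(L(A(V(1), SCOTT_0)));         // λs.λz. s 0
var SCOTT_2  = L(L(A(V(1), SCOTT_1)));         // λs.λz. s 1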

I hope this could be useful. Why those questions, though? Let me know what you are doing!

chorasimilarity commented 8 years ago

Thank you for the precise answers! You do clear, great things; I write opportunistic programs to show that something works, so it's not elegant like what you do, only perhaps in principle.

Nothing more to say for the moment, other than: please read "then there is no natural number M such that N(T) <= M for any T" instead of "Then there is no N such that N(T) <= N for any T".

VictorTaelin commented 8 years ago

Ah, I see. I don't quite understand it, though - if you do a beta reduction in a term, the result is a whole new term; how could you compare the old variables with the new ones? Some variables could be replicated, for example. Not sure what this means... :(

Thank you for the precise answers! You do clear, great things; I write opportunistic programs to show that something works, so it's not elegant like what you do, only perhaps in principle.

That's the opposite of my impression; it seems like you made some really neat, elegant discoveries that I really need to understand, while I'm just trying to grasp a lot of things that are still well beyond my reach. Thank you very much for all the awesome work. Please let me know when I can help in any way!

chorasimilarity commented 8 years ago

Regarding the non-locality of beta reduction with De Bruijn indices, I can't find the best reference at this moment, but to get an idea see, for example, pp. 3-4 of Abadi, Cardelli, Curien, Lévy, Explicit Substitutions: https://hal.archives-ouvertes.fr/file/index/docid/75382/filename/RR-1176.pdf

This is a very interesting discussion, I would be glad to continue it and do something with it. And yes, you can help a lot! but it's entirely up to you, if you think this is something you want to do, then I'd be very grateful.

What about making these chemlambda scripts entirely (and better, and much shorter) JS scripts? Using them for playing and for education (with a parser from lambda terms to graphs)? Making them more flexible (one reduction mode, a limited rewind, showing labels on nodes and edges if wanted)? Anything creative with it!

We need to lay down some common ground, notations, assumptions, etc, for the discussion. The reason I asked about encode(decode()) and decode(encode()) is the following. There are two realms: lambda terms and graphs associated to them. As functions in the mathematical sense:

encode: lambda terms --> graphs
decode: subset of graphs --> lambda terms

They are not inverses of one another, only partial inverses: decode(encode()) = id on lambda terms. So encode is an injection and decode is a partially defined projection.

The reduction functions reduce1 and reduceR also behave differently: reduce1 is defined (i.e. it stops) only for graphs which have a certain structure; at least we know that reduce1 stops for graphs in the image of encode, so reduce1(encode()) makes sense as a function (it stops as an algorithm). But reduce1 does not preserve the class of graphs encode(lambda terms). The reason is simple: the rewrite you call "commutativity" and I call DIST(ributivity) takes you out of it.

The function reduceR is defined on the whole class of graphs.

All this means that lambda calculus is just a shadow of a graph rewriting calculus. It's futile to restrict to meaningful graphs (i.e. those in the domain of decode); I believe many problems come from this unnecessary constraint.

chorasimilarity commented 8 years ago

I see your https://github.com/MaiaVictor/parallel_lambda_computer_tests and I want to ask what you think about treating nodes as datagrams. I know that's vague. I looked at the Accidentally Turing-Complete page and I found this http://vanbever.eu/pdfs/vanbever_turing_icnp_2013.pdf
