chorasimilarity / chemlambda-gui

Life like molecular computers with artificial chemistry.
https://chemlambda.github.io/index.html
GNU General Public License v2.0

I have many questions about Chemlambda. #2

Closed VictorTaelin closed 3 years ago

VictorTaelin commented 8 years ago

Hello. I just heard about chemlambda after coming from a λ-calculus experience. I have many questions. Please only answer them if you have the time to.

  1. How does Chemlambda relate to optimal lambda calculus evaluators and GLC (is it a progression or just something different)?
  2. Can Chemlambda be used to evaluate lambda calculus terms efficiently in conventional computers? How does that efficiency relate to interaction nets?
  3. How did you arrive at this design?
  4. Would you say its design is perfect and won't ever change, or do you think further changes can make the system better (maybe for some cases)?
  5. Would you argue that Chemlambda is superior as a computational model in relation to λ-calculus, rewrite systems, automata and so on?
  6. What cool things did you manage to do with Chemlambda that the λ-calculus isn't capable of, and that you haven't published yet?
  7. Chemlambda is obviously a lot about chemistry, yet the model is so simple I could see it being applied by other means. Do you see Chemlambda being used to implement other, different kinds of computers, for example light-based ones?

Sorry for so many questions. Thanks in advance.

chorasimilarity commented 8 years ago

Thank you for these good questions; I hope the answers will be useful for anybody reading them. I start with a short description of what is specific to chemlambda, to give context for the more to-the-point answers.

Chemlambda is a graph rewrite system together with an algorithm for applying the rewrites. All rewrite rules of chemlambda have the form: if condition C holds, then replace this pattern Left by this pattern Right. (Most rewrites have no condition C to verify.) All rewrites are local; a local graph rewrite is, by definition, one for which there exists a natural number N such that for any input graph, once a pattern Left is identified, the number of nodes and bonds of the graph which have to be taken into consideration in order to verify the condition C, plus the number of bonds and nodes of the Left and Right patterns, is less than N. As an example of a local graph rewrite, think about the Wadsworth-Lamping graphical version of the beta reduction; a non-local graph rewrite is, for example, a fan-out rewrite which transforms an arbitrarily large syntactic tree (say) with the root connected to a fanout node into two copies of it, with the fanout node erased.

The second important thing about chemlambda is that everything is a graph (or a graph rewrite). There are no variables dangling at the possible leaves of the graph. There is no evaluation of variables (by substitution or any other means) since there are no variables. The graph is not seen as representing the flow of some computation, starting from the leaves, passing through the bonds as if they are wires and through the nodes as if they are gates, until eventually the result gets out through the root leaf. Instead, because everything is a graph, the computation is done by the graph rewrites.

Concretely, but before explaining the relations between chemlambda and lambda calculus, compare with the de Bruijn notation, which also allows lambda calculus to be formulated as a rewrite system without variables. However, with de Bruijn notation the beta rewrite becomes non-local, because after we perform it there is a second step, consisting of renumbering the variables, and this second step can be understood as a sequence of rewrites which are not local according to the definition. (This is true because the number used instead of a variable comes from the number of lambda nodes which have to be traversed until we hit the lambda node which issued the variable, and this is a quantity which can't be bounded a priori.) Chemlambda manages to have both properties: locality of rewrites and no variables.

The third thing is that chemlambda uses the simplest possible algorithm for the application of rewrites: do, randomly or deterministically, as many rewrites as possible and, if there is a conflict (when two patterns Left and Left' overlap), use a priority rule. This algorithm, which I call "stupid", has the following chemical interpretation: the graph is like a chemical molecule. There exist invisible enzymes around, which hit this molecule randomly (or deterministically), each enzyme managing one rewrite rule. When such an enzyme finds a Left pattern, if C is verified then the enzyme bonds to the Left pattern, disassembles it and replaces it by the Right pattern, then goes away. The formalism does not care where the enzyme finds the elements of the Right pattern, or where the elements of the Left pattern go. Of course, this leaves open possible completions of chemlambda where parts of this mechanism are made explicit, for example in such a way as to have conservation of the number of nodes (aka garbage collection) via prescribed interactions between different enzymes. What remains in the formalism is that the molecules appear as data and the enzymes are the machines.

Now the answers to the very good questions.

  1. and 2. There is a simple procedure to associate a chemlambda graph (aka a molecule) to a lambda term. Chemlambda has trivalent nodes A and L which can be used for building the syntactic tree of the lambda term (A for application, L for lambda abstraction). Then it only remains to eliminate the variables. The occurrences of a variable bound by a lambda are replaced by a tree of FO (fanout) nodes with the root at the lambda node and the leaves at the A nodes where the variable occurs. A variable bound by a lambda which doesn't occur anywhere is replaced by a bond and a T (terminal) node. All the other variables are replaced by trees of FO nodes with a free root (there is a node called FRIN for that). The root of the syntactic tree is capped by a FROUT node. There is a simple notation for molecules, which applies to these particular molecules as well. Take one molecule and give arbitrary, but different, names to the bonds. Then write down a list of nodes, where each node is described by its type (it can be L, A, FO, FI, FOE, Arrow, T, FRIN, FROUT and more, see further) and a list of names of the bonds which connect to the node, in a specific order. Technically each node type has ports of certain types as well, and the list of bond names enumerates them in a prescribed order of port types. The port types are pairs of "in" or "out" and "left" or "right" or "middle". See the page of moves for details.

The list obtained is a mol file; many examples are in the repository. Because any bond connects two nodes, every name of a bond appears in the list exactly twice. We may relax this by eliminating from the list the FRIN and FROUT nodes, because they can be added afterwards, whenever there is a bond name which appears only once. The advantage is that if we split a mol file into two disjoint parts, then we obtain two mol files.
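As a small illustration of that completion rule, here is a sketch in Python (not the repository's awk scripts). The DIRS table encodes which port of each node type is an input or an output, following the port conventions spelled out later in this thread; the record names are arbitrary examples.

```python
from collections import Counter

# Port directions per node type, in record order (from the conventions in this thread).
DIRS = {
    'L': ('in', 'out', 'out'), 'FO': ('in', 'out', 'out'), 'FOE': ('in', 'out', 'out'),
    'A': ('in', 'in', 'out'), 'FI': ('in', 'in', 'out'),
    'Arrow': ('in', 'out'), 'T': ('in',), 'FRIN': ('out',), 'FROUT': ('in',),
}

def complete(mol):
    """Add a FRIN or FROUT record for every bond name that occurs only once."""
    counts = Counter(name for rec in mol for name in rec[1:])
    extra = []
    for rec in mol:
        for pos, name in enumerate(rec[1:]):
            if counts[name] == 1:
                # a lone name at an 'in' port needs a source (FRIN);
                # a lone name at an 'out' port needs a sink (FROUT)
                extra.append(('FRIN', name) if DIRS[rec[0]][pos] == 'in'
                             else ('FROUT', name))
    return mol + extra

# Half of a split mol file: bond names w2 and r occur only once.
half = [('L', 'u', 'u', 'w1'), ('A', 'w1', 'w2', 'r')]
print(complete(half))   # appends ('FRIN', 'w2') and ('FROUT', 'r')
```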

Now we can compare with interaction nets, for example. To a lambda term corresponds a molecule, or a mol file. There are no variables, because the only variables would be the names of bonds, which are used only in relation with the rewrites. Some rewrites are common, for example the beta rewrite and some of the DIST rewrites. In chemlambda there are two fanout nodes, FO and FOE, which enter into different rewrites. Also, there is no concern, in chemlambda, about the correctness of a molecule, because this would be a global property. Therefore, even the rewrites which are common with interaction nets can be applied everywhere it is possible. The computation is a cascade of rewrites, with the beta rewrite for the reduction and with the other rewrites (especially the DIST family) as a replacement for the passing or sharing of variables. This has several effects. One is that even if we start from a molecule obtained from a lambda term, there are moments when the rewritten molecule is not one which can be obtained from a lambda term by the mentioned procedure. Indeed, that's because some parts of it, representing variables or sub-terms, are in a process of duplication, but are not yet duplicated. Secondly, the duplication (by DIST, say) and the reduction (by beta) mix in chemlambda, instead of being separated as they are in all other formalisms I know about. Thirdly, there is no constraint to work only with molecules which are translations of lambda terms. One can have molecules without any FROUT node or with many FROUT nodes, for example.
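To make answers 1 and 2 concrete, here is a toy reduction of (λx.x)(λy.y) in Python. It is only a sketch, not the repository's scripts: it uses the record conventions spelled out later in this thread, it implements only the beta (AL) and COMB rewrites, and the exact Arrow wiring of the beta rewrite is my reading of those conventions, so the moves page remains the authoritative form.

```python
import random

# Records follow the conventions from this thread:
#   (L a b c)  : a = body (middle,in), b = variable (left,out), c = output (right,out)
#   (A a b c)  : a = left input, b = right input, c = output (middle,out)
#   (Arrow a b): a = (middle,in), b = (middle,out); (FROUT a): a = (middle,in)

def beta_matches(mol):
    """Find LPs (L a b c),(A c d e): an L output feeding the left input of an A."""
    l_out = {rec[3]: i for i, rec in enumerate(mol) if rec[0] == 'L'}
    return [(l_out[rec[1]], j) for j, rec in enumerate(mol)
            if rec[0] == 'A' and rec[1] in l_out]

def apply_beta(mol, i, j):
    """Replace (L a b c),(A c d e) by (Arrow a e),(Arrow d b)."""
    _, a, b, _c = mol[i]
    _, _c, d, e = mol[j]
    rest = [rec for k, rec in enumerate(mol) if k not in (i, j)]
    return rest + [('Arrow', a, e), ('Arrow', d, b)]

def comb(mol):
    """Eliminate Arrow records by fusing the two bonds they connect."""
    while any(rec[0] == 'Arrow' for rec in mol):
        i = next(k for k, rec in enumerate(mol) if rec[0] == 'Arrow')
        _, x, y = mol[i]
        rest = [rec for k, rec in enumerate(mol) if k != i]
        mol = [(rec[0],) + tuple(x if p == y else p for p in rec[1:]) for rec in rest]
    return mol

def reduce_mol(mol, steps=10):
    for _ in range(steps):
        matches = beta_matches(mol)
        if not matches:
            break
        i, j = random.choice(matches)    # the "enzyme" hits one Left pattern at random
        mol = comb(apply_beta(mol, i, j))
    return mol

# (\x.x)(\y.y): two identities, an application node, and a FROUT cap.
mol = [('L', 'u', 'u', 'w1'),   # first identity: body edge = variable edge
       ('L', 'v', 'v', 'w2'),   # second identity
       ('A', 'w1', 'w2', 'r'),  # apply the first to the second
       ('FROUT', 'r')]
print(reduce_mol(mol))          # -> [('L', 'v', 'v', 'w2'), ('FROUT', 'w2')], an identity
```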

It would be very interesting, in my opinion, to seriously look at chemlambda in relation to optimal lambda calculus evaluators though, exactly because of this mixing of duplication and reduction.

  3. Chemlambda is a modification of graphic lambda calculus, which is a graph rewrite formalism (but without any algorithm of application) combining lambda calculus with emergent algebras. I have to say here (and it applies to 1 and 2 as well) that by lambda calculus I mean untyped lambda beta calculus. The eta reduction rule appears as a non-local graph rewrite, therefore it is not part of the formalism, with various subtle implications.

Emergent algebras come from metric geometry. I noticed that big proofs from a field called sub-Riemannian geometry can be reduced to abstract nonsense about rewriting of graphs, with nodes representing a generalization of the notion of dilation (say that they are approximate self-similarities). What was puzzling to me was that the space under study was irrelevant; only the rewrites mattered. Therefore I had as a goal to formulate these rewrites as a computation with no variables, so I mixed emergent algebras with lambda calculus.

  4. and 5. Everything is subject to change. Only the basic principles matter. Many changes are possible. I made some of them; for example, starting with the "quiner" named scripts, there are new nodes and rewrites for Turing machines (I used a busy beaver). This shows that the ideas from chemlambda are not strictly related to lambda calculus. There is a common core, made of the nodes FI, FO and FOE, and the rewrites FI-FOE, FI-FO, FO-FOE, and then there are separate domains. On the lambda calculus side there are A and L with the beta (i.e. A-L) and the DIST rewrites for A and L. For the busy beaver there are the Turing rewrites and the PROP rewrites (analogous to DIST, but for 2-valent nodes). See this explanation of the busy beaver implementation in chemlambda. Now we can mix lambda calculus like things with TM like things; for example, we can apply a Church number to a busy beaver.

Ideally, we may take any program and create nodes and rewrites such that we may eliminate all variables. For untyped lambda beta this is pure chemlambda; for TM it is the enhanced chemlambda. This suggests that it can be done at any level of sophistication, from the high level of a functional programming style to the low, metal level of a chip.

Which is interesting because, recall, a subset of a mol file is a mol file, and there is nothing globally meaningful, like a variable. There is no evidence to gather externally; everything is distributed over the graphs.

There should be many interesting things to try, from changing the algorithm, to distributed computations with chemlambda, combined with any way of managing the bonds between different actors (if we split a mol file into two, each part will have bond names which appear only once, so there has to be a system to connect these free bonds, but that system is as external to chemlambda as it is to lambda calculus or Turing machines). I don't know, maybe IPFS?

The implementations of the stupid algorithm can be made much, much better than what I was able to come up with. I use awk because it's everywhere and it has associative arrays, but otherwise why not a direct javascript version, with garbage collection managed with object pools; why not make the enzyme machines really count? Otherwise, there is an almost finished chemlambda-py, but the beta rewrite is badly implemented.
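The object-pool idea mentioned above, sketched in Python rather than javascript just to show the shape: rewrites acquire node records from a free list and release deleted ones back into it, instead of letting allocation and garbage collection churn.

```python
class RecordPool:
    """A tiny free list: reuse node-record objects instead of allocating new ones."""
    def __init__(self):
        self.free = []

    def acquire(self, node_type, *ports):
        rec = self.free.pop() if self.free else {}
        rec['type'] = node_type
        rec['ports'] = list(ports)
        return rec

    def release(self, rec):
        self.free.append(rec)    # a rewrite deleted this record: keep the object around

pool = RecordPool()
arrow = pool.acquire('Arrow', 'x', 'y')
pool.release(arrow)              # e.g. a COMB rewrite removes the Arrow
fo = pool.acquire('FO', 'u', 'v', 'w')
assert fo is arrow               # the same object was reused, no fresh allocation
```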

I am very much willing to take part in such improvements, if there is interest!

  6. For example, chemlambda quines. The first one I found was hidden inside the molecule for the predecessor; the story is told here. In lambda calculus, the only term which would deserve to be called a quine is Omega, but in chemlambda a quine is any molecule which would have a deterministic periodic evolution. There are many of those and they satisfy all the basic requirements for being called alive. That is very serious for me, because the most stupid algorithm with well chosen local rewrites gives perhaps the first proof of principle that we can compute like Nature does it: locally, asynchronously, (globally) meaningless.

I have many examples but, because you mention publishing, I have to say that I am looking for better ways than legacy publishing to share these. The main purpose of publishing is to communicate, so the ideas spread and mix, have kids and therefore survive. With a system like chemlambda, it is hard to believe that it gives interesting results without seeing the results, and it is hard to attach to a publication enough data to make it self-validating. That is why I am in favour of a self-validating means of publication; see my first try, Molecular computers. I plan a short movie, done entirely with chemlambda scripts, about a whole artificial living cell (first trailer), and why not a game?


VictorTaelin commented 8 years ago

Okay, this is too much information and I'm both amazed and scared. For what it's worth, I think the world would be a much better place if people like you got millions to run a company and make those kinds of ideas bear broader fruit. This looks so far ahead of the current time. I'll be digging through your online content in the future; I hope I am able to catch up with all your thoughts eventually... feel free to suggest some walkthroughs.

chorasimilarity commented 8 years ago

Thanks for the nice words. So the message is getting through; now there's need for at least one owner of a million, for starters.

Perhaps the reply is too long, but it is written, therefore one can randomly access any part of it.

All this is a toy, therefore one should play with it first. One thing I would like to see solved is the following. The js output plays well on Safari, almost as well on Chromium, and not well on Firefox. This is probably due to the fact that there are many nodes and bonds which appear and disappear, which triggers the garbage collection, which then interacts badly with the rest of the script. Question: how to improve the js output so that it works well on any major browser?

Another useful thing would be a parser from lambda calculus to the mol notation, which takes a lambda term and outputs the corresponding molecule. This would be useful for (a) checking the effect of various tweaks in the molecule of the lambda term, and (b) taking a step towards comparing strategies of optimal reduction, as they appear in the mol notation, with the reduction given by chemlambda.

Yet another thing would be to finish and improve the chemlambda-py, which looks abandoned for the moment. Maybe the structure of the algorithm is more visible in that format, hence the usefulness.

But the overall idea is that playing free with the stuff is the best approach, at least that's what I think.

VictorTaelin commented 8 years ago

@chorasimilarity, you probably know this stuff much better than me, so can I ask you for some serious guidance? I've created http://github.com/maiavictor/caramel , a haskell-like syntax for the lambda calculus. It is mostly a fun project, but I'm really interested in using it to create simple things like games. The only issue is that, even though it is considerably easier to write λc programs in this syntax... it is still unbearably slow to run them. Even compilers like GHC won't make my programs fast enough.

On the other hand, my other project, http://github.com/maiavictor/optlam - which I implemented after reading a little bit about interaction nets and lamping's abstract algorithm - has a reasonable performance for evaluating most λc programs. But it has a huge issue - it won't work on many terms because I didn't implement the so called "oracle". And what is worse, I haven't found a satisfactory implementation of the "oracle" in the literature. But I find it weird that your system is able to evaluate arbitrary λc programs even though it seems like it doesn't have croissants and brackets. I wonder how?

My question is: what, in the world, would you suggest I use to evaluate λc programs to full normal form as fast as possible given today's technology? I'm not sure if that is one of the goals of this project... but whether it is chemlambda or something else, I think you're one of the few who would know the answer to that question.

chorasimilarity commented 8 years ago

The optlam project is very nice! I don't have croissants and brackets. Instead there are two fanouts, FO and FOE. FOE interacts with the fanin FI by annihilation, while FI and FO interact by a DIST (distributivity move) which gives a pair of FI and a pair of FO. FO and FOE interact by a DIST too. Have you seen the shuffle trick? http://chorasimilarity.github.io/chemlambda-gui/dynamic/shuffle_trick.html

Another difference from sharing graphs a la Guerrini is that while A (the node for application) interacts with FO and FOE by a DIST move, giving a pair of FOE and a pair of A, the node L (lambda) interacts with FO or FOE by a DIST which gives a pair of L and a FOE and a FI.


VictorTaelin commented 8 years ago

@chorasimilarity do you have a drawing of the rewrite rules of chemlambda (like the ones I have in that sketch on optlam)? I think both systems are very similar, don't you? I would like to know what your rules are and how exactly you came up with them. I think one should remove all unnecessary rules and keep the smallest subset that works. In that sense, Optlam is simpler, as it has only 2 rewrite rules and 3 nodes and is Turing complete - but the need for node labels makes it more complex and pollutes that elegance. (Although one doesn't need those: just 2 nodes are enough for Turing completeness.) In theory, one could build a molecular computer with Optlam too, though - that would be interesting.

I am still thinking about the fastest way to implement those things on von Neumann machines, and the biggest issue I have is still: where do I keep nodes in space (i.e., in memory)? Physical locality (i.e., nodes that wire together should be close together) is essential for many reasons. If you are programming for the CPU, after enough time your heap will become pointer spaghetti, destroying your cache efficiency and dropping your performance to hell. If you are programming for the GPU, the blatant communication overload will destroy any kind of gain from massive parallelism. In both cases, if two nodes are far apart in memory, you can't reduce them efficiently. We need to break out of the graph mentality and be aware we don't live in an infinite-dimensional universe. We have a 1D memory chip and we need to keep nodes "where they should be" there.

I've made an attempt at this as such:

  1. Each node "exists" in a cell in a physical, discrete 1D space (i.e., like a memory chip);
  2. Rewrite rules are applied directly to space (i.e., they rewrite consecutive slots in memory);
  3. Nodes can move through the memory with "movement rules";
  4. Active pairs can only interact if they are in adjacent places in memory.

This makes the system basically an asynchronous 1D automaton, and it is actually local - not only logically, but in practice. A "computation rule" (reduction + movements) needs to read exactly 4 adjacent spaces in memory. You can apply that rule at any position, in any order you want, randomly, in parallel, whatever, and the end result will be the same. There is absolutely no communication between distant nodes. This makes the system absolutely perfect for reduction on any kind of parallel architecture at any scale. P2P networks, multi-core CPUs, GPUs, FPGAs, whatever. This is something new. Here is how it looks visually:

https://www.youtube.com/watch?v=S_O3FdX4wLg&feature=youtu.be

Note the wire connections are merely illustrative.
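Not your implementation, of course, just a toy in Python with made-up tokens and rules, to illustrate the single property described above: a step reads only a small window of adjacent cells, so steps can be fired at random positions in any order (this toy makes no claim about confluence or termination):

```python
import random

# Made-up 1D model: a cell holds None, 'a' or 'b'; an adjacent ('a','b') pair
# "reacts" and vanishes, otherwise a token may drift right into an empty cell.
def step(mem, i):
    """Apply one local rule to the window mem[i], mem[i+1]; nothing else is read."""
    if i + 1 >= len(mem):
        return
    x, y = mem[i], mem[i + 1]
    if (x, y) == ('a', 'b'):              # "reduction": an active pair reacts
        mem[i] = mem[i + 1] = None
    elif x is not None and y is None:     # "movement": drift into free space
        mem[i], mem[i + 1] = None, x

mem = ['a', None, 'b', 'a', None, 'b', None, 'a', 'b', None]
for _ in range(500):
    step(mem, random.randrange(len(mem)))   # arbitrary positions, arbitrary order
print(mem)
```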

There is only one problem: I don't really have any criteria for the specific movement rules I chose. I just chose rules I "felt" made sense, but I know those aren't optimal. I have no proof the algorithm is correct, won't get stuck, etc., and I have an issue of space management: sometimes nodes will get congested, preventing duplication rules from occurring. Basically, I need to find a set of movement rules that decreases the ratio between movement rules and actual reduction rules.

Does that interest you? The same issues and principles might apply if you want to reduce chemlambda graphs efficiently on any kind of non-chemical real-world computer. Any idea how I could find the perfect set of rules for this goal? Do you have any additional insight for me?

chorasimilarity commented 8 years ago

Hey, great. I have to think about your movement rules/reduction rules. For the moment I'll answer or comment quickly to some parts, hoping that it is helpful.

  1. I use a different format than your memory array. It goes like this. Suppose you have a graph made of 1-, 2- or 3-valent nodes of certain kinds: FRIN, FROUT, T are 1-valent, Arrow is 2-valent, and A, L, FI, FO, FOE are 3-valent. Each node has ports; the number of ports equals the valence. Each port has a pair of types, from the set {left, right, middle} × {in, out}. Each port has a port variable, picked from an arbitrary dictionary.

A mol file is an unordered list, where each element of the list corresponds to a node and an ordered list of the values of its ports. Call an element of the mol file list a "record". Each record has on its first position the type of the node (i.e. one of the symbols A, L, FI, FO, FOE, Arrow, T, FRIN, FROUT); then, depending on the type of the node, the port values are listed in the following order, according to the types of the respective ports:

  • (L a b c) for a node L with the port (middle,in) with value a, port (left,out) with value b, port (right,out) with value c
  • (FO a b c) for a node FO with the port (middle,in) with value a, port (left,out) with value b, port (right,out) with value c
  • (FOE a b c) for a node FOE with the port (middle,in) with value a, port (left,out) with value b, port (right,out) with value c
  • (A a b c) for a node A with the port (left,in) with value a, port (right,in) with value b, port (middle,out) with value c
  • (FI a b c) for a node FI with the port (left,in) with value a, port (right,in) with value b, port (middle,out) with value c
  • (Arrow a b) for a node Arrow with the port (middle,in) with value a, port (middle,out) with value b
  • (T a) for a node T with the port (middle,in) with value a
  • (FROUT a) for a node FROUT with the port (middle,in) with value a
  • (FRIN a) for a node FRIN with the port (middle,out) with value a

Any such list of records is a valid mol file if the following conditions are satisfied:

  • each port variable occurs at most twice
  • if a port variable occurs twice then it does so once in a port of type (,in) and once in a port of type (,out)

We could ask from the beginning that a valid mol file is one where every port variable occurs twice, but instead we complete the mol file with new nodes if there are port variables which occur only once. For any port variable "a" which occurs only once in a port of type (, in) we add a node (i.e. record) (FRIN a). If "a" occurs only once in a port of type (,out) then we add a record (FROUT a).

Mathematically this is a description of a graph with directed edges and nodes of types A, L, ... , each node coming with a circular order. (They are ribbon graphs with the nodes "colored" with the type A, L, ...). Each edge is decorated with a port variable, and goes from the port of type (, in) to the port of type (, out). ("Half edges" are permitted, that is the reason for the nodes FRIN and FROUT).

  2. The rewrites function like this. Each rewrite replaces an LP (left pattern) by an RP (right pattern). The LP is always made of a pair of nodes (records). With the mol file conventions, the rewrites are given in http://chorasimilarity.github.io/chemlambda-gui/dynamic/moves.html The rewrites have names which come from their LPs; for example, the rewrite which corresponds to beta is also named AL. Some rewrites replace a pair of records by another pair, or by only one record. Others, in the "DIST" family (i.e. distributivity, which is what you call commutativity), replace a pair of records by 4 records.
  3. The algorithm of application of rewrites is the following. First let's define a conflict of rewrites to be a pair of LPs which overlap, in the sense that there is a record which is in both LPs. The algorithm takes as input a valid mol file. The algorithm has as parameters the number of steps and the probabilities of the rewrites.

  • complete the mol file as explained, and get a mol file where each port variable occurs exactly twice
  • use a counter to rename the port variables
  • create a list of edges from the mol file, where each edge is a pair of records, one edge per port variable; the first record of the pair is the one where the port variable occurs in a port of type (,in), the second is the record where the port variable occurs in a port of type (,out)
  • create an empty list of proposed_delete and proposed_add records
  • start a cycle with a counter from 0 to the number of steps
  • pick at random a maximal collection of edges such that each node appears in exactly one edge from the collection (this is for eliminating the conflicts)
  • for each edge, verify if the pair of records is an LP for one of the rewrites (excepting the COMB rewrite). If it is, then flip a coin with the given probability for that rewrite. If the coin gives 0 then add the records from the respective edge to proposed_delete and the RP of the rewrite to proposed_add (mind that the port variables of the records from the RP have to be new, so use the counter of port variables for that, for example)
  • when the collection of edges which were chosen is exhausted, delete from the mol file the records from proposed_delete and add the records from proposed_add
  • some of the rewrites introduce Arrow nodes (records); these are eliminated by the COMB rewrites. There is a COMB rewrites cycle which eliminates them, as explained in the link. (Of course, alternatively one could treat the COMB rewrites like the other rewrites, but the actual algorithm I use proceeds like this)
  • here ends the cycle over the number of steps.

  4. So this is an asynchronous graph rewrites automaton. The rewrites have conflicts, therefore it is not certain that two runs of the algorithm, for the same input, eventually lead to the same output. The algorithm may not stop, in the sense that there exist mol files such that for any number of cycles N, the output still has active LPs inside.
  5. If we want to reduce an (untyped) lambda term, then proceed like this (a small worked example follows this list):

  • Take the term and alpha-rewrite it so that there are no variable conflicts. Then create the syntactic tree of the term. This is a tree with one root and with leaves decorated with the variables. Associate to this tree the following mol file.
  • Give arbitrary names to the edges of the graph, including the leaves, i.e. think about the variables which are at the leaves as "leaves" whose stems are arrows of the graph.
  • For any node A (application) create a mol file record (A a b c), where c is the output edge of the application node, a is the left input and b is the right input.
  • For any node L (lambda) create a record (L a b c), where c is the output of the lambda node, b is the edge coming from the variable bound by that lambda, and a is the edge coming from the body of the abstraction.
  • When finished, eliminate the lambda term variables. For any bound variable which does not occur elsewhere in the term, add a record (T a), where "a" is the name of the edge connecting the respective lambda node with the respective variable. For any bound variable which does occur elsewhere in the term, create a tree of FO nodes with the root at the place where the variable is bound in a lambda node and with leaves connected to the leaves of the syntactic tree which are decorated with that variable. (That is why the edge corresponding to a variable bound by a lambda has this orientation.) For the free variables just erase the names and add FRIN nodes.
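As the promised worked example, here are the records one gets by following these steps by hand for the term λx.(x x); the bound variable occurs twice, so a single FO node (rooted at the lambda's variable port) feeds the two inputs of the A node. Edge names are arbitrary.

```python
from collections import Counter

mol = [
    ('A', 'p', 'q', 's'),    # application: left input p, right input q, output s
    ('L', 's', 'v', 'r'),    # lambda: body s, variable edge v, output r
    ('FO', 'v', 'p', 'q'),   # fanout for the two occurrences of the bound variable
    ('FROUT', 'r'),          # cap the root
]

# Sanity check from this thread: every bond name must occur exactly twice.
assert all(n == 2 for n in Counter(p for rec in mol for p in rec[1:]).values())
```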

The sensible thing is the following. The rewrites are chosen so that for the terms which have a normal form in untyped lambda beta (so no eta rewrite!), and which have all bound variables occurring somewhere else in the term, the algorithm eventually gives a unique result, for a sufficiently big number of steps. (For some terms which have bound variables which do not occur elsewhere this is also true, but not for every such term.) This is shown by remarking that the conflicts which may exist, at the level of graph reduction, eventually disappear for the graphs which are constructed from such terms.

  6. The graph reduction is not the same as any of the lambda term reduction strategies I know about, including those coming from interaction graphs. This is something so obvious, if you think about it. Even for your algorithm, say, I remarked that decode is not the inverse of encode. This means: if you think of what is explained at 5. as encode, then what is the decode? First remark that a graph which is constructed from a lambda term does not contain FI (i.e. fanin) and FOE (that's a supplementary fanout) nodes. Start from the algebra freely generated by the port variables of the graph and add the following relations:

  • for any record (L a b c) add the relation c = Lb.a
  • for any record (A a b c) add the relation c = ab
  • for any record (FO a b c) or (FOE a b c) add two relations: a = b and a = c

Suppose that there is no FI node in the graph and that the graph has only one FROUT node. Then consider the set of port variables which either come from a port of a FRIN node (these are free variables) or occur in a port (left,out) of an L node. If you can rewrite the port variables by using the relations described, such that the port variable of the FROUT node gets a unique name (an element of the algebra in which only variables from the mentioned set occur), then decode(graph) is this element, which appears as a lambda term.

These are the decorations I'm talking about. There are equivalent decorations in your format. You remarked that there exist different graphs (up to directed graphs isomorphism) which decode to the same lambda term. Moreover, there exist graphs which are not in the image of the encode function, but which still are in the domain of the decode function.
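Here is a hedged sketch in Python of that decoration, run on the λx.(x x) molecule from the earlier example: assign a symbol to the (left,out) edge of each L node (FRIN edges would be treated the same way), then propagate the relations until the FROUT edge carries a term.

```python
def decode(mol):
    """Propagate c = Lb.a, c = ab and a = b = c until the FROUT port is decorated."""
    dec, fresh = {}, iter('xyzuvw')
    for rec in mol:                      # variables: the (left,out) edge of each L node
        if rec[0] == 'L':
            dec[rec[2]] = next(fresh)
        elif rec[0] == 'FRIN':           # free variables get symbols too
            dec[rec[1]] = next(fresh)
    changed = True
    while changed:
        changed = False
        for t, *ports in mol:
            if t in ('FO', 'FOE') and ports[0] in dec:
                for p in ports[1:]:
                    if p not in dec:
                        dec[p] = dec[ports[0]]; changed = True
            elif t == 'A' and ports[0] in dec and ports[1] in dec and ports[2] not in dec:
                dec[ports[2]] = '(%s %s)' % (dec[ports[0]], dec[ports[1]]); changed = True
            elif t == 'L' and ports[0] in dec and ports[2] not in dec:
                dec[ports[2]] = '(L%s.%s)' % (dec[ports[1]], dec[ports[0]]); changed = True
    root = next(rec[1] for rec in mol if rec[0] == 'FROUT')
    return dec.get(root)

mol = [('A', 'p', 'q', 's'), ('L', 's', 'v', 'r'), ('FO', 'v', 'p', 'q'), ('FROUT', 'r')]
print(decode(mol))   # -> (Lx.(x x)), i.e. the lambda term we started from
```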

Where do these graphs come from? Basically, they come from the application of DIST (or commutativity) rewrites.

In your algorithm you work at the level of terms, even if you use graph rewrites, because, as I tried to suggest, with your rewrites, if you take the graph for the combinator S and you add a fanout node, then you get a graph which is not the graph of a lambda term, but it is still a graph which you can reduce. If you do it, then I can guarantee, because your rewrites are a subset of mine, that you don't get two copies of the graph of S; you get a graph which has two roots, but is connected. However, you can apply the decode (in the sense I explained) and you still get the roots (ports) decorated with S.

With my rewrites I get two copies of S.

The bad part of my algorithm is that I don't have any garbage management. Well, there is none in the chemical world. Technically, I can't have one which is purely local, other than what I get from using the rewrites where the node T is involved.

So, this is a very long reply, but helpful I hope. I look forward to talk more, I'm very intrigued by what you did with the movement rules.

Oh, btw, exactly the same ideas work for Turing machines, in the sense that exactly the same algorithm works, but for a different family of nodes (there are still the FI, FOE, FO, FRIN, FROUT, Arrow, but now instead of A, L there are 2-valent nodes, one for each internal state and one for each tape symbol, plus a node which expresses the movement of the head, hear hear!). Now, the interesting part is that the algorithm is not sequential in any way; also one may have several heads on the same tape, one may duplicate tapes, and we can combine these with the graphs from chemlambda and it still works. The rewrites are explained at http://chorasimilarity.github.io/chemlambda-gui/dynamic/turingchem.html and there are several examples at the demo page, like http://chorasimilarity.github.io/chemlambda-gui/dynamic/church3bb.html where a busy beaver is "abstracted" and the Church number 3 is applied to it. The script to be used is quiner_shuffle.sh and the mol file used is https://github.com/chorasimilarity/chemlambda-gui/blob/gh-pages/dynamic/church3bb.mol .


chorasimilarity commented 8 years ago

Oh, I'm starting to get it. Well, these are ribbon graphs, so basically a ribbon graph is a pair of permutations. A permutation is simply a collection of cycles. Maybe there is a way to use this structure for an efficient 1D automaton. I explain the structure here http://imar.ro/~mbuliga/diffuse.pdf but it is so heavy, it looks like shit. However, if you find a way to use this permutation structure, then that would be a feat!
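A minimal illustration of "a ribbon graph is a pair of permutations", on a made-up example rather than a chemlambda molecule: σ gives the cyclic order of half-edges around each node, the involution α pairs half-edges into edges, and (in one common convention) the cycles of σ∘α are the faces, so V − E + F = 2 − 2g gives the genus g of the closed surface the ribbon graph embeds in.

```python
# Two trivalent nodes joined by three edges (a "theta" graph).
# Half-edges 0,1,2 sit around the first node, 3,4,5 around the second.
sigma = {0: 1, 1: 2, 2: 0, 3: 4, 4: 5, 5: 3}   # cyclic order at each node
alpha = {0: 3, 3: 0, 1: 4, 4: 1, 2: 5, 5: 2}   # pairing of half-edges into edges

def cycles(perm):
    """Decompose a permutation (given as a dict) into its cycles."""
    seen, out = set(), []
    for start in perm:
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x); cyc.append(x); x = perm[x]
        out.append(cyc)
    return out

faces = cycles({h: sigma[alpha[h]] for h in sigma})   # cycles of sigma composed with alpha
V, E, F = len(cycles(sigma)), len(cycles(alpha)), len(faces)
print(V, E, F, 'Euler characteristic:', V - E + F)    # 2 nodes, 3 edges, ...
```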


chorasimilarity commented 8 years ago

Are you doing it here? https://github.com/MaiaVictor/LPU I'm very interested.

On Sun, Feb 21, 2016 at 10:26 PM, Marius Buliga marius.buliga@gmail.com wrote:

Oh, I'm starting to get it. Well these are ribbon graphs, so basically a ribbon graph is a pair of permutations. A permutation is simply a collection of cycles. Maybe there is a way to use this structure for an efficient 1D automaton. I explain the structure here http://imar.ro/~mbuliga/diffuse.pdf but is so heavy, looks like shit. However if you find a way to use this permutation structure, then that would be a feat!

On Sun, Feb 21, 2016 at 10:13 PM, Marius Buliga marius.buliga@gmail.com wrote:

Hey, great. I have to think about your movement rules/reduction rules. For the moment I'll answer or comment quickly to some parts, hoping that it is helpful.

  1. I use a different format than yours memory array. It goes like this. Suppose you have a graph made of 1-, 2- or 3-valent nodes of a certain kind (FRIN, FROUT, T are 1-valent), Arrow is 2-valent, A,L,FI,FO, FOE are 3-valent. Each node has ports, the no of ports equals the valence. Each port has a pair of types, from the set \left{ left, right, middle \right} \times \left{ in, out }. Each port has a port variable, picked from an arbitrary dictionary.

A mol file is an unordered list, where each element of the list corresponds to a node and an ordered list of values of its ports. Call an element of the mol file list a "record". Each record has on first position the type of the node (i.e. one of the symbols A ,L , FI, FO, FOE, Arrow, T, FRIN, FROUT) then, depending on the type of the node, the port values are listed in the following order, according to the type of the respective ports:

  • (L a b c) for a node L with the port (middle,in) with value a, port (left,out) with value b, port (right,out) with value c
  • (FO a b c) for a node FO with the port (middle,in) with value a, port (left,out) with value b, port (right,out) with value c
  • (FOE a b c) for a node FOE with the port (middle,in) with value a, port (left,out) with value b, port (right,out) with value c
  • (A a b c) for a node A with the port (left,in) with value a, port (right,in) with value b, port (middle,out) with value c
  • (FI a b c) for a node FI with the port (left,in) with value a, port (right,in) with value b, port (middle,out) with value c
  • (Arrow a b) for a node Arrow with the port (middle,in) with value a, port (middle,out) with value b
  • (T a) for a node T with the port (middle,in) with value a
  • (FROUT a) for a node FROUT with the port (middle,in) with value a
  • (FRIN a) for a node FRIN with the port (middle,out) with value a

Any such list of records is a valid mol file, if the following conditions are satisfied:

  • each port variable occurs at most twice
  • if a port variable occurs twice then it does so once in a port of type (,in) and once in a port of type (,out)

We could ask from the beginning that a valid mol file is one where every port variable occurs twice, but instead we complete the mol file with new nodes if there are port variables which occur only once. For any port variable "a" which occurs only once in a port of type (, in) we add a node (i.e. record) (FRIN a). If "a" occurs only once in a port of type (,out) then we add a record (FROUT a).

Mathematically this is a description of a graph with directed edges and nodes of types A, L, ... , each node coming with a circular order. (They are ribbon graphs with the nodes "colored" with the type A, L, ...). Each edge is decorated with a port variable, and goes from the port of type (, in) to the port of type (, out). ("Half edges" are permitted, that is the reason for the nodes FRIN and FROUT).
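
To make this concrete, here is a minimal Python sketch of the validity check and the FRIN/FROUT completion just described. This is only a sketch, not the actual chemlambda scripts; the record representation (node type plus list of port variables), the table and the function names are ad-hoc choices for illustration.

```python
# Minimal sketch (not the actual chemlambda scripts) of the mol file format above.
# A record is (node_type, [port_variables]).

PORT_SIGNATURES = {
    "L":     ["middle,in", "left,out", "right,out"],
    "FO":    ["middle,in", "left,out", "right,out"],
    "FOE":   ["middle,in", "left,out", "right,out"],
    "A":     ["left,in", "right,in", "middle,out"],
    "FI":    ["left,in", "right,in", "middle,out"],
    "Arrow": ["middle,in", "middle,out"],
    "T":     ["middle,in"],
    "FROUT": ["middle,in"],
    "FRIN":  ["middle,out"],
}

def complete(mol):
    """Check the two validity conditions and add FRIN/FROUT records for
    port variables which occur only once, as described above."""
    occurrences = {}                      # port variable -> list of "in"/"out"
    for node_type, ports in mol:
        for var, sig in zip(ports, PORT_SIGNATURES[node_type]):
            occurrences.setdefault(var, []).append(sig.split(",")[1])
    completed = list(mol)
    for var, sides in occurrences.items():
        if len(sides) > 2 or (len(sides) == 2 and set(sides) != {"in", "out"}):
            raise ValueError(f"invalid mol file at port variable {var!r}")
        if sides == ["in"]:
            completed.append(("FRIN", [var]))   # FRIN supplies the missing out end
        elif sides == ["out"]:
            completed.append(("FROUT", [var]))  # FROUT supplies the missing in end
    return completed
```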

  1. The rewrites function like this. Each rewrite replaces an LP (left pattern) by an RP (right pattern). The LP is always made of a pair of nodes (records). With the mol file conventions, the rewrites are given in http://chorasimilarity.github.io/chemlambda-gui/dynamic/moves.html The rewrites have names which come from their LPs; for example, the rewrite which corresponds to beta is also named AL. Some rewrites replace a pair of records by another pair, or by only one record. Others, the "DIST" ones (i.e. distributivity, which is what you call commutativity), replace a pair of records by 4 records.
  2. The algorithm of application of rewrites is the following. First let's define a conflict of rewrites to be a pair of LPs which overlap, in the sense that there is a record which is in both LPs. The algorithm takes as input a valid mol file. The algorithm has as parameters the number of steps and the probabilities of the rewrites. (A rough Python skeleton of this loop is sketched a bit further below.)
  3. complete the mol file as explained, and get a mol file where each port variable occurs exactly twice
  4. use a counter to rename the port variables
  5. create a list of edges from the mol file, where each edge is a pair of records, one edge per port variable; the first record of the pair is the one where the port variable occurs in a port of type (,in), the second is the record where it occurs in a port of type (,out)
  6. create an empty list of proposed_delete and proposed_add records
  7. start a cycle with a counter from 0 to the number of steps
  8. pick at random a maximal collection of edges such that each node appears in at most one edge from the collection (this is for eliminating the conflicts)
  9. for each chosen edge, verify if the pair of records is an LP for one of the rewrites (excepting the COMB rewrite). If it is, then flip a coin with the given probability for that rewrite. If the coin gives 0, then add the records from the respective edge to proposed_delete and the RP of the rewrite to proposed_add (mind that the port variables of the records from the RP have to be new, so use the port variable counter for that, for example)
  10. when the collection of edges which were chosen is exhausted, then delete from the mol file the records from proposed_delete and add the records from proposed_add
  11. some of the rewrites introduce Arrow nodes (records); these are eliminated by the COMB rewrites. There is a COMB rewrite cycle which eliminates them, as explained in the link. (Of course, alternatively one could treat the COMB rewrites like the other rewrites, but the actual algorithm I use proceeds like this)
  12. here ends the cycle with the number of steps.
  13. So this is an asynchronous graph rewrites automaton. The rewrites have conflicts, therefore it is not guaranteed that two runs of the algorithm, for the same input, eventually lead to the same output. The algorithm may not stop, in the sense that there exist mol files such that, for any number of cycles N, the output still has active LPs inside.
  14. If we want to reduce an (untyped) lambda term, then proceed like this.
  15. Take the term and alpha-rename it so that there are no variable name clashes. Then create the syntactic tree of the term. This is a tree with one root and leaves decorated with the variables. Associate to this tree the following mol file. (A small worked example follows after these steps.)
  16. Give arbitrary names to the edges of the graph, including the leaves, i.e. think about the variables which are at the leaves as "leaves" whose stems are arrows of the graph.
  17. For any node A (application) create a mol file record (A a b c) where c is the output edge of the application node, a is the left input and b is the right input
  18. For any node L (lambda) create a record (L a b c) where c is the output of the lambda node, b is the edge coming from the variable bound by that lambda and a is the edge coming from the body of the abstraction
  19. When finished, eliminate the lambda term variables. For any bound variable which does not occur elsewhere in the term, add a record (T a) where "a" is the name of the edge connecting the respective lambda node with the respective variable. For any bound variable which does occur elsewhere in the term, create a tree of FO nodes with the root at the place where the variable is bound in a lambda node and with leaves connected to the leaves of the syntactic tree which are decorated with that variable. (That is why the edge corresponding to a variable bound to a lambda has this orientation.) For the free variables just erase the names and add FRIN nodes.
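
For example, applying these steps to the small term λx. x x (the port variable names below are made up for the example) gives the mol file

```
(L body v out)
(A x1 x2 body)
(FO v x1 x2)
(FROUT out)
```

Here "body" is the edge from the application node into the lambda's (middle,in) port, "v" is the variable edge going into the FO node, "x1" and "x2" are the FO outputs feeding the two inputs of the application, and (FROUT out) is added by the completion step, since "out" occurs only once, in the lambda's (right,out) port.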

The delicate point is the following. The rewrites are chosen so that for the terms which have a normal form in untyped lambda beta (so no eta rewrite!), and which have all bound variables occurring somewhere else in the term, the algorithm eventually gives a unique result, for a sufficiently big number of steps. (For some terms which have bound variables which do not occur elsewhere this is also true, but not for any such term.) This is shown by remarking that the conflicts which may exist, at the level of graph reduction, eventually disappear for the graphs which are constructed from such terms.
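
To fix ideas, here is a rough Python skeleton of one cycle of the numbered algorithm above, with only the AL (beta) rewrite included, written from the port conventions listed earlier. It is a sketch and not the actual chemlambda scripts: the edge ordering, the LP detection and the COMB cleanup of Arrow nodes are all simplified away, and the right pattern of beta is reproduced here from memory, so check moves.html for the authoritative form.

```python
import random

# A record is (node_type, [port_variables]), as in the mol file sketch above.

def beta_rp(lam, app):
    """Right pattern of the AL (beta) rewrite, assuming (L a b c) has ports
    (middle,in) (left,out) (right,out), (A d e f) has (left,in) (right,in)
    (middle,out), and c (the lambda output) is wired to d (the function port)."""
    a, b, c = lam[1]
    d, e, f = app[1]
    return [("Arrow", [a, f]), ("Arrow", [e, b])]   # body -> result, argument -> variable

def edges(mol):
    """One edge per port variable shared by two records; values are record indices."""
    where = {}
    for i, (_, ports) in enumerate(mol):
        for v in ports:
            where.setdefault(v, []).append(i)
    return [pair for pair in where.values() if len(pair) == 2]

def one_cycle(mol, p_beta=1.0):
    """Pick a random conflict-free collection of edges, then apply the (only) rewrite."""
    candidate_edges = edges(mol)
    random.shuffle(candidate_edges)
    used, proposed_delete, proposed_add = set(), set(), []
    for i, j in candidate_edges:
        if i in used or j in used:        # overlapping LPs would be a conflict
            continue
        used.update((i, j))
        if {mol[i][0], mol[j][0]} == {"L", "A"} and random.random() < p_beta:
            lam, app = (mol[i], mol[j]) if mol[i][0] == "L" else (mol[j], mol[i])
            if lam[1][2] == app[1][0]:    # lambda output on the function port of A
                proposed_delete.update((i, j))
                proposed_add += beta_rp(lam, app)
    kept = [r for k, r in enumerate(mol) if k not in proposed_delete]
    return kept + proposed_add            # Arrow records would then be removed by COMB
```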

  1. The graph reduction is not the same as any of the lambda term reduction strategies I know about, including those coming from interaction graphs. This is something quite obvious, if you think about it. Even for your algorithm, say, I remarked that decode is not the inverse of encode. This means: if you think about the construction of the mol file from a lambda term, explained above, as encode, then what is the decode? First remark that a graph which is constructed from a lambda term does not contain FI (i.e. fanin) and FOE (that's a supplementary fanout) nodes. Start from the algebra freely generated by the port variables of a graph and add the following relations:
  2. for any record (L a b c) add the relation c = λb.a (i.e. lambda b . a)
  3. for any record (A a b c) add the relation c = ab
  4. for any record (FO a b c) or (FOE a b c) add two relations: a=b and a=c. Suppose that there is no FI node in the graph and that the graph has only one FROUT node. Then consider the set of port variables which either come from a port of a FRIN node (these are the free variables) or occur in a port (left,out) of an L node. If you can rewrite the port variables by using the relations described, such that the port variable of the FROUT node gets a unique name (an element of the algebra in which only variables from the mentioned set occur), then decode(graph) is this element, read as a lambda term.

These are the decorations I'm talking about. There are equivalent decorations in your format. You remarked that there exist different graphs (up to directed graph isomorphism) which decode to the same lambda term. Moreover, there exist graphs which are not in the image of the encode function, but which are still in the domain of the decode function.
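
As an illustration only, the decoration can be sketched like this in Python, reusing the record representation from the sketches above. It assumes a graph with no FI nodes and exactly one FROUT, and it only makes sense for graphs which actually decode (on other graphs it may simply not terminate).

```python
def decode(mol):
    """Decorate port variables by the relations above and read off the FROUT port."""
    definition = {}                          # port variable -> how it is decorated
    for node_type, ports in mol:
        if node_type == "L":
            a, b, c = ports
            definition[c] = ("lam", b, a)    # c = λb.a
        elif node_type == "A":
            a, b, c = ports
            definition[c] = ("app", a, b)    # c = a b
        elif node_type in ("FO", "FOE"):
            a, b, c = ports
            definition[b] = ("same", a)      # a = b
            definition[c] = ("same", a)      # a = c
    root = next(ports[0] for node_type, ports in mol if node_type == "FROUT")
    def term(v):
        if v not in definition:              # free variable or lambda-bound variable
            return v
        kind = definition[v]
        if kind[0] == "same":
            return term(kind[1])
        if kind[0] == "lam":
            return f"(λ{kind[1]}.{term(kind[2])})"
        return f"({term(kind[1])} {term(kind[2])})"
    return term(root)
```

On the example mol file for λx. x x given earlier, decode returns (λv.(v v)), up to the invented port variable names.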

Where do these graphs come from? Basically, they come from the application of DIST (or commutativity) rewrites.

In your algorithm you work at the level of terms, even if you use graph rewrites, because, as I tried to suggest, by using your rewrites, if you take the graph for the combinator S and you add a fanout node, then you get a graph which is not the graph of a lambda term, but it is still a graph which you can reduce. If you do it, then, I can guarantee, because your rewrites are a subset of mine, that you don't get two copies of the graph of S; you get a graph which has two roots, but it is connected. However, you can apply the decode (in the sense I explained) and you still get the roots (ports) decorated with S.

With my rewrites I get two copies of S.

The bad part of my algorithm is that I don't have any garbage management. Well, there is none in the chemical world. Technically, I can't have one which is purely local, other than what I get from using the rewrites where the node T is involved.

So, this is a very long reply, but helpful I hope. I look forward to talk more, I'm very intrigued by what you did with the movement rules.

Oh, btw, exactly the same ideas work for Turing machines, in the sense that exactly the same algorithm works, but for a different family of nodes (there are still the FI, FOE, FO, FRIN, FROUT, Arrow, but now instead of A, L there are 2-valent nodes, one for each internal state and one for each tape symbol, plus a node which expresses the movement of the head, hear hear!). Now, the interesting part is that the algorithm is not sequential in any way; also, one may have several heads on the same tape, one may multiply tapes, and we can combine these with the graphs from chemlambda and it still works. The rewrites are explained at http://chorasimilarity.github.io/chemlambda-gui/dynamic/turingchem.html and there are several examples at the demo page, like http://chorasimilarity.github.io/chemlambda-gui/dynamic/church3bb.html where a busy beaver is "abstracted" and the Church number 3 is applied to it. The script to be used is quiner_shuffle.sh and the mol file used is https://github.com/chorasimilarity/chemlambda-gui/blob/gh-pages/dynamic/church3bb.mol .

On Sun, Feb 21, 2016 at 6:10 PM, MaiaVictor notifications@github.com wrote:

@chorasimilarity https://github.com/chorasimilarity do you have a drawing of the rewrite rules of chemlambda (like the ones I have on that sketch on optlam)? I think both systems are very similar, don't you? I would like to know what your rules are and how exactly you came up with them. I think one should remove all unnecessary rules and keep the smallest subset that works. In that sense, Optlam is simpler as it has only 2 rewrite rules, 3 nodes and is Turing complete - but the need for node ids somewhat pollutes that elegance. In theory, one could build a molecular computer with Optlam, too, though - that would be interesting.

I am still thinking about the fastest way to implement those things on von Neumann machines, and the biggest issue I have is still: where do I keep nodes in space (i.e., in memory)? What look like "local" reductions on the graphs aren't actually local reductions on the machine, because nodes are physically stored in random places in the memory. To store graphs with arbitrary locality we would need infinitely many spatial dimensions, but we only have 3 to work with, and memory chips are actually 1D. Physical locality is essential for many reasons. If you are programming for the CPU, you need nodes that wire together to be close together in memory, or else your cache performance drops to hell. If you are programming for the GPU, you don't even have an option, since you have to partition your memory into separate computing units. So, in both cases, if two nodes are far apart in memory, you can't reduce them - it is a waste. It is obvious that a strategy to spatially keep nodes "where they should be" is the most important thing needed to implement those systems fast on real world computers.

VictorTaelin commented 8 years ago

I didn't see your last message, sorry. So, grab some popcorn, I guess what I'll say will make you glad.

Remember I suggested the biggest issue with practical implementations of interaction combinators was that nodes that wire together become distant in memory? That is, as you keep allocating/freeing nodes, the heap becomes pointer spaghetti. This takes the cache efficiency away and makes the algorithm very difficult to parallelize, because GPUs are made for vectors, not dynamically expanding graphs. So, while, in theory, we count the complexity of a λ-calculus term as the number of rewrites necessary to get to its normal form, in practice the time it takes for the program to finish isn't within a constant factor of the number of necessary rewrites.

LPU solves this problem. The idea is that the memory should be split in millions of very small (currently, 128 bit) blocks, each one with a very simple processing core, and capable of communicating with its neighbors. A computer would consist of millions of those small processing units wired together, operating in parallel. Each unit is responsible for 2 things: one, space management, erasing, duplicating and moving information around (to its neighbors); two, applying rewrite rules that implement the logic of interaction combinators. The main hope is that, by following those rules, nodes that wire together will stay in units that are also close, and, thus, they will be able to interact much faster.
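
Roughly, the simulated version looks like this (a toy Python sketch, not the actual LPU code; the window size, the function names and the in-place sequential update are my own simplifications of the description above):

```python
def run(memory, kernel, window=4, clocks=1):
    """Toy sketch: 'memory' is the 1D array of unit states, 'kernel' rewrites a
    small window of consecutive units, and one global clock visits every
    position once. In the real architecture every unit would act simultaneously."""
    for _ in range(clocks):
        for i in range(len(memory) - window + 1):
            memory[i:i + window] = kernel(memory[i:i + window])
    return memory
```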

You can see this illustrated in this file: https://raw.githubusercontent.com/MaiaVictor/LPU/master/samples/pow_3_3_id.txt - it shows the computation of the interaction net equivalent to the term ((λ f x . f (f (f x))) (λ f x . f (f (f x))) (λ x . x)). Each column of 4 numbers represents a unit (processing core + 128-bit memory), and each row is the whole memory at an instant in time. As you go down, you can see how the nodes travel through the memory and interact with each other. No global communication, no sequence of steps, only local rules applied in parallel.

Now, one has to wonder if the added overhead - the movement rules - introduces an asymptotic slowdown of some kind. This statistic https://github.com/MaiaVictor/LPU/blob/master/stats.txt (left = Optlam, sequential; right = LPU, parallel, but emulated sequentially), which has a lot to reveal, suggests that it does not: the number of "global clocks" (i.e., every processing unit executing its loop once) appears to be within a constant factor of the number of rewrites.

That is both good and bad. It is bad, because one would expect that, in a massively parallel architecture, the number of global clocks necessary to reduce a net would be smaller than the total number of rewrites required to get to the normal form. Ideally, the more cores you add, the higher the "rewrites / clock" ratio should be. That doesn't happen with LPU. This could be because nodes don't move optimally and get stuck far apart, and some different logic would solve this problem. But it is not impossible that there is a hard limit imposed by the universe itself which dictates that the amount of time required to get a net to the normal form is proportional to the number of rewrites required to reduce it. I have some reasons to believe this is true. If it is, 1. the number of rewrites is the perfect and final measure of the complexity of a term; 2. no amount of "parallel" computing can make it faster; 3. LPU is very close to a perfect computer architecture. But I prefer to think that I just haven't found the right rules yet that will leverage the parallel capacities of interaction nets.

Nevertheless, this is also a very promising result, because it shows that LPU is at least as good as the naive approach that delegates the whole allocation to the CPU, except it is much simpler, subsumes the function of memory allocation and garbage collection, allows high-level languages to shine, and actually scales. We're so used to hearing that the CPU has O(1) indexing, but that isn't true. As the memory gets bigger, it gets further and further from the cores, and it takes some time to get the info there. That is why you can't have 1 petabyte of RAM; it would be as slow as your HD. With LPU, that doesn't happen. You can keep stacking units horizontally from megabytes, to gigabytes, to tera, peta, exabytes, and the performance won't degrade. So, under the LPU architecture, the amount of time it takes to reduce any interaction net (and, thus, compute any algorithm) becomes merely a function of two variables: 1. the number of rewrites necessary to reduce this net (always a fixed value) and 2. how fast you manage to make the clock of a single LPU. It is, for once, a computer architecture on which the programmer is completely free from any memory layout and other real-world considerations when evaluating the performance of his programs, and all that counts is the complexity of the logic itself.

I hope this was informative. Let me know if you have any insights.

chorasimilarity commented 8 years ago

Congratulations! That's excellent! I do have some questions or comments, thank you for taking the time to respond:

  1. "the memory should be split in millions of very small (currently, 128 bits) blocks, each one with a very simple processing core, and capable of communicating with its neighbors". a) is this a hardware thing which you simulate in javascript? b) is the neighbours relation graph fixed during the computation? c) does it matter if some blocks belong physically to computer A and others belong to computer B?
  2. "Each unit is responsible for 2 things: one, space management, erasing, duplicating and moving information around (to its neighbors)". What is the algorithm for moving information around?
  3. How can I modify your rewrite rules (and communication algorithm) to use the ones from chemlambda?

When this project started some time ago, we had some vague proposal to use an actors model for "moving information around". That would be possible, in principle, because one would have an algorithm for the internal working of each "block" and another algorithm for communication (between actors = blocks). There would be a dynamical "actors diagram" = neighbours relation graph. On further thought, because all rules should be local, the actors diagram graph (or neighbours relation graph) should not be other than a theoretical explanatory object, not really needed in the model, except as it appears in the communication algorithm.

Back then, when chemlambda was called GLC, it was not even clear what the first algorithm was, which is why I wrote those long scripts. The second part, concerning the communication algorithm, was never written; my guess is that any communication algorithm would lead to something interesting.

Until your work, I have not seen anything constructive (not to say creative!) along these directions.

VictorTaelin commented 8 years ago
  1. Yes, the concept I have is that a computer would be a chain of millions of processing cores connected in a line. I "simulate" that in JavaScript by allocating a usual array and applying the kernel function sequentially to each core and its neighbors. Of course, that is slow, because the time it takes for a complete clock (activation of all cores) is the time it takes for JavaScript to execute all of them in order, whereas that could be simultaneous. LPU.c (https://gist.github.com/MaiaVictor/a38589cc94ec02f96f7a992aaeeab014) is currently 10x slower than Optlam, which is quite good given how much work it has to do to emulate a single clock sequentially. I have plans for implementing this in CUDA, which will run on GPUs with thousands of cores. It is very likely that this will result in LPU being much faster than Optlam.
  2. It is just 2 simple rules.

    RULE A. PASSTHROUGH: The nodes are stored in memory like in your files: each node has a tag (32-bit Int) and 3 bonds (32-bit Ints), each of which is a name, like you suggested. Each name appears twice in memory, except with inverted signs. If a name is positive (e.g., 30), that means its brother is to the right. If it is negative, that means its brother is to the left. The passthrough rule says that each node may move in the direction of its active bond. So, if two nodes are facing each other, with their active bond signs being 1 and -1, they swap. Remaining bonds are updated to make sure their signs remain consistent.

    RULE B. GRAVITATE: Take a look at this file: https://raw.githubusercontent.com/MaiaVictor/LPU/master/count_preserving_automatas.txt - it lists the evolution of every binary one-dimensional automaton with rules mapping 3 bits to 3 bits. Search for the following one: [1,1,2,0,1,1]. Notice that it tends to move all "Xs" to the left, but also somehow magically keeps an empty space between each pair? That is exactly what we need to make sure there is always space for duplication. So, cores obey this automaton to pass nodes around. That means that if a core is empty, the core to its left is empty, and the core to its right is full, it takes the node on its right. If a core is full, the core to its left is full, and the core to its right is empty, it moves its node to the right. That is all! This will automatically converge to a space configuration which always has free space to allocate a node wherever you want, while at the same time keeping nodes close together so they can interact. (A toy sketch of this occupancy rule appears right after this list.)

    Finally, since my kernel can only rewrite 3 consecutive nodes, that is still not enough to produce the 4 nodes required by duplication. So, instead, to apply the commute rule, it converts the two nodes into a temporary "please duplicate" node (signaled by a negative tag), and in the next tick another core duplicates it using the free space around it. But that is not necessary at all; your rewrite rule could range over 4 nodes and that'd be it.

  3. Just edit lines 62-93 of LPU.js, but I'd really not suggest touching this file currently. It is not the definition of pretty. But in theory, you shouldn't, right? Since the interaction combinators can be used to emulate any interaction net, you should just encode your interaction net to interaction combinators, reduce, and then read back.
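
In code, the occupancy part of RULE B is roughly this (a toy Python sketch, not the actual LPU.js; None marks an empty core and the node contents and the passthrough rule are left out):

```python
def gravitate(cells):
    """One synchronous pass of the rule: an empty cell whose left neighbour is
    also empty pulls the node from its full right neighbour; a full cell whose
    left neighbour is full and right neighbour is empty pushes its node to the
    right. Reads the old state and writes into a fresh copy; the two cases can
    never fire on adjacent cells at the same time, so the writes never collide."""
    new = list(cells)
    for i in range(1, len(cells) - 1):
        left, here, right = cells[i - 1], cells[i], cells[i + 1]
        if here is None and left is None and right is not None:
            new[i], new[i + 1] = right, None
        elif here is not None and left is not None and right is None:
            new[i], new[i + 1] = None, here
    return new
```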

~

Let me know if you find any use for those ideas.

What would be an actor specifically? How does it map to / manifest in physical memory?

VictorTaelin commented 8 years ago

By the way, the thing that could be improved is the passthrough rule, since it is so simple. Bonds hold just the direction to the other bond, so that's as good as it gets. Now, if a bond could also hold the distance to the other, then we could use some math that dictates that nodes which are very far from their active bond targets have a higher "push" force than nodes that are close to their targets. I've run some simulations and that solves one of the problems of LPU, which is that some nodes might get stuck far away from their targets, waiting for the nodes in between to completely react.

The problem with that approach is: how do I keep the distance consistent? If a node far away moves to the left, you have to somehow update the distance you annotated. That'd involve a message. Now, if nodes needed to send messages every time they moved, the space would be so flooded with messages that it wouldn't be worth it. So, the question is: is it possible to improve the passthrough rule without flooding the space with messages?

chorasimilarity commented 8 years ago

Thanks, rule B is an amazing idea. Btw, please excuse me for replying to this thread by mail; I try not to use my github password unless forced to. If you want we could discuss via mail, or via another system you propose.

What you do is great! I'll comment on it, but first let's try to settle this discussion around lambda, or interaction nets and chemlambda, a bit. So in this post I have these very concrete questions:

  1. Take the S combinator. Can you point me to a drawing of the graph you use for this lambda term?
  2. I think you use "dup" for fanout node. There's only one type of fanout node in your sistem, right? Then, can you draw what happens when you reduce the graph of S where you add to the root a dup?
  3. I ask this because I believe (but I may be wrong) that you will not get two copies of the graph of S, instead you'll get one connected graph with two root nodes. Right or wrong? I believe you'll get something like this http://imar.ro/~mbuliga/s_combinator.html
  4. Or maybe you can't reduce S dup because it is not the graph of a lambda term? That is important too, notice that even if you start with a graph of a lambda term (say Omega), the rewrites transform it into graphs which are not graphs of lambda terms.
  5. In chemlambda it does not matter if the graph, aka molecule, is one coming from a lambda term or not. It may have 10 roots (these are FROUT nodes), or none. It may be connected or not, doesn't matter.
  6. In chemlambda the S dup graph reduces like this http://imar.ro/~mbuliga/s_c_combinator.html That is because there are two fanout nodes (one called FO and the other FOE).
  7. I believe (right or wrong) that your translation back from graphs to lambda terms (in normal form only?) does not discern between the graph from 3. and the graph from 6.
  8. I noticed (on reddit, excuse me for searching around) that you mention somewhere that the graphs themselves may be more useful or more expressive than the lambda terms alone. That's what I believe about chemlambda, too. Among my examples are the Y combinator, which reduces to a 2-node graph which shoots pairs of other nodes, and the "walker", which is simply a pattern of two nodes which propagates along a ladder of (dup, application) pairs of nodes (it appears in the predecessor graphical reduction, in a form disguised as a pattern of approx 20 nodes which propagates). Simply, in lambda calculus alone there is nothing like this visible.

In the next reply I'll comment on the management of space you use. But it would be great if we could settle this parenthesis in some concrete way.

chorasimilarity commented 8 years ago

Practically, with your rules A and B, you invent a space and a physics for the graphs. In principle, any embedding of the graphs and the rewrite rules into some space with a physics will do it. Two examples: 1) real space, real physics, real molecules; 2) cartoon space, cartoon physics, a database. I have to understand your rule B and the physics more. You do amazing stuff!

VictorTaelin commented 8 years ago

I don't have any problem discussing this here, other than someone coming in and patenting this stuff to sue me. A professor at my university was just commenting about how that almost destroyed the life of one of our professors, so it stuck with me. I don't fully understand how this system works or if that is possible, but it would be terrible if it happened. But this doesn't justify not having this discussion in public; knowledge is there to be shared.

So, the very next thing I plan to do is write a JS renderer that will animate the system in real time, like the one you have (which is awesome, by the way). Unfortunately, I have a lot of accumulated work at my job, so I'll be very busy over the next days. Meanwhile, here is a drawing for you: http://i.imgur.com/qMjqnX6.jpg - I won't duplicate S by hand because that'd take a lot of time, but feel free to try. Don't forget each dup node has a label, so the "dup" would duplicate the dup on S as if it were a normal node. In the end, yes, you get two copies of S, and each copy will point to the auxiliary ports of the upper dup node.

I still don't fully understand what each of your nodes does, so I can't comment on them yet. Currently, my concept is that it is like the system I'm using, except with port orientations and a few extra types, and that it somehow has better computational properties, allows the system to reduce λ-terms without tags, etc... but I still don't "get" it in the depth I'd like to. I.e., what is the purpose of each node? What makes it superior to interaction combinators? I'd like to hear your comments on that. Also, it would be really, really sweet to have an interactive site where I write a lambda term, it translates to a molecule, and it animates the reductions. Then in no time I'd be able to understand it by exploring the app.

Yes, we both agree that the graphs are more useful than terms. I've seen your Y combinator. There is some mind-blowing stuff going on with those graphs. If I recall correctly, somewhere you mentioned a "shooter" graph, which just keeps shooting copies of the same subgraph. Something very similar happens with interaction combinators: http://i.imgur.com/YIIIuf2.jpg - I find it quite cool.

And yes again, the whole point of those rules is to embed the nodes in a physical space. Any embedding would do. I agree that real space/physics/molecules would be awesome. I have some concerns regarding molecules, though: aren't they too slow? Let's try a visualization and think about the limits of computation in our universe. Computation is all about moving information. So, in a way, the amount of time it takes to reduce a term is always proportional to the distance the information must travel in space. So, if we take nodes as the containers of information, that means that computing is all about moving nodes around. Now, imagine 3 systems: one, molecules represent nodes; on that system, computation is done as the molecules move through space. Two, electrons represent nodes; on that system, computation is done as electrons move through atoms. Three, photons represent nodes; on that system, computation is done as photons move through space.

Currently, we have a mix of the latter two. Molecules store the information about the nodes, but electrons communicate it to other molecules. It is terribly inefficient, though. Just think about it: somewhere in our CPUs, electrons are moving between memory, registers, ALUs, etc, and after a huge dance, they gather enough crossed information to get us to the answer. Now, think about the actual distance the electrons travel, and try comparing that with the minimal distance it could take. Under that view, it is easy to see CPUs are gigantic molecular labyrinths in which information, carried by electrons, must travel a much longer path than what would be theoretically necessary.

So, under that view lie some of my concerns with chemical computing. The kind of molecular computer you describe looks to be very efficient space-wise. Molecules attract and repel each other, so nodes that bond together are naturally always close together in space, and they keep reacting at an astonishing rate. So, differently from CPUs, the path information travels in space would be very optimized. Moreover, given how small atoms are and how many molecules you can fit in a volume of space, it is obvious that such a computer would be ridiculously powerful. But, on the other hand, in your system, molecules don't just hold information; they also are the vectors of communication. If you compare the speed molecules move at to the speed of light, you can see that the velocity of propagation of information in that system isn't close to the theoretical limits. A statistic I'd like to know is: how much time does it take for a molecular reaction to happen, and how much distance does light travel in that timeframe?

chorasimilarity commented 8 years ago

Perfect! It's the way I like it.

I look forward to seeing the LPU (lambda processing unit?) done. No matter how you put it, the processors as they are built today have too many bottlenecks.

VictorTaelin commented 8 years ago

The algorithm has nothing to do with the λ-calculus at all, other than the initial inspiration. It is just a coincidence that some programs in the λ-calculus can be expressed easily as graphs, but I attribute that to the fact that those programs are just the manifestation of some logic which can also be expressed with the interaction combinators. My understanding is that those graphs are the true form of logic, while the λ-calculus is just a tool humans are familiar with that can be used to express that logic.

There isn't much to understand about the tags. Think of it as: there is only one kind of node, but two possible rewrite rules (annihilate and commute). How do you decide which rule to apply? Based on the color of the node. Nodes with the same color annihilate, nodes with different colors commute. Initially I really disliked it, because it doesn't seem as elegant as just having a finite set of "different" nodes, but after some thinking I noticed it actually makes sense. You can think of symmetric interaction combinators as this system, except limited to 2 colors. Other than that, they're exactly the same thing.
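
In code, that decision is just this (a tiny Python sketch; the dict-based node representation is made up for the example):

```python
def which_rule(node_a, node_b):
    """Tag/color dispatch described above: equal colors annihilate,
    different colors commute."""
    return "annihilate" if node_a["color"] == node_b["color"] else "commute"
```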

VictorTaelin commented 8 years ago

Ah, by the way, your concept of an arrow is also used in LPU when two nodes annihilate. It creates a temporary node that is supposed to carry the name of the joined wires to its new host. This is exactly like your fan-in move, if I understand correctly.

chorasimilarity commented 8 years ago

If you don't use the syntactic tree structure in your algorithm, then how can you know what to throw away? I have this problem, which I solve by rewrites involving a termination node. These are local, so it takes time. Finally I gave up using them and I don't throw anything away. Also, if you don't use that tree structure, then why do you limit yourself to graphs coming from lambda terms?

Re the automaton from rule B, it makes me wonder, for the fun of it: use other cellular automata (like Game of Life), or even use your nodes and rewrites for the blocks (i.e. say that each block has 3 neighbours and a certain type, like lambda, application, dup, ... see?).

Re the actors, I imagine them as colors of the nodes. The rewrites are enhanced with rules for the colors of the new nodes as a function of the colors of the old nodes. I tried something like this, see https://plus.google.com/+MariusBuliga/posts/dyqb8Gbn5tJ

I would be curious to see something like: I have some of the blocks, you have others and we reduce the graphs together. It should work, because that's the point of an asynchronous graph rewrites automaton.

Also, say that I have a graph which has N free inputs and N free outputs, which reduces to a permutation of the inputs into the outputs. I publish half of the graph, which has all the outputs. You take it and you connect your graph to the outputs, then you reduce it privately and you send me the result. I get the reduced graph and (because I know the correspondence private-public of the edges which were cut before publishing my half) I glue my secret part and reduce it. I get your reduced graph. Does it make any sense?

VictorTaelin commented 8 years ago

If you don't use the syntactic tree structure in your algorithm, then how can you know what to throw away?

What do you mean?

Also, if you don't use that tree structure, then why do you limit yourself to graphs coming from lambda terms?

Don't understand too, in what sense I limit the graphs?

Re the automaton from rule B, it makes me wonder, for the fun of it: use other cellular automata

There are many ways to experiment with it, we'll see!

Re the actors, I imagine them as colors of the nodes.

My brain crashed. I thought you meant actors in the sense of a high-level representation of objects that send and receive messages, and that hold an internal state. Now you said actors are colors of nodes and nothing makes sense anymore to me.

Also, say that I have a graph which has N free inputs and N free outputs, which reduces to a permutation of the inputs into the outputs. I publish half of the graph, which has all the outputs. You take it and you connect your graph to the outputs, then you reduce it privately and you send me the result. I get the reduced graph and (because I know the correspondence private-public of the edges which were cut before publishing my half) I glue my secret part and reduce it. I get your reduced graph. Does it make any sense?

So, to reduce a graph, you take many cuts of it, send to several different agents, each one reduces his own part, then you glue it together and reduce the result, right? If that is what you meant, yes, that is how I expected people would try to do it. I just think that this has way too many problems.

For one, what is an agent (physically speaking)? A CPU, for example? So, anytime you want to send a graph from agent A to B, you need to communicate it. That is a huge overhead already. And programming concurrent systems is the worst thing ever. Reasoning about them is the worst nightmare. So many things to consider. How do you cut the graph in the right places? You could make a cut that needs 1 ms to reduce, another cut that needs 5 minutes. How do you make cuts that ensure the work is divided fairly across actors? You can't predict how a subgraph will evolve. If you make a 1 GB cut and send it to an agent, and he says 3 s after: "hey, it is done already", then you have just communicated 1 GB for no useful work. Whenever a node gets its principal port pointed to an output for which an agent doesn't have the rest of the graph, it can't do any useful work anymore. In the worst case, all its active ports will try to escape through the outputs and the agent will hold a useless graph. Then he will need to communicate it to some other agent. More communication overhead. In general, the whole scenario seems like a complete mess to me and I have no idea how I could make it work in practice. Just algorithmically speaking, implementing something like that would take several thousand lines of code, while the algorithm itself takes a few dozen! That is why I didn't really consider the idea. It is way too complicated for my buggy brain.

chorasimilarity commented 8 years ago

I got the impression that you only reduce graphs built from lambda terms. Is this false? Does your algorithm work for any initial graph which is made by a finite number of nodes? If yes, then why do you repeatedly try to reduce everything to graphs of lambda terms, when really the graph level does not impose this restriction? Ideology?

I tried to reproduce your i^i examples in chemlambda and it doesn't look especially fast. I thought that you use some clever way to decide the order of reductions based on the fact that you are reducing graphs constructed from lambda terms. Maybe I'm wrong, but I imagine that it is somehow related to the following problem I have in chemlambda: take as an example a graph built from a lambda term which contains the combinator K. Then, at some point during the reduction it may happen that one has a connected component of the graph (at that moment) which is going to die because it is linked to a termination node; a (perhaps big) number of rewrites related to the termination node will kill it eventually. I don't believe that there is a local answer to the problem of deciding which piece of the graph is not going to influence the final result, unless the initial graph is from a small, well chosen family, like the graphs of some special (but still sufficiently large) family of lambda terms. In the end I've chosen not to care about this.

Actors as colors of nodes is the same as coloring the nodes of a block with the same color, then rephrasing the graph rewrites and the rules akin to your rules A and B as just rewrites of colored nodes.

For the rest I agree with you, with the small distinction that without trying, how would one know if there is a sweet spot or not?

VictorTaelin commented 8 years ago

Does your algorithm work for any initial graph which is made of a finite number of nodes?

Yes!

If yes, then why do you repeatedly try to reduce everything to graphs of lambda terms, when really the graph level does not impose this restriction? Ideology?

No, of course not. It is just convenience. I am perfectly fluent in the lambda calculus, whereas I couldn't write an adder directly in interaction combinators if my life depended on it. I also have Caramel, a way to write λ terms in which I'm very productive. I don't even know how I could make a productive language for writing those nets. It is within my near-future plans to explore that, though, and I'm really excited for that experience. I wonder if I can get to a mental state where I can make sense of a net by just looking at it, as with the λ-calculus. Who knows what new insights and programming techniques that could bring!

I think I understand your question now:

If you don't use syntactic trees structure in your algorithm then how can you know what to throw away?

I'll answer it below.

I thought that you use some clever way to decide the order of reductions based on the fact that you are reducing graphs constructed from lambda terms

Nooo, not at all! The λ-calculus is my personal keyboard; interaction nets are the English grammar. The way English works has no relationship whatsoever with my personal keyboard.

take as an example a graph built from a lambda term which contains the combinator K. Then, at some point during the reduction it may happen that one has a connected component of the graph (at that moment) which is going to die because it is linked to a termination node; a (perhaps big) number of rewrites related to the termination node will kill it eventually

True.

I don't believe that there is a local response to the problem of deciding which piece of the graph is not going to influence the final result

So, if I understand what you mean, Optlam does exactly that. By walking through the nodes sequentially while following the path semantics, you will get to the normal form without ever reaching a node that won't affect the final result. So, basically, things are "thrown away" by the JavaScript garbage collector itself, because those nodes simply become unreachable. I'll explain how it works, but don't bother too much with it. First, it starts from a free wire. Then, it travels through the graph as follows: when you enter through an aux port, you annotate it on the node and follow its main port; when you enter through the main port, then, if the previous port was also the main port, you apply a rewrite rule, go back to the previous node, follow the annotated port to the node before, and enter the same port again. Otherwise, that node is part of the final graph, so you start a new walk from each of its aux ports. Keep doing that and you reach the normal form. That is what the steps column means in that file (https://raw.githubusercontent.com/MaiaVictor/LPU/master/stats.txt): the number of times you visited a node. Notice, again, it has nothing to do with the λ-calculus. It is just the path semantics of interaction combinators. Is that what you asked?
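
To make that walk order concrete, here is a minimal runnable sketch of the traversal just described. This is my own toy code, not Optlam's: the node/port representation and the single `annihilate` rule are invented stand-ins, and a real reducer would dispatch on node types and apply the proper interaction rules.

```python
def connect(net, free, a, b):
    """Wire two port references together. A reference is ('n', node_id, port_idx)
    for a node port, or ('f', name) for a free wire."""
    for x, y in ((a, b), (b, a)):
        if x[0] == 'n':
            net[x[1]]['ports'][x[2]] = y
        else:
            free[x[1]] = y

def partner(net, free, ref):
    """What is on the other end of the wire attached to this port reference?"""
    return net[ref[1]]['ports'][ref[2]] if ref[0] == 'n' else free[ref[1]]

def annihilate(net, free, a_id, b_id):
    """Toy stand-in for the real rewrite rules: for an active pair, connect the aux
    ports pairwise and delete both nodes. (Assumes the two nodes' aux ports are not
    wired to each other.)"""
    for p in (1, 2):
        connect(net, free, partner(net, free, ('n', a_id, p)),
                           partner(net, free, ('n', b_id, p)))
    del net[a_id], net[b_id]

def walk(net, free, start, visits):
    """One walk, as described above: annotate aux entries, follow the main port,
    rewrite when two main ports meet, then backtrack and retry."""
    came_from = start              # the port we are currently leaving
    stack = []                     # (node, aux port) entries annotated on the way
    while True:
        here = partner(net, free, came_from)
        if here[0] == 'f':         # the wire leaves the net: this walk is over
            return
        visits[0] += 1
        node, port = here[1], here[2]
        if port != 0:                                    # entered through an aux port
            stack.append((node, port))
            came_from = ('n', node, 0)                   # leave through the main port
        elif came_from[0] == 'n' and came_from[2] == 0:  # main met main: active pair
            prev_node, prev_aux = stack.pop()
            re_from = partner(net, free, ('n', prev_node, prev_aux))
            annihilate(net, free, prev_node, node)
            came_from = re_from                          # re-enter through the same port
        else:          # main port faces the outside: this node is in normal form here
            for p in (1, 2):
                walk(net, free, ('n', node, p), visits)
            return

# Tiny demo: two same-type nodes joined main port to main port, aux ports free.
net, free = {}, {}
net['a'] = {'type': 'g', 'ports': [None, None, None]}
net['b'] = {'type': 'g', 'ports': [None, None, None]}
connect(net, free, ('n', 'a', 0), ('n', 'b', 0))
for name, ref in [('in', ('n', 'a', 1)), ('p', ('n', 'a', 2)),
                  ('q', ('n', 'b', 1)), ('r', ('n', 'b', 2))]:
    connect(net, free, ('f', name), ref)

visits = [0]
walk(net, free, ('f', 'in'), visits)
print(net, free, visits[0])   # both nodes gone; 'in' is now wired to 'q'; 2 visits
```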

I thought that you use some clever way to decide the order of reductions based on the fact that you are reducing graphs constructed from lambda terms.

Optlam has two reducers: one lazy, which I described above; the other strict and stupid: it just reduces every active port, no questions asked. For the case of i^i, the stupid reducer actually performs better than the lazy one, so reducing i^i isn't supposed to be slow. The term 50 ^ 50 takes only 12350 rewrites to get to its normal form. The stupid reducer performs about 300000 rewrites per second. Don't forget Optlam's system is optimal, so anything adding more nodes is likely to require more rewrites. Maybe chemlambda has some kind of overhead for the reduction of λ terms, or maybe it is just an implementation issue. Could you count the number of rewrites?
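
One way to read the strict reducer, in the same toy representation as the sketch above (and with the same caveat that `annihilate` stands in for the real rules), is a scan-and-rewrite loop that also counts the rewrites:

```python
# Strict ("stupid") reduction: repeatedly find an active pair -- two nodes joined
# main port to main port -- rewrite it, and count the rewrites. Reuses the toy
# net/free representation and the annihilate() placeholder from the sketch above.
def strict_reduce(net, free):
    rewrites = 0
    while True:
        pair = next(((a, tgt[1]) for a, nd in net.items()
                     for tgt in [nd['ports'][0]]
                     # 'a < tgt[1]' just avoids counting each pair twice
                     if tgt and tgt[0] == 'n' and tgt[2] == 0 and a < tgt[1]),
                    None)
        if pair is None:
            return rewrites
        annihilate(net, free, *pair)
        rewrites += 1
```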

For the rest I agree with you, with the small distinction that without trying, how would one know if there is a sweet spot or not?

I'm all for someone trying; it is just that, personally, I'd rather write a blockchain in Cobol than write that stuff myself. :P Way above my skills.

Actors as colors of nodes is the same as coloring the nodes of a block with the same color, then rephrasing the graph rewrites, and the rules akin to your rules A and B, as just rewrites of colored nodes.

Ah, I think I get what you mean.

chorasimilarity commented 8 years ago

Re i^i: there's something I don't get right. Somewhere I'm making a big mistake. 50 has about 100 nodes as a graph, using the Church encoding. 50^50 is a graph of the same shape, but much bigger. A rewrite either deletes a pair of nodes or creates 4 nodes from 2. So, in order to create that big number of nodes, one has to have about the same order of rewrites, in this case. No matter which reduction strategy one follows. Where is the mistake I make?

VictorTaelin commented 8 years ago

How exactly you go from the Church encoding to chemlambda is something I'd like to know. Did you publish that algorithm anywhere? Nevertheless, that is really interesting. Let me know if you get any new insights on that issue.

chorasimilarity commented 8 years ago

The algorithm is the same as in GLC, see section 3 from http://www.complex-systems.com/abstracts/v22_i04_a01.html and look at the bottom of the page http://chorasimilarity.github.io/chemlambda-gui/dynamic/demos.html for bibliography. In GLC there are some rewrites which are global, i.e. global fanout and global pruning. Chemlambda eliminates these by introducing a second fanout node and some modifications of the moves (which do not modify the mentioned algorithm). Mind that I'm coming from other math fields, with different motivations. I also use this subject as a guinea pig for experiments with open science and self-validation, which maybe explains why I pointed several times to the Molecular computers "article" http://chorasimilarity.github.io/chemlambda-gui/dynamic/molecular.html . There are plenty of resources for understanding anything.

As I came to this subject via geometry, sometimes (in the things I wrote) I don't use the standard references or notations, but this is not because I'm mean; I welcome any public (i.e. something accessible from a link) corrections and additions. On the other hand, the real interest I have is not in lambda calculus, it is in the emergent algebras formalism (started in http://arxiv.org/abs/0907.1520 ), which is partially included in GLC and left aside in chemlambda.

If that's not too much for one message, recall that your rules A and B are external to lambda calculus (or interaction nets). The emergent algebra nodes would play the same role of space management as in your formalism, only that I have not gotten that far with the exposition. (Every time I tried, I got stuck in lambda calculus discussions.)

Does not matter, so where is my mistake? It's something trivial, I bet, unless you use global fanouts somewhere and your algorithm (or hidden assumptions you make) hides them from view. So either I am making a stupid mistake, and then please correct me, or there is something interesting happening. Probably the first; I'm very curious.

VictorTaelin commented 8 years ago

Just reading the first line of the "emergent algebras" paper crashes my brain. I wonder how anyone gets to that level of knowledge.

Does not matter, so where is my mistake?

I'm not sure if you are addressing this question to me or just stating it broadly?

chorasimilarity commented 8 years ago

Don't be modest: a) I see only elegant constructs in your work, b) pragmatism is a philosophical concept anyway, no less than others. The question is for you, although I admit that what I should do is take your programs and answer it myself. Let me start by rephrasing the question; maybe you can give me a hint for the answer. You have a lambda term for computing a^a mod b; which is it? I prefer an untyped lambda term for this which avoids if-then-else, or is otherwise shorter or more clever. If I'm lazy, then hit me over the head with a link.

VictorTaelin commented 8 years ago

Wait, you're asking me for a term for a^a mod b? I thought we were talking about a^a?

chorasimilarity commented 8 years ago

a^a is trivial, it is just a applied to a (where both are in the Church encoding). This a^a term is what made me ask in the first place; then I thought that perhaps you are not computing a^a but maybe a^a mod b.

OK, so in the case of a^a, let's look at how a appears in chemlambda. There are two lambda nodes and then there are pairs of FO (fanout) and A (application). For example, 3 = Lx.(Ly.(x(x(xy)))), so there are two L nodes, three A (application) nodes and two FO (fanout) nodes. A mol file expression for this term is: L 3 1 0, L 4 2 3, FO 1 5 6, FO 6 8 9, A 5 7 4, A 8 10 7, A 9 2 10, FROUT 0.

So the structure is a ladder of (a-1) pairs FO x y z, A y u v, with two L nodes at one end and an A node at the other end, such that there is a bond between one of the L nodes and the final A node. It looks like a rounded train track, roughly. a^a is either two copies of this graph, with the FROUT nodes removed and the bonds connected instead to an A node, or one copy of the graph of a, with the FROUT node removed and the bond connected to, say, FO 0 left right, A left right out, FROUT out. I'll take the first variant, for simplicity.
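
To make the ladder explicit, here is a small sketch that generates such a mol description for a general Church numeral n. The port order used (L body var out, A func arg result, FO in out1 out2) is only my reading of the example for 3 above, and the edge names are fresh integers, so the output matches the quoted mol expression in shape rather than literally.

```python
# Sketch: build the 'ladder' mol description of the Church numeral n (n >= 2),
# as described above: 2 L nodes, n A nodes, n-1 FO nodes and a FROUT.
def church_mol(n):
    assert n >= 2
    fresh = iter(range(1, 10**6))
    root = 0
    x, y, mid = next(fresh), next(fresh), next(fresh)  # x, y and the edge joining the two L
    func = [next(fresh) for _ in range(n)]             # the n copies of x
    res  = [next(fresh) for _ in range(n)]             # result edge of each application
    fo_in = [x] + [next(fresh) for _ in range(n - 2)]  # inputs of the FO ladder
    lines = [f"L {mid} {x} {root}",                    # outer lambda, binds x
             f"L {res[0]} {y} {mid}"]                  # inner lambda, binds y
    for i in range(n - 1):                             # FO ladder copying x n times
        out2 = fo_in[i + 1] if i + 1 < n - 1 else func[n - 1]
        lines.append(f"FO {fo_in[i]} {func[i]} {out2}")
    for i in range(n):                                 # the chain x (x (... (x y)))
        arg = res[i + 1] if i + 1 < n else y
        lines.append(f"A {func[i]} {arg} {res[i]}")
    lines.append(f"FROUT {root}")
    return lines

print("\n".join(church_mol(3)))   # same shape as the mol expression quoted above
```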

In chemlambda the reduction happens like this: after a first BETA reduction, one of the graphs of a is now connected to the "input" of the string of FO nodes which makes the ladder of the other copy of a. This graph of a starts to replicate (due to the rewrites involving the FO node) and, as soon as there are patterns of L and A for the beta move, there are such reductions as well. But no matter how the reduction goes, a^a will finally be a graph with about 2·a^a nodes, which are going to be produced either by the replication process (where there are moves which I call DIST and you call commutative), which turns pairs of nodes (like L and FO, or A and FOE, or A and FO, or even FI and FOE, see later) into 4 nodes (a net gain of 2 nodes), or by the rewiring induced by the moves AL (i.e. the beta move, which turns a pair A and L into two Arrow nodes, etc.) or FI-FOE (which turns a pair FI and FOE into two Arrow nodes...). So no matter how I think about this, in order to produce a^a nodes (or twice that, whatever), at least that many rewrites are needed. I'm not counting the steps of the computation, but the rewrites needed. As you see, most of them involve moves which are for replicating pieces of graphs (DIST moves, or commutativity).

By contrast, a global fanout move would transform, in only one rewrite, the copy of a connected to the FO node into two copies of a. But this is unacceptable from my point of view because such global fanouts are not local (there is no a priori upper bound on the number of nodes and bonds involved in this rewrite). THEN, HOW DO YOU DO IT FAST, WITHOUT GLOBAL FANOUT?

VictorTaelin commented 8 years ago

I'm on a tight schedule at work so I might come back later, but here is a quick and probably incorrect answer, as I'm not even sure I understand what you are asking. Basically, you're saying that to reduce a^a (i.e., a applied to a), you need at least a^a rewrites, right? The normal form of a^a doesn't have a^a nodes afaik, just the log of it, or something like that. I think this has to do with the fact that many dup nodes with identical tags end up annihilating each other. If that is what you asked, I'd need more time to fully understand what happens when you reduce a^a.

chorasimilarity commented 8 years ago

I updated an older script, which allows deterministic reductions as well. If you go to the dynamic folder (in gh-pages!) and take it, then type bash moving_random_metabo.sh and choose a_exp_a.mol, you'll see in the file sta.txt the number of steps and rewrites. a_exp_a.mol is the chemlambda graph for 5 applied to 5 in the Church encoding (there is no Lx.x at the end, as you have in your experiments). I see 35 steps and a total of 13102 rewrites. Compare with what I see at https://raw.githubusercontent.com/MaiaVictor/LPU/master/stats.txt for 5, namely 603 steps and 110 rewrites.

I repeat the same bash command and this time I choose 4_exp_4.mol, then I look in sta.txt and I see 24 steps and a total of 1175 rewrites, while in your file I read 325 steps and 68 rewrites.

So, what's happening here? In my conversion of lambda terms into graphs, which is trivial, I believe, after the term is converted there is absolutely nothing other than the graph (made of a finite number of node types); there is no tag, nothing else. The rewrites either transform a pair of nodes into a rewiring of bonds, via the fleeting Arrow nodes, or they transform a pair into 4 nodes. The graphs of numbers in the Church encoding have a number of nodes of the order of the numbers themselves, so no matter how I put it, by using this encoding and these rewrites one has to use a lot of rewrites to produce a graph which has about a^a nodes from a graph which has about a nodes (modulo some multiplicative constants). On the other hand, the number of steps may be small (i.e. there are many rewrites in parallel).
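
For what it's worth, here is the counting argument with the numbers from this thread plugged in, as a quick sanity check. It assumes the final chemlambda graph really is the full Church-numeral graph of a^a (about 2·a^a nodes: 2 L, a^a A, a^a - 1 FO) and that a single rewrite adds at most 2 nodes, which is the situation described here (and not the Optlam situation, where the normal form keeps only a shared, much smaller net).

```python
# Lower bound on rewrites needed to grow the Church numeral graph of a into the
# Church numeral graph of a^a, compared with the rewrite counts reported above.
# Node count of numeral k: 2 L + k A + (k - 1) FO; each rewrite adds at most 2 nodes.
def nodes(k):
    return 2 + k + (k - 1)

for a, reported in [(4, 1175), (5, 13102)]:
    bound = (nodes(a ** a) - nodes(a)) // 2
    print(f"{a}^{a}: at least {bound} rewrites to grow the graph; reported {reported}")
```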

I would really like to understand.

There's no funding for the moment, but if you want we can write articles together. There is huge interest in chemlambda but, for a reason I can't grasp, it has a high hallucinatory effect. I am a geometer who codes stuff because others start to behave as if they are on something very strong, and eventually there are only words, words, words and no programs.

chorasimilarity commented 8 years ago

I added 6_exp_6.mol, which is for 6^6. With the same treatment as in the last comment, I see in sta.txt 48 steps and 183487 rewrites. Mind that the script's main goal is to produce an html output, in this case a huge 6_exp_6.html of 968.7 MB.

chorasimilarity commented 8 years ago

In the reference http://arxiv.org/pdf/0906.0380v3.pdf about symmetric interaction combinators I read at p. 15: "A necessary condition for having a fixpoint combinator is the ability of duplicating any term. In the symmetric combinators, we are only able to duplicate cut-free nets as in Lemma 1.11, so we do not have a fixpoint combinator at our disposal."

In chemlambda I'm able to duplicate any graph coming from an untyped lambda term.

I think that most of the discussion revolves around this and around the functions encode and decode (from lambda calculus to graphs), which are not inverse to each other, as I mentioned in a previous comment.

chorasimilarity commented 8 years ago

As for the 4^4 computation in chemlambda, I mean the computation itself, not only some stats about it: see https://plus.google.com/u/0/+MariusBuliga/posts/h3ukHDmv3s6 . What I find interesting:

VictorTaelin commented 8 years ago

Looking for time to grasp it all :(

thealexvarga commented 8 years ago

Hi, I've only just discovered chemlambda, so I only understood a fraction of this conversation, but it is all incredibly interesting. I've always been casually interested in "localized" computation, as in cellular automata, and have pondered massively parallel hardware implementations like your LPU. If you decide to move this conversation somewhere less public, I'd love to be included so I can try to keep following it. Thanks!

chorasimilarity commented 8 years ago

Hi, if gmail is OK then you can retrieve my address from the chemlambda collection https://plus.google.com/u/0/collection/UjgbX

VictorTaelin commented 8 years ago

Uhm why less public, though? Insightful information should never be kept private.

Btw I'm completely overloaded with work & study, but I'll be coming back to this problem/thread as soon as I can. I have new insights.

chorasimilarity commented 8 years ago

I believe the same. First rule is: give.

VictorTaelin commented 8 years ago

@chorasimilarity you messaged me the other day about a paper, but I can't find where?

chorasimilarity commented 8 years ago

https://github.com/MaiaVictor/optlam/issues/1#issuecomment-226567022

chorasimilarity commented 7 years ago

Busy times, but have you seen this Haskell project? https://github.com/synergistics/chemlambda-hask

synergistics commented 7 years ago

@MaiaVictor, you seem to know more about Haskell than I; I'm still a noob. If you're up for it, please critique my project in any way you can. Maybe we could work together on it.

Also, I didn't realize this huge conversation was here! I've got a ton to learn.

VictorTaelin commented 7 years ago

There is just too much discussion going on for me to follow; I'm really busy with tons of projects :( The code looks beautiful, though; you certainly don't look like a noob, nor like someone who knows less than me.

chorasimilarity commented 3 years ago

The official page of all chemlambda projects is https://chemlambda.github.io/index.html

The repository chemlambda-gui is kept because of its content; otherwise, for new experiments, go to the official page.