MatrixAI / Architect

Programming Language for Type Safe Composition of Distributed Infrastructure
Apache License 2.0

Composition of Automatons #9

Open CMCDragonkai opened 6 years ago

CMCDragonkai commented 6 years ago

A core value proposition of the Architect language is the ability to compose Automatons together in a type safe manner.

Here we consider the different kinds of composition.

  1. Object Composition
  2. "Functional" Composition (or Arrow Composition)
  3. Union Composition

Object Composition

The Automaton Specification https://github.com/MatrixAI/Architect/issues/3 already shows this via its dependency specification.

The name "object composition" derives from the composition-over-inheritance (https://en.wikipedia.org/wiki/Composition_over_inheritance) and object composition (https://en.wikipedia.org/wiki/Object_composition) patterns in OOP. Objects in OOP are far more than just data abstraction. They encode encapsulation and bundled methods. In the same way, Automatons are far more than just functions or data structures. They can be thought of as a highly complex data type. Automatons here more closely resemble Alan Kay's definition of what an object is.

Composition means injecting dependencies into the closure of an Automaton.
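As a rough illustration of injection into a closure, here is a minimal Python sketch. The `Automaton` class, its `requires`/`bound` fields and the `inject` function are all hypothetical names invented for this example; they are not part of the Automaton Specification. The protocol check stands in for what the Architect language would verify statically.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Automaton:
    """Hypothetical stand-in for an Automaton definition."""
    name: str
    protocol: str
    # Dependency slot name -> protocol that slot expects.
    requires: dict = field(default_factory=dict)
    # Dependency slot name -> Automaton injected into that slot.
    bound: dict = field(default_factory=dict)


def inject(parent, slot, dependency):
    """Return a copy of `parent` with `dependency` bound into `slot`."""
    if slot not in parent.requires:
        raise KeyError(f"{parent.name} has no dependency slot {slot!r}")
    expected = parent.requires[slot]
    if dependency.protocol != expected:
        raise TypeError(
            f"{parent.name}.{slot} expects {expected}, "
            f"but {dependency.name} speaks {dependency.protocol}"
        )
    # The parent only holds a reference; it does not own the dependency.
    return replace(parent, bound={**parent.bound, slot: dependency})


db = Automaton("db", "postgres")
api = Automaton("api", "http", requires={"store": "postgres"})
wired = inject(api, "store", db)
```

Note that `db` remains a value of its own after injection, so other Automatons could be given the same dependency.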

Another important concept deriving from OOP is aggregation. The injection of dependencies as demonstrated in the Automaton Specification does not imply any kind of ownership between the enclosing Automaton and the enclosed Automaton. In OOP speak, this means destroying the parent object doesn't automatically delete the child object. However, deletion could still happen if the language has automatic garbage collection and realises that there are no other live references to the child object.

There's a discussion of garbage collection in a distributed system: http://wiki.c2.com/?DistributedGarbageCollection

Automatons don't own other Automatons. However, suppose we take the idea of an "entrypoint" as a sort of live reference. Then if the enclosing Automaton is no longer "needed", the enclosed Automaton may also not be needed (unless it is being used by other Automatons).

We need to make this concept of "need" clearer, whether we use something like reference counting or other GC mechanisms. One must explore this within the realm of distributed systems.
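One possible reading of "need" is reachability from entrypoints, as in a tracing garbage collector. The sketch below assumes the whole dependency graph is visible in one place, which sidesteps exactly the distributed-systems problems mentioned above. The function name and the Automaton names are made up for illustration.

```python
def needed(dependencies, entrypoints):
    """Return the set of Automatons reachable from the entrypoints.

    `dependencies` maps an Automaton name to the names it depends on.
    """
    seen = set()
    stack = list(entrypoints)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(dependencies.get(name, ()))
    return seen


deps = {
    "web": {"api"},
    "api": {"db"},
    "batch": {"db"},
    "old": {"cache"},
}
live = needed(deps, {"web"})
```

Here `db` stays needed for as long as either `web` or `batch` is an entrypoint, while `old` and `cache` are never reached.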

Another aspect to consider is process trees. In Erlang, it is possible for processes to launch child processes and act as a supervisor of those processes. If we take this model to Matrix, it resembles a sort of recursion of Orchestration, where Orchestrators could run Orchestrators (and the Orchestrator can be a sort of interface that can be satisfied by alternative implementations). In Unix systems, process trees don't automatically become supervisor trees. Instead, if the parent process dies, the child process just becomes an orphan.

As a future extension, one should allow "supervisor" Automatons that can launch sub-Automatons. In such a case, we would need to change our Automaton specification to be recursive. Note that this doesn't have anything to do with Automatons launching child processes. It is totally possible for the internal application to do fork and exec to create child processes.

"Functional" Composition

Functional composition of Automatons was first introduced here: https://matrix.ai/2014/12/05/distributed-lens-architecture/ "Function" here is just used as a closely related concept. Automatons are not true functions (although sometimes they can be thought of as functions depending on the context).

The basic idea is that the output of an Automaton can be fed as input to another Automaton. This is similar to process piping (processA | processB) in Unix shell languages.

However Automatons are pretty complex. The protocols they speak may or may not be unidirectional byte streams. Automatons may be receiving messages from multiple external Automatons, and also sending out messages to multiple dependencies.

Before going further, we must understand different protocol behaviours.

Protocols can exist in a request-response cycle, where after a message is sent, there is an expectation that a message must be received. For example, an HTTP request and an HTTP response. Protocols can also exist in a way where messages are sent with no expectation of a response. It is also possible that there is no notion of message framing, so the protocol is just a byte stream.

Functional composition is therefore a way of getting 2 or more Automatons together to behave as if they were one Automaton. The order of composition here is important: our functional composition operator is not commutative. The end result is a construct that can be used everywhere an Automaton can be used. Messages sent to this "composed" Automaton are sent to the first Automaton. Messages received out of this "composed" Automaton are received from the last Automaton. The only restriction is that the output protocol of the previous Automaton lines up with the input protocol of the next Automaton.
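The restriction on adjacent protocols can be sketched as a simple check. This is an illustrative Python sketch only: `Automaton`, `compose` and the protocol names are hypothetical, and protocols are compared as plain strings rather than as real protocol specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automaton:
    """Hypothetical Automaton with one input and one output protocol."""
    name: str
    input_protocol: str
    output_protocol: str


def compose(*stages):
    """Chain Automatons so each stage's output feeds the next stage's input."""
    if not stages:
        raise ValueError("need at least one Automaton")
    for prev, nxt in zip(stages, stages[1:]):
        if prev.output_protocol != nxt.input_protocol:
            raise TypeError(
                f"{prev.name} outputs {prev.output_protocol}, "
                f"but {nxt.name} expects {nxt.input_protocol}"
            )
    # The result accepts what the first stage accepts and emits what the
    # last stage emits, so it can be used wherever an Automaton can.
    return Automaton(
        name=" >> ".join(s.name for s in stages),
        input_protocol=stages[0].input_protocol,
        output_protocol=stages[-1].output_protocol,
    )


parse = Automaton("parse", "bytes", "json")
enrich = Automaton("enrich", "json", "json")
store = Automaton("store", "json", "ack")
pipeline = compose(parse, enrich, store)
```

Swapping the arguments, as in `compose(store, parse)`, is rejected because `ack` does not line up with `bytes`, which is one way to see that the order matters.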

Not every protocol is conducive to "functional" composition. For example, HTTP's input and output are not symmetric: the input is a request and the output is a response. One cannot take 2 HTTP-speaking Automatons and compose them in this way. But one could instead imagine a middleware protocol (such as that used in NodeJS's connect middleware or PHP's HTTPKernel) which allows Automatons that take HTTP input and output HTTP input, and Automatons that take HTTP output and output HTTP output.

One result of this is that a client Automaton may send messages to one network destination, but actually receive its result from a completely different Automaton. However this network behaviour is transparent to the client Automaton, because our functional operators are a sort of network combinator that can create a "virtual" Automaton representing the composition of multiple Automatons. Thus there is only 1 environment variable address that is substituted into the internal application artifact.

Union Composition

@kneedler came up with this idea, which is pretty cool. This is basically the API gateway pattern: http://microservices.io/patterns/apigateway.html

Basically multiple Automatons may present compatible protocol specifications. One could create a union of Automatons (the union operator is associative), which itself represents a single "virtual" Automaton.

This allows one to decompose an interface into multiple Automatons that can be written in different languages and have different scalability requirements, but present a single address to access all these services.

There will be some complexities in this. The composition of these Automatons must mean their protocols are compatible. This means there are no ambiguities between the potential routes that a message can take. Consider 2 HTTP Automatons which both present the /songs route. This would be ambiguous and such a composition must be rejected. Likewise, one cannot compose an HTTP Automaton with a UDP Automaton. They speak protocols that just cannot be properly combined and routed.
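The two rejection cases above can be sketched as follows. This is a simplified illustration, assuming a protocol is just a name plus a set of route strings; the `Automaton` class and `union` function are hypothetical and not the project's own definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automaton:
    """Hypothetical Automaton exposing a set of routes over one protocol."""
    name: str
    protocol: str
    routes: frozenset


def union(a, b):
    """Combine two Automatons behind one address, or reject the union."""
    if a.protocol != b.protocol:
        raise TypeError(f"cannot union {a.protocol} with {b.protocol}")
    clash = a.routes & b.routes
    if clash:
        raise ValueError(f"ambiguous routes: {sorted(clash)}")
    return Automaton(f"({a.name} + {b.name})", a.protocol, a.routes | b.routes)


songs = Automaton("songs", "http", frozenset({"/songs"}))
users = Automaton("users", "http", frozenset({"/users"}))
gateway = union(songs, users)
```

Because the result is itself an `Automaton`, it can be unioned again with a third service.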

The result of union composition is a sort of "routing" Automaton. How the routing occurs depends on the protocol being unioned. If we are unioning HTTP, the "virtual" Automaton could be implemented via an HTTP load balancer like NGINX or HAProxy.
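For HTTP, the routing step might look like a prefix-based dispatch table. The sketch below is only an in-process illustration of that idea, with made-up service names; a real deployment would delegate this to something like the load balancers mentioned above, and it assumes the routes have already been checked for ambiguity.

```python
def build_table(backends):
    """Flatten {backend name: set of routes} into {route: backend name}."""
    table = {}
    for backend, routes in backends.items():
        for route in routes:
            table[route] = backend
    return table


def dispatch(table, path):
    """Pick the backend whose route is the longest prefix of `path`."""
    candidates = [
        route for route in table
        if path == route or path.startswith(route.rstrip("/") + "/")
    ]
    if not candidates:
        return None  # No unioned Automaton handles this path.
    return table[max(candidates, key=len)]


table = build_table({"songs-svc": {"/songs"}, "users-svc": {"/users"}})
target = dispatch(table, "/songs/42")
```

Matching on whole path segments means a request for `/songsx` is not sent to the `/songs` backend.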

Virtual Automatons or Network Combinators

Functional composition and union composition could be implemented below the Automaton level, or at the Automaton level.

At the Automaton level, this means a "virtual" Automaton is created. This could mean there's a real container being deployed that does the routing in the case of Union composition, or redirects input and output in the case of Functional composition. How will these Automatons be considered by the Orchestrator? Are these Automatons specified as primitives in the language, or are they part of the userspace standard library? If they are primitives, it would mean that we have to select a particular implementation for routing, like choosing NGINX or HAProxy.

If they were implemented at a lower level, this adds a lot of workload, because now we have to rewrite protocol-aware routers. In this case they would not be deployed as Automatons, but would instead be part of the Relay backbone infrastructure.

Someone should expand on this issue. I'm leaning more towards the idea of having a "virtual" Automaton. But that still requires us to figure out whether the composition operators are user-defined or primitives of the language.

Abstract vs Concrete

One important thing to realise is that at the level of the Architect language, composition of Automatons is in the abstract. It does not tell us how the live Automatons are actually connected, how many live Automatons are running, or where the Automatons are placed in the network of Matrix Nodes.

This is intentional. It is up to the Orchestrator to figure out those questions. The Architect language represents a high level declaration of semantics and performance requirements. The Orchestrator may decide to autoscale Automaton B to satisfy the performance requirements that A has when it is communicating with B. Note that A may have its own quality of service criteria that feed into B.

Do not confuse abstract composition with live Automaton connections. Composition is within the Architect language; connections are live.

Implementation

Composition has a strong relationship with the Protocol Specification. People working on the Protocol Specification need to consider the 3 use cases above. https://github.com/MatrixAI/Architect/issues/6

ghost commented 6 years ago

To be clear, I think these different names for composition are about classifying the ways that two or more automatons can interact.

I think some of these ideas about the way that automatons compose have corresponding notions in process algebra. Automatons are like processes, union composition is like alternative choice (A+B), functional composition is a sequence of actions (A = a.b.c), and object composition is like executing processes in parallel (A|B) with a communication function between A and B.

@olligobber Because there is a relation between session types and process algebra, can we use process algebra to more concretely specify what we mean by these compositions?

olligobber commented 6 years ago

A demo of Union Composition on Session Types has been added in 4574e0e94076c5baba892efbe6ebd8280f3266ac. There are two versions: strictUnion will only work if the protocols have no overlaps, whereas union will work as long as the overlapping parts of the protocol can be union composed. Since the functions return Nothing if the protocols cannot be merged, they can be used for checking whether two Automata can be Union Composed.
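To make the difference between the two versions concrete, here is a rough Python sketch. It is not the code from the commit above. It assumes a heavily simplified protocol representation where a protocol is either the string "end" or a dict mapping a message label to the protocol that follows it, and it returns `None` where the demo is described as returning Nothing. The names `strict_union` and `union` merely echo the ones in the comment.

```python
def strict_union(a, b):
    """Merge two choice protocols only if they share no labels."""
    if not (isinstance(a, dict) and isinstance(b, dict)):
        return None
    if a.keys() & b.keys():
        return None
    return {**a, **b}


def union(a, b):
    """Merge two protocols, recursing into any labels they share."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for label, continuation in b.items():
            if label in merged:
                sub = union(merged[label], continuation)
                if sub is None:
                    return None  # The overlapping parts cannot be merged.
                merged[label] = sub
            else:
                merged[label] = continuation
        return merged
    # Non-choice protocols merge only when they are identical.
    return a if a == b else None


listing = {"login": {"list": "end"}}
playing = {"login": {"play": "end"}}
both = union(listing, playing)
```

In this sketch `strict_union(listing, playing)` fails because both start with `login`, whereas `union` succeeds by merging what follows `login`.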

CMCDragonkai commented 6 years ago

An Automaton can communicate with other Automatons only if those Automatons have been explicitly composed together in our Architect language. This explicit composition can be thought of as granting one Automaton a capability (in the capability-security sense) to communicate with another Automaton. This is achieved solely through our Architect language and our Relay network. For the code that is running inside an Automaton, this may appear as environment variables whose values represent protocol-specific addresses. For example, if an Automaton expects an HTTP address, the code within will expect an environment variable containing a string of the form https://foo.bar. The exact value of this string doesn't matter, and even if another Automaton were to forge such a string, it would not be able to communicate with the target without this being explicitly specified in the Architect expression.

This achieves the principle of least privilege.

We must make these "capabilities" unforgeable.
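One way to picture this is a Relay that records which compositions were declared and refuses everything else. The sketch below is purely illustrative: the `Relay` class, its methods and the `.relay.invalid` address format are assumptions made for the example, and it takes for granted that the Relay network can reliably identify the sending Automaton.

```python
import secrets


class Relay:
    """Hypothetical Relay that only honours explicitly composed pairs."""

    def __init__(self):
        self._targets = {}    # address -> target Automaton name
        self._grants = set()  # (client name, address) pairs

    def compose(self, client, target):
        """Record that `client` may talk to `target`; return the address
        that would be placed in the client's environment variable."""
        address = f"https://{secrets.token_hex(8)}.relay.invalid"
        self._targets[address] = target
        self._grants.add((client, address))
        return address

    def resolve(self, client, address):
        """Return the target only if this client was granted this address."""
        if (client, address) not in self._grants:
            raise PermissionError(f"{client} was not composed with {address}")
        return self._targets[address]


relay = Relay()
api_address = relay.compose("web", "api")
```

Under this model, knowing or guessing the address string is not enough: a call such as `relay.resolve("intruder", api_address)` is refused because no composition was declared for that pair.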