MatrixAI / Emergence

Distributed Infrastructure Orchestration

Architect Expressions #16

Open mokuki082 opened 6 years ago

mokuki082 commented 6 years ago

Content addressing, interface vs instance, class-based OOP, IoC

mokuki082 commented 6 years ago

UPDATE:

  1. Changed having to specify uid:gid to simply assuming root:root.
  2. Changed the exposed ports to information given to the application via environment variables.
  3. Added the Nix store path to the user story on sharable layers.
  4. Added a user story on liveness and readiness probes.

Artifact Expression User Stories

Writing Artifact Expression

As an operator, I should not need to specify the uid/gid of the entrypoint program. It is assumed that the entrypoint program will run as root:root.

As an operator, I should not hard-code the exposed ports and addresses of a container. This information should be given to the application at runtime via environment variables, as it may change dynamically through the lifetime of an automaton.

As an operator, I must be able to supply a list of environment variables in the format VARNAME=VARVALUE when creating the artifact. These values must be merged with any others supplied in the creation of the container.

As an operator, I must be able to supply a list of arguments to execute when the container starts (the entrypoint). These values act as defaults and MAY be replaced by an entrypoint specified when creating a container.

As an operator, I must be able to supply default arguments to the entrypoint (e.g. CMD Dockerfile instruction), which will be executed when the container starts if the entrypoint is not specified.

As an operator, I must be able to supply the current working directory of the entrypoint in the container. This value SHOULD act as a default and MAY be replaced by a working directory specified when creating the container.

As an operator, I must be able to supply labels for the container using the annotation rules.

As an operator, I must be able to specify a signal that will be sent to the container to make it exit. The signal can be a signal name such as SIGKILL or SIGRTMIN+3.

As an operator, I should not specify the persistent storage mount points (similar to Docker Volumes) in the Artifact spec; instead, this information should be present in the State spec.

As an operator, I must be able to build an artifact from another artifact fetched from a remote or local registry transparently via a content address to the artifact spec. The content address of an artifact must be unique and deterministic (any change to the content should generate a different content address), and the content address MUST be supplied through a trusted source.

As an operator, I must be able to specify imperative instructions to add layers to an artifact. These imperative instructions should be translated into deterministic output (i.e. an OCI image) before the artifact is fetched and used by another operator.

As an operator, I want Matrix artifacts to comply with the OCI image standards so we can substitute any OCI-compliant runtime as we wish.

As an operator, I want to be able to delete references to created artifact specs using imperative commands such as:

```
A = Artifact {...}
del A
```

This command should delete the references to the top artifact layer, manifests, and configuration. However, the underlying layers may still be utilised by other artifacts, so they should remain until the garbage collector collects them.
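
A minimal sketch of that deletion and garbage collection behaviour, assuming a simple in-memory reference count per layer (the names here are illustrative, not the actual implementation):

```python
# Illustrative sketch only: a reference-counted layer store where deleting an
# artifact drops references, and unreferenced layers are reclaimed later by GC.
class LayerStore:
    def __init__(self):
        self.refcounts = {}  # layer digest -> number of artifacts referencing it

    def add_artifact(self, layer_digests):
        for digest in layer_digests:
            self.refcounts[digest] = self.refcounts.get(digest, 0) + 1

    def remove_artifact(self, layer_digests):
        # 'del A' semantics: only the references go away, the layers stay.
        for digest in layer_digests:
            self.refcounts[digest] -= 1

    def garbage_collect(self):
        # Reclaim only the layers that no artifact references any more.
        unreferenced = [d for d, n in self.refcounts.items() if n == 0]
        for digest in unreferenced:
            del self.refcounts[digest]
        return unreferenced
```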

As an operator, I would like to enable liveness and readiness probes on an artifact. If an automaton is not responding to valid periodic requests, it should be killed and restarted. If an automaton is not ready, no traffic should be directed to it.
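
Pulling the stories above together, an artifact spec could look roughly like the following. This is only a sketch; the field names are assumptions for illustration, not the actual Architect syntax:

```python
# Illustrative sketch of the fields implied by the user stories above; the
# field names are assumptions, not the real Artifact expression syntax.
artifact_spec = {
    "base": "sha256:<content-address-of-base-artifact>",  # fetched via a trusted content address
    "env": ["LOG_LEVEL=info"],                  # merged with env supplied at container creation
    "entrypoint": ["/bin/server"],              # default, MAY be replaced at container creation
    "cmd": ["--config", "/etc/server.conf"],    # default arguments to the entrypoint
    "cwd": "/srv",                              # default working directory
    "labels": {"org.example.team": "infra"},    # annotations
    "stop_signal": "SIGTERM",                   # e.g. SIGKILL or SIGRTMIN+3
    "probes": {"liveness": True, "readiness": True},
    # deliberately absent: uid/gid (root:root is assumed), exposed ports
    # (pushed down via environment variables at runtime), and volume mounts
    # (these belong to the State spec).
}
```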

Sharing Artifact Expressions

As an operator, I want to be able to push an artifact to a shared registry so all other operators in the network will be able to pull the artifact via a content address. This may be done through sharing content addresses of artifact specs in real time.
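
A sketch of how such a content address could be derived, assuming the artifact spec can be canonicalised to JSON (the exact serialisation is an assumption here):

```python
import hashlib
import json

def content_address(spec: dict) -> str:
    # Canonicalise (sorted keys, no incidental whitespace) so the address is
    # deterministic; any change to the content yields a different address.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```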

As an operator, I want to have copies of the actual artifact (layer tar archive, manifest, configuration, etc.) when I want to test the artifact locally.

As an operator, I want the read-only components of an artifact to be shared across multiple artifacts (for example, a NixOS artifact shares some base layers with an Alpine artifact). This should be done through Nix store graphs to reduce storage space.
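
As a rough illustration of that kind of sharing, assuming each artifact carries the set of /nix/store paths (its closure) that make up its read-only layers, the overlap only needs to be stored once:

```python
# Paths present in both closures are shared; the store paths below are made up.
def shared_paths(closure_a, closure_b):
    return closure_a & closure_b

nixos_closure = {"/nix/store/aaa-base-files", "/nix/store/bbb-coreutils"}
alpine_closure = {"/nix/store/aaa-base-files", "/nix/store/ccc-busybox"}
print(shared_paths(nixos_closure, alpine_closure))  # {'/nix/store/aaa-base-files'}
```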

CMCDragonkai commented 6 years ago

Just regarding uid and gid, I think these should be automatically specified. The operator should not need to worry about what user the container is running as. In fact, I would think each instance of an Automaton can run as its own user; it just ensures more isolation. So maybe we can ignore anything inside /etc/passwd?

CMCDragonkai commented 6 years ago

A single container can bind to several network interfaces/ports. The main designation for what a container speaks is the protocol spec. But right now the protocol spec doesn't specify address details; for example, an HTTP protocol spec currently has no mention of the HTTP address. This is by design: the operator should not need to create an artificial address, as all of these can be automatically and deterministically derived. But how? Via "pushing the config down". Basically the apps inside the container need to be given parameters (addresses) to bind to. This is better than the internal apps binding to a fixed address and us having to remap it.

We can use environment variables to achieve this just like anything else. So in effect both my own address and the addresses of my deps are handed down to me. @ramwan can you weigh in here?

The internal address doesn't really matter because we can always remap. However, because we don't know (from the orchestrator's point of view) what port or interface they bound to, we would either need to discover that info (via container metadata) or ask the operator in the artifact config. Also, do containers have 127.0.0.1 by default?

Note that this means the EXPOSE option would either be overwritten if configurable, or be used as discoverable metadata to know which ports to remap.
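
A sketch of what "pushing the config down" could look like from inside the container; the variable names (BIND_HOST, BIND_PORT) are assumptions, not a settled convention:

```python
import os
import socket

# The orchestrator is assumed to inject these at container creation time.
host = os.environ.get("BIND_HOST", "127.0.0.1")
port = int(os.environ.get("BIND_PORT", "8080"))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind((host, port))  # bind to whatever the orchestrator handed down
sock.listen()
print(f"listening on {host}:{port}")
```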

CMCDragonkai commented 6 years ago

Regarding uids/gids, processes inside containers should just run as root:root. This may become a security issue later, but for now it should be fine.

ramwan commented 6 years ago

@CMCDragonkai Environment variables are one way to push down configuration values. Another possible way could be for apps to 'query' addresses via mechanisms such as DNS. I think that environment variables are a good way for apps to find out what interfaces to bind to (testing and experimentation needs to be done), but they may not be suitable for communicating with other automatons (again, testing and experimenting need to be done, and it will heavily depend on how communication context is established between automatons).

CMCDragonkai commented 6 years ago

I think we talked about this before, but DNS is not a good idea due to cache invalidation. We need deeper control over name resolution so yes it does need to be dynamic, but the problem is with synchronising dynamic changes atomically or in an eventually consistent manner across the Matrix network.

Some additional thoughts. It is important to not think of DNS in terms of its implementation, but in terms of its core abstraction: a key-value database adding a level of indirection to pointer dereferencing. DNS, through its hierarchical structure and TTLs, is an eventually consistent database with a single source of truth: 1 writer, many readers. The main issue is with change. As we change name resolution, whether due to migration, scaling, redeployment or other things, these changes must propagate to how we resolve names (especially to avoid "recompiling" the entire Matrix network). If our network is small, propagating changes from a single database is adequate; however, as we scale up, changing names can involve global locks, significant downtime, or significant name resolution overhead.
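
A toy sketch of that abstraction, i.e. name resolution as a single-writer key-value lookup with TTL-bounded caching (so readers may be stale until their cache expires):

```python
import time

class Resolver:
    # Single source of truth (one writer) plus per-reader caches with TTLs,
    # which is what makes the whole thing eventually consistent.
    def __init__(self, authority, ttl):
        self.authority = authority  # name -> address
        self.ttl = ttl
        self.cache = {}             # name -> (address, expiry)

    def resolve(self, name):
        entry = self.cache.get(name)
        if entry and entry[1] > time.time():
            return entry[0]         # possibly stale until the TTL expires
        address = self.authority[name]
        self.cache[name] = (address, time.time() + self.ttl)
        return address
```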

How to have consistent systems that are also highly available is a classic distributed systems problem. One solution is to consider both time and space partitioning for an eventually consistent system.

CMCDragonkai commented 6 years ago

Sharing artifacts should be based on Nix graph-based sharing. Container-based layer sharing is a flawed version of this, but I believe layers should be mappable to the Nix store graph.

To do this appropriately, we need to consider a content storage system for our artifacts. Let's not reinvent the wheel; the Nix system is a great way of building artifacts. However, this means integrating the /nix/store model into our system. Right now there's nothing concrete that turns the /nix/store into something that can be decentralised or even just distributed. There are some notions of using NFS or IPFS, but I suspect there may be some tradeoffs here. This is a classic distributed systems problem and we'll need to consider the consistency tradeoffs here.

In the Nix system there is a concept of multistage evaluation, mainly due to the requirements of determinism. At the top level we have Nix expressions: a human-readable, Turing-complete language that allows areas of side effects to occur, specifically for the convenience of working within a filesystem context. That is, Nix expressions can refer to other entities on the filesystem and even have limited abilities to perform IO, like reading environment variables.

The next stage is the derivation, where the expressions are compiled to a limited configuration language that is not Turing complete (ATerm), but is ultimately a graph-like data structure. All reducible expressions should be reduced at this point; there's no further evaluation at the ATerm stage. The key point is that everything is fully specified (any IO that the Nix expression language knows about is fully read and completed). There is a relationship between an in-memory interpretation of a Nix derivation expression and the derivation itself that exists on disk (a kind of hybrid memory-and-disk model of state). Thus the existence of the derivation file seems to be a side effect of evaluating the Nix expression, yet the side effect is transparent to the interpretation, so it's a hidden side effect and everything is still pure.

Finally there is the execution of the derivation that builds a final artifact. While all legitimate derivation expressions will always be compilable to a derivation file, not all derivation files will produce an output store path. The derivation might not work for any reason.
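
A conceptual sketch of that pipeline (this is not Nix itself, just the shape of expression → derivation → output):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Derivation:
    inputs: tuple   # fully specified: any IO the expression performed is already baked in
    builder: str    # what to run to produce the output

def evaluate(expression):
    # Stage 1 -> 2: reduce a Turing-complete expression to a fully specified,
    # graph-like description; no further evaluation happens after this point.
    return Derivation(inputs=tuple(expression["inputs"]), builder=expression["builder"])

def realise(drv):
    # Stage 2 -> 3: run the builder to produce an output store path.
    # This step can fail, so not every derivation yields an output.
    return "/nix/store/<hash>-" + drv.builder  # placeholder path, not a real hash
```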

So our Artifacts are the actual outputs of any derivation, while the Artifact specification is some sort of translation of the derivation expression or the derivation itself. Note that the transformations are one-way: if you have the derivation expression, you can transform it to the derivation, but not the other way around.

Now for the sharing of artifacts. By flattening the hierarchy of key values here, from fixed-output derivations to derivations and outputs, everything from sources to remote deps to outputs is put in a globally content-addressed /nix/store. Suppose we make the /nix/store decentralised; the only things we can share are derivations and outputs. For the Architect language, I propose going one step further: make the expressions themselves shareable. That includes the Artifact specification expression.

Note that we can easily support other artifact specification formats by piggybacking off Nix. Docker formats are understood via just a conversion like Docker2Nix.

CMCDragonkai commented 6 years ago

We also need readiness/liveness probes included as well. In my HTTP applications, I generally add a /ping endpoint to act as a liveness/readiness endpoint. If the route returns a 2XX status code (given a HEAD request), that means the service is ready to receive requests; any initial startup sequence should have been done by that time. However, a more generic notion of readiness could be derived from the protocol specification. To do such a thing, a default route must be chosen and a default idempotent message type must be available. So for HTTP that could just be HEAD to the root /. But other protocols may have fewer choices and variations; for TCP it might just be a SYN packet.
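
A sketch of that HEAD-based readiness check; the /ping path is just the convention mentioned above:

```python
import http.client

def is_ready(host, port, path="/ping"):
    # Any 2XX response to an idempotent HEAD request means the service is
    # past its startup sequence and can receive traffic.
    try:
        conn = http.client.HTTPConnection(host, port, timeout=2)
        conn.request("HEAD", path)
        status = conn.getresponse().status
        conn.close()
        return 200 <= status < 300
    except (OSError, http.client.HTTPException):
        return False
```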

mokuki082 commented 6 years ago

So for sharing artifacts using Nix, will the operators be writing Nix expressions to bring in the docker images using a function created by us, which would be similar to something like fetchDocker but with layers as the build dependencies rather than just having the entire image blob as the output?

mokuki082 commented 6 years ago

As for whether a container has 127.0.0.1 by default or not, the OCI runtime spec does not mention this topic. If we create a new network namespace with ip netns, it comes with a loopback interface with no address assigned. However, Docker seems to have 127.0.0.1 by default, because running an Alpine box with --network none still gives me the loopback interface with 127.0.0.1.
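
A quick way to check this from inside a container is to try binding to 127.0.0.1 (a sketch; the bind fails if the loopback interface has no address assigned):

```python
import socket

def has_loopback_address():
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("127.0.0.1", 0))  # port 0 = any free port
        return True
    except OSError:
        return False

print(has_loopback_address())
```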