Discussion of nix-prefetch-ipfs
We briefly discussed a nix-expression rewrite system similar to that of the gx
package manager for Go. What it would do is rewrite source addresses (sources
that are required to build the nix package) from a standard URL to an ipfs URL
where that content is hosted. For example:
"https://git.kernel.org/torvalds/t/linux-4.15-rc8.tar.gz" -> "/ipfs/QmeF59wRCygGMrEJbLdYS1CyJmA2XowaR9oRX2VJty7nCR", where the latter is a content address of the tarball using the multihash scheme.
ipfs itself provides a specific facility for storing tar archives: ipfs tar
add, which parses a tarball into a Merkle DAG structure. We are not yet sure
what goes on behind the scenes when parsing the tarball into that structure.
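As a rough illustration of the round trip (assuming the go-ipfs CLI with the tar subcommand is on PATH; the tarball path is just an example):

```python
import subprocess

# Sketch: add a tarball to ipfs as a Merkle DAG via the tar subcommand.
# The CLI reports the root hash of the resulting DAG; the exact output
# format may differ between go-ipfs versions, so we just surface it here.
out = subprocess.run(
    ["ipfs", "tar", "add", "linux-4.15-rc8.tar.gz"],
    capture_output=True, text=True, check=True,
).stdout
print(out)  # e.g. "added QmeF59... linux-4.15-rc8.tar.gz"

# The original tarball can later be reassembled with:
#   ipfs tar cat /ipfs/<root-hash> > linux-4.15-rc8.tar.gz
```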
The nix-prefetch-ipfs tool would thus do the following:
download the source from the URL
check whether a local ipfs daemon is running
otherwise use a remote ipfs gateway
upload the source to ipfs, getting back the content address
return the package in the local nix store, along with the content address of
the package in the ipfs cloud
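A minimal sketch of that flow, assuming a go-ipfs CLI and nix-prefetch-url are available (the gateway fallback is left as a stub, since public gateways are read-only and uploading would need a writable gateway or pinning service):

```python
import subprocess

def nix_prefetch_ipfs(url: str) -> tuple[str, str]:
    """Sketch of nix-prefetch-ipfs: fetch a source URL, add it to ipfs,
    and return (nix store path, ipfs content address)."""
    # 1. Download the source into the local nix store. With --print-path,
    #    nix-prefetch-url prints the hash followed by the store path.
    out = subprocess.run(
        ["nix-prefetch-url", "--print-path", url],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    store_path = out[-1]

    # 2. Check whether a local ipfs daemon is running by probing it.
    daemon_up = subprocess.run(
        ["ipfs", "swarm", "peers"], capture_output=True,
    ).returncode == 0

    if daemon_up:
        # 3. Upload the source, getting back its content address
        #    (-Q prints only the final hash).
        cid = subprocess.run(
            ["ipfs", "add", "-Q", store_path],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    else:
        # Fallback to a remote ipfs gateway, as per the steps above.
        raise NotImplementedError("no local daemon; gateway upload TBD")

    # 4. Return the store path and the ipfs content address.
    return store_path, f"/ipfs/{cid}"
```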
More about Forge Package Archiving
There are two things in NixOS that can be pushed to ipfs: build-outputs (the
compiled packages in the binary cache) and build-inputs (the sources used to
build the packages via a nix expression). Forge Package Archiving looks to
move build-inputs into ipfs.
HydraCI, TravisCI
We need to look at systems that do continuous integration so we can see how we
can successfully propagate updates to automatons in the Matrix network. This
may also be useful to Forge Package Archiving, as HydraCI continuously builds
packages.
Identifying Automatons
Every automaton will have several identifiers:
Instance Address: An automaton will be instantiated from a package; it is
an instance of that package. Thus, an automaton will have an address that
represents this particular instance. This allows automatons that have been
instantiated from the same package to still have a stable, unique identifier.
This instance address is also preserved if the automaton is transferred live
between machines (see below).
Package Address: This is the content address of the package that the
automaton was instantiated from. Many automatons may share the same package
address.
Network Address: This identifier is used to determine the location of the
automaton within the network. When automatons are transferred between
machines, this network address may change.
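To make the distinction concrete, a hypothetical record type for these three identifiers might look like this (the names are illustrative, not a settled schema):

```python
from dataclasses import dataclass

@dataclass
class AutomatonIdentity:
    # Stable and unique per instance; survives live transfer between machines.
    instance_address: str
    # Content address of the originating package; shared by every automaton
    # instantiated from that package.
    package_address: str
    # Locates the automaton within the network; may change on transfer.
    network_address: str
```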
Transferring Automatons between Machines
We distinguished between two types of automaton:
Light automatons that have no or very little state
Heavy automatons that have state
When we talk about an automaton, we are usually referring to some sort of
service that is being provided by that automaton. An automaton transfer will
depend on a few factors that determine 'control points' at which it is safe
for the transfer to occur:
The automaton must have completed all current message exchanges with its
peer automatons. These exchanges may happen over a variety of different
protocols; essentially, we can determine completion by ensuring that all
streams to peer automatons are in the closed state. This needs to occur
because it is too difficult to re-route incoming data from a peer on the
fly to the new location of the automaton after the transfer.
Essentially this introduces a contention issue on the automaton: message
exchanges will have to wait until the automaton has been transferred to the
new machine.
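A sketch of that control point, assuming a hypothetical automaton object that tracks its open streams to peers:

```python
import time

def wait_for_control_point(automaton, poll_interval: float = 0.1) -> None:
    """Block until it is safe to transfer the automaton: all streams to
    peer automatons must be in the closed state. New exchanges are refused
    in the meantime, which is the contention trade-off noted above."""
    automaton.refuse_new_streams()  # hypothetical: stop accepting exchanges
    while any(not stream.closed for stream in automaton.peer_streams):
        time.sleep(poll_interval)
    # All message exchanges are complete; the transfer can now begin.
```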
Nature of Packages
A package is a wrapper around some independently created program. Packages
should be able to wrap any program (language-agnostic). When we say program,
we specifically mean the program source, not compiled versions of the
program. To facilitate scalable architectures, Matrix must handle the
compilation of the program source for the target machine.
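As an illustration of what such a language-agnostic wrapper might carry (all names here are hypothetical, not a settled design):

```python
from dataclasses import dataclass

@dataclass
class MatrixPackage:
    # Content address of the program source (never a compiled artifact).
    source_address: str
    # Language-agnostic build recipe; Matrix evaluates it against the
    # architecture of whichever machine the package lands on.
    build_expression: str

    def compile_for(self, target_arch: str) -> "Container":
        # Compilation happens at deployment time, per target machine;
        # "Container" is the compiled form described in the next section.
        ...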
Container
After the source program has been wrapped in a Matrix package, we can refer to
the compiled version of that package as a container. One of the core problems
we face with containers is ensuring that the communication paradigms available
to the original source program remain available after it has been wrapped in a
package and is run as a container on several different machines, with
different architectures and perhaps different network protocols available on
each. We essentially need to be able to tunnel protocols and expose low-level
primitives such as sockets to the program. This may require more sophisticated
low-level integration, such as Linux kernel modules, to make this possible.
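For flavour, tunnelling at the socket level amounts to pumping bytes between an ordinary socket the program expects and whatever transport the virtual network actually provides. A minimal userspace sketch, where virtual_stream is a hypothetical stand-in for a Matrix network stream:

```python
import socket
import threading

def tunnel(listen_port: int, virtual_stream) -> None:
    """Expose a plain TCP socket to a containerised program and relay its
    traffic over a virtual network stream. Real deployments may need deeper
    integration (netns, iptables, kernel modules) than a userspace relay."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", listen_port))
    server.listen(1)
    conn, _ = server.accept()

    def pump(read, write):
        # Copy chunks until the source side closes (read returns b"").
        while chunk := read(4096):
            write(chunk)

    # Relay both directions concurrently.
    threading.Thread(
        target=pump, args=(conn.recv, virtual_stream.write), daemon=True
    ).start()
    pump(virtual_stream.read, conn.sendall)
```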
Service Dependencies
We talked about how service dependencies can be managed in the swarm. The
central idea is that a service running on an automaton will be able to pull
its service dependencies into the network by contacting a centralised
orchestrator that is responsible for fetching the necessary dependencies.
Alternatively, this action of pulling in service dependencies could be
embedded into the control of the network itself (which amounts to there being
a service that represents operations on the network).
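A sketch of the first variant, where the service asks a centralised orchestrator to pull its dependencies into the network (the orchestrator API here is entirely hypothetical):

```python
def ensure_dependencies(orchestrator, service_spec: dict) -> list[str]:
    """Ask the orchestrator to pull each declared service dependency into
    the network, returning the network addresses where they are reachable."""
    addresses = []
    for dep in service_spec.get("dependencies", []):
        # The orchestrator resolves the dependency's package, deploys an
        # automaton for it (or reuses a running one), and reports where.
        addresses.append(orchestrator.pull_service(dep))
    return addresses
```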
Global Swarm
We envision a global swarm that will be available to anyone using the MatrixAI
network. This way, service provision could be shared across the whole network,
p2p style: anybody running a service may be able to provide that service to
another user participating in the network. This is beneficial because, by
supporting a virtual network layer over a variety of different networking
protocols, the range of devices that can communicate on this network will be
far greater than the current status quo.
Action Plan
LibP2P
Kernel Modules
IPTables for packet routing
Berkeley Packet Filtering
NetNS
Container Networking: How containers communicate with each other and with
centralised orchestrators in frameworks like Kubernetes
Userspace Networking: Using libraries like Snabb and software-defined
networking to facilitate virtual network communication, allowing us to
consistently use the same network layer without having to worry about the
particular networking capabilities of the host machine
Discovery: Like nodes in LibP2P, we need to be able to find and identify
peers on many different physical networks (Ethernet LAN, WLAN, mesh,
Bluetooth, and so on)
Polykey
Architect
Timeline
Completing the http2stream muxer
LibP2P: We need to have components like basic discovery (perhaps bootstrap)
and a swarm by the end of March. The main work here is defining the
interfaces for Stream Muxer, Transport, Discovery, and whatever else we need
to get two nodes communicating with each other (a rough sketch of these
interfaces appears at the end of this timeline).
Moving on from here, we begin to look at research into NixOS/Hydra for the
Forge portion and see how service dependencies and continuous integration
techniques can be managed as part of our container network.
Research into Snabb, looking at using virtual networks to support the varied
networking capabilities of different host machines, and at protocol
tunnelling to support applications written for the conventional web (i.e.
using existing frameworks). This is important because we need to provide an
upgrade path from old HTTP frameworks to new protocols (i.e. a backwards
compatibility layer), otherwise the technology won't be adopted.
Some other research areas include kernel network namespaces (as used by LXC)
Open Container Initiative
Paper on service networking (Serval), which has been referenced in a few
other papers working on service mesh networks and service discovery.
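As a starting point for the interface definitions mentioned in the first timeline item, the following is a rough sketch; the method names are our guesses, not finalised LibP2P API:

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Dials and listens for raw connections over some medium."""
    @abstractmethod
    def dial(self, address: str): ...
    @abstractmethod
    def listen(self, address: str): ...

class StreamMuxer(ABC):
    """Multiplexes many logical streams (e.g. via http2stream) over a
    single transport connection."""
    @abstractmethod
    def open_stream(self, connection): ...
    @abstractmethod
    def accept_stream(self, connection): ...

class Discovery(ABC):
    """Finds peers, e.g. from a bootstrap list, across physical networks."""
    @abstractmethod
    def find_peers(self, limit: int) -> list[str]: ...
```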