Discussion of nix-prefetch-ipfs
We briefly discussed a nix-expression rewrite system similar to that of the gx
package manager for Go. What it would do is rewrite source addresses (sources
that are required to build the nix package) from a standard URL to an ipfs URL
where that content is hosted. For example:
"https://git.kernel.org/torvalds/t/linux-4.15-rc8.tar.gz" -> "/ipfs/QmeF59wRCygGMrEJbLdYS1CyJmA2XowaR9oRX2VJty7nCR", where the latter is a content address of the tarball using the multihash scheme.
ipfs itself provides a specific facility for storing tar archives: ipfs tar
add, which parses a tarball into a Merkle DAG structure. We are not yet sure
what goes on behind the scenes when parsing the tarball into that structure.
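As a rough illustration of the round trip (assuming the go-ipfs CLI with the tar subcommand is on PATH; the tarball path is just an example):

```python
import subprocess

# Sketch: add a tarball to ipfs as a Merkle DAG via the tar subcommand.
# The CLI reports the root hash of the resulting DAG; the exact output
# format may differ between go-ipfs versions, so we just surface it here.
out = subprocess.run(
    ["ipfs", "tar", "add", "linux-4.15-rc8.tar.gz"],
    capture_output=True, text=True, check=True,
).stdout
print(out)  # e.g. "added QmeF59... linux-4.15-rc8.tar.gz"

# The original tarball can later be reassembled with:
#   ipfs tar cat /ipfs/<root-hash> > linux-4.15-rc8.tar.gz
```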
The nix-prefetch-ipfs tool would thus do the following:
download the source from the URL
check whether a local ipfs daemon is running
otherwise use a remote ipfs gateway
upload the source to ipfs, getting back the content address
return the package in the local nix store, along with the content address of
the package in the ipfs cloud
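A minimal sketch of that flow, assuming a go-ipfs CLI and nix-prefetch-url are available (the gateway fallback is left as a stub, since public gateways are read-only and uploading would need a writable gateway or pinning service):

```python
import subprocess

def nix_prefetch_ipfs(url: str) -> tuple[str, str]:
    """Sketch of nix-prefetch-ipfs: fetch a source URL, add it to ipfs,
    and return (nix store path, ipfs content address)."""
    # 1. Download the source into the local nix store. With --print-path,
    #    nix-prefetch-url prints the hash followed by the store path.
    out = subprocess.run(
        ["nix-prefetch-url", "--print-path", url],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    store_path = out[-1]

    # 2. Check whether a local ipfs daemon is running by probing it.
    daemon_up = subprocess.run(
        ["ipfs", "swarm", "peers"], capture_output=True,
    ).returncode == 0

    if daemon_up:
        # 3. Upload the source, getting back its content address
        #    (-Q prints only the final hash).
        cid = subprocess.run(
            ["ipfs", "add", "-Q", store_path],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    else:
        # Fallback to a remote ipfs gateway, as per the steps above.
        raise NotImplementedError("no local daemon; gateway upload TBD")

    # 4. Return the store path and the ipfs content address.
    return store_path, f"/ipfs/{cid}"
```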
More about Forge Package Archiving
There are two things in NixOS that can be pushed to ipfs: build-outputs (the
compiled packages in the binary cache) and build-inputs (the sources used to
build the packages via a nix expression). Forge Package Archiving looks to
move build-inputs into ipfs.
HydraCI, TravisCI
We need to look at systems that do continuous integration so we can see how we
can successfully propagate updates to automatons in the Matrix network. This
may also be useful to Forge Package Archiving, as HydraCI continuously builds
packages.
Identifying Automatons
Every automaton will have several identifiers:
Instance Address: An automaton will be instantiated from a package; it is
an instance of that package. Thus, an automaton will have an address that
represents this particular instance. This allows automatons that have been
instantiated from the same package to still have a stable, unique identifier.
This instance address is also preserved if the automaton is transferred live
between machines (see below).
Package Address: This is the content address of the package that the
automaton was instantiated from. Many automatons may share the same package
address.
Network Address: This identifier is used to determine the location of the
automaton within the network. When automatons are transferred between
machines, this network address may change.
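To make the distinction concrete, a hypothetical record type for these three identifiers might look like this (the names are illustrative, not a settled schema):

```python
from dataclasses import dataclass

@dataclass
class AutomatonIdentity:
    # Stable and unique per instance; survives live transfer between machines.
    instance_address: str
    # Content address of the originating package; shared by every automaton
    # instantiated from that package.
    package_address: str
    # Locates the automaton within the network; may change on transfer.
    network_address: str
```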
Transferring Automatons between Machines
We distinguished between two types of automaton:
Light automatons that have no or very little state
Heavy automatons that have state
When we talk about an automaton, we are usually referring to some sort of
service that is being provided by that automaton. An automaton transfer will
depend on a few factors that determine 'control points' at which it is safe
for the transfer to occur:
The automaton must have completed all current message exchanges with its
peer automatons. These exchanges may happen over a variety of different
protocols; essentially, we can determine completion by ensuring that all
streams to peer automatons are in the closed state. This needs to occur
because it is too difficult to re-route incoming data from a peer on the
fly to the new location of the automaton after the transfer.
Essentially this introduces a contention issue on the automaton: message
exchanges will have to wait until the automaton has been transferred to the
new machine.
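A sketch of that control point, assuming a hypothetical automaton object that tracks its open streams to peers:

```python
import time

def wait_for_control_point(automaton, poll_interval: float = 0.1) -> None:
    """Block until it is safe to transfer the automaton: all streams to
    peer automatons must be in the closed state. New exchanges are refused
    in the meantime, which is the contention trade-off noted above."""
    automaton.refuse_new_streams()  # hypothetical: stop accepting exchanges
    while any(not stream.closed for stream in automaton.peer_streams):
        time.sleep(poll_interval)
    # All message exchanges are complete; the transfer can now begin.
```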
Nature of Packages
A package is a wrapper around some independently created program. Packages
should be able to wrap any program (language-agnostic). When we say program,
we specifically mean the program source, not compiled versions of the
program. To facilitate scalable architectures, Matrix must handle the
compilation of the program source for the target machine.
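As an illustration of what such a language-agnostic wrapper might carry (all names here are hypothetical, not a settled design):

```python
from dataclasses import dataclass

@dataclass
class MatrixPackage:
    # Content address of the program source (never a compiled artifact).
    source_address: str
    # Language-agnostic build recipe; Matrix evaluates it against the
    # architecture of whichever machine the package lands on.
    build_expression: str

    def compile_for(self, target_arch: str) -> "Container":
        # Compilation happens at deployment time, per target machine;
        # "Container" is the compiled form described in the next section.
        ...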
Container
After the source program has been wrapped in a Matrix package, we can refer to
the compiled version of that package as a container. One of the core problems
we face with containers is ensuring that the communication paradigms available
to the original source program remain available after it has been wrapped in a
package and is run as a container on several different machines, with
different architectures and perhaps different network protocols available on
each. We essentially need to be able to tunnel protocols and expose low-level
primitives such as sockets to the program. This may require more sophisticated
low-level integration, such as Linux kernel modules, to make this possible.
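For flavour, tunnelling at the socket level amounts to pumping bytes between an ordinary socket the program expects and whatever transport the virtual network actually provides. A minimal userspace sketch, where virtual_stream is a hypothetical stand-in for a Matrix network stream:

```python
import socket
import threading

def tunnel(listen_port: int, virtual_stream) -> None:
    """Expose a plain TCP socket to a containerised program and relay its
    traffic over a virtual network stream. Real deployments may need deeper
    integration (netns, iptables, kernel modules) than a userspace relay."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", listen_port))
    server.listen(1)
    conn, _ = server.accept()

    def pump(read, write):
        # Copy chunks until the source side closes (read returns b"").
        while chunk := read(4096):
            write(chunk)

    # Relay both directions concurrently.
    threading.Thread(
        target=pump, args=(conn.recv, virtual_stream.write), daemon=True
    ).start()
    pump(virtual_stream.read, conn.sendall)
```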
Service Dependencies
We talked about how service dependencies can be managed in the swarm. The
central idea is that a service running on an automaton will be able to pull
its service dependencies into the network by contacting a centralised
orchestrator that is responsible for fetching the necessary dependencies.
Alternatively, this action of pulling in service dependencies could be
embedded into the control of the network itself (which amounts to there being
a service that represents operations on the network).
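A sketch of the first variant, where the service asks a centralised orchestrator to pull its dependencies into the network (the orchestrator API here is entirely hypothetical):

```python
def ensure_dependencies(orchestrator, service_spec: dict) -> list[str]:
    """Ask the orchestrator to pull each declared service dependency into
    the network, returning the network addresses where they are reachable."""
    addresses = []
    for dep in service_spec.get("dependencies", []):
        # The orchestrator resolves the dependency's package, deploys an
        # automaton for it (or reuses a running one), and reports where.
        addresses.append(orchestrator.pull_service(dep))
    return addresses
```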
Global Swarm
We envision a global swarm that will be available to anyone using the MatrixAI
network. This way, service provision could be shared across the whole network,
p2p style: anybody running a service may be able to provide that service to
another user participating in the network. This is beneficial because, by
supporting a virtual network layer over a variety of different networking
protocols, the range of devices that can communicate on this network will be
far greater than the current status quo.
Action Plan
LibP2P
Kernel Modules
IPTables for packet routing
Berkeley Packet Filtering
NetNS
Container Networking: How containers communicate with each other and with
centralised orchestrators in frameworks like Kubernetes
Userspace Networking: Using libraries like Snabb and software-defined
networking to facilitate virtual network communication, allowing us to
consistently use the same network layer without having to worry about the
particular networking capabilities of the host machine
Discovery: Like nodes in LibP2P, we need to be able to find and identify
peers on many different physical networks (Ethernet LAN, WLAN, mesh,
Bluetooth, and so on)
Polykey
Architect
Timeline
Completing the http2stream muxer
LibP2P: We need to have components like basic discovery (perhaps bootstrap)
and a swarm by the end of March. The main work here is defining the
interfaces for Stream Muxer, Transport, Discovery, and whatever else we need
to get two nodes communicating with each other (a rough sketch of these
interfaces appears at the end of this timeline).
Moving on from here, we begin to look at research into NixOS/Hydra for the
Forge portion and see how service dependencies and continuous integration
techniques can be managed as part of our container network.
Research into Snabb, looking at using virtual networks to support the varied
networking capabilities of different host machines, and at protocol
tunnelling to support applications written for the conventional web (i.e.
using existing frameworks). This is important because we need to provide an
upgrade path from old HTTP frameworks to new protocols (i.e. a backwards
compatibility layer), otherwise the technology won't be adopted.
Some other research areas include kernel network namespaces (as used by LXC)
Open Container Initiative
Paper on service networking (Serval), which has been referenced in a few
other papers working on service mesh networks and service discovery.
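As a starting point for the interface definitions mentioned in the first timeline item, the following is a rough sketch; the method names are our guesses, not finalised LibP2P API:

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Dials and listens for raw connections over some medium."""
    @abstractmethod
    def dial(self, address: str): ...
    @abstractmethod
    def listen(self, address: str): ...

class StreamMuxer(ABC):
    """Multiplexes many logical streams (e.g. via http2stream) over a
    single transport connection."""
    @abstractmethod
    def open_stream(self, connection): ...
    @abstractmethod
    def accept_stream(self, connection): ...

class Discovery(ABC):
    """Finds peers, e.g. from a bootstrap list, across physical networks."""
    @abstractmethod
    def find_peers(self, limit: int) -> list[str]: ...
```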