cosmos / chain-registry

Infrastructure Automation #214

Open faddat opened 2 years ago

faddat commented 2 years ago

I was putting together a table with all the edge cases and then noticed that there were 50 chains. So I'm going to summarize:

User Stories

- As a scaled relayer, I want others to be able to begin relaying with ease.
- As a scaled relayer, I want to be able to run more nodes in a stable manner and to scale my system by adding either bare metal or virtual machines to my network.
- As a validator, I want to be able to retrieve chain state rapidly.

Blockers

A Heartwarming open source story

Chain developers benefit from systems that allow communities to provide needed infrastructure with ease. A good example of this is the block explorer for Dig. Chill Validation used ping.pub's excellent open source explorer to stand up an explorer for Dig on day one. We did not have to build that infrastructure, and it was a huge load off our shoulders.

Later, I made a personal donation to Ping, and our team delegates to both Ping and Chill, who are advancing the same codebase in different ways, kind of like the Tenderseed1 -> Tenderseed2 -> Tinyseed lineage (it really forked a lot, and I can't track all the names the Terra teams have given it).

Open source infrastructure automation for Cosmos

To drill in on infrastructure for a moment: we need it very badly, and we have some conditions for it:

Possible solutions for chain images

My opinion on the images is that we should ship multiarch wherever possible, publish a distroless container because some people love Kube, and build the binaries on Arch Linux while also shipping an Arch-based Docker image that includes the binary. That OS cruft can be very helpful sometimes, and it doesn't weigh much or pose much of a security risk.
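A minimal sketch of the multiarch build via the Compose build spec, assuming docker compose v2 with BuildKit and a hypothetical multi-stage Dockerfile (Arch Linux builder stage, distroless or Arch runtime stage); the image name and tag are placeholders:

```yaml
# docker-compose.build.yml (hypothetical): build a multiarch chain image.
# Requires docker compose v2 with BuildKit for the `platforms` list.
services:
  gaiad:
    build:
      context: .
      dockerfile: Dockerfile              # multi-stage: Arch Linux builder -> distroless or Arch runtime
      platforms:
        - linux/amd64
        - linux/arm64
    image: ghcr.io/example/gaiad:latest   # placeholder registry/repository
```

Built with `docker compose -f docker-compose.build.yml build`, or equivalently with `docker buildx build --platform linux/amd64,linux/arm64` directly.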

Solutions for getting chain state

My opinion is that the Cosmos community needs a publicly accessible Docker registry that serves images which include chain state. We can put it somewhere with an unmetered 10 Gbps line and equip it with plenty of storage. Users would download images weighing between 1 and 15 GB, and they can use volume mounts to ensure persistence if they'd like to.
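As a sketch of the consumer side, assuming a hypothetical shared registry at registry.cosmos.example and a state-bearing Cosmos Hub image (names, tags, and the home directory path are placeholders):

```yaml
# Hypothetical compose file: run a node from a state-bearing image.
services:
  cosmoshub:
    image: registry.cosmos.example/cosmoshub/node-with-state:latest  # placeholder
    volumes:
      - cosmoshub-data:/root/.gaia   # optional named volume so state outlives the container
    ports:
      - "26656:26656"                # p2p
      - "26657:26657"                # rpc
volumes:
  cosmoshub-data:
```

The volume mount is what provides the "persistence if they'd like" part; leaving it out means each fresh container starts from whatever height was baked into the image.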

Infrastructure Automation tooling

Both Swarm and Kube have relatively easy-to-use bare metal and scalable cloud versions. They both support x86 and ARM, though neither will satisfy my desire to learn more of the ways of HashiCorp. It's my opinion that Kube treats every problem like it is a Google-scale problem, and that Swarm is more accessible to new users while still scaling admirably.

User flow

The user flow should be the same anywhere: you provision one or more servers and write a compose file or Helm chart (or whatever is in vogue with Kube these days), and in that compose file you tell your system how many replicas you want. The state will already be in the images, and you'll be able to re-sync frequently, so you can save on the need for large, high-IOPS disks.
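On Swarm that could look like the stack file below (a sketch, reusing the placeholder image from above); `deploy.replicas` is what `docker stack deploy` uses to decide how many copies to run:

```yaml
# docker-compose.yml (hypothetical), deployed with:
#   docker stack deploy -c docker-compose.yml cosmoshub
services:
  node:
    image: registry.cosmos.example/cosmoshub/node-with-state:latest  # placeholder
    deploy:
      replicas: 3                    # scale by editing this and re-deploying
      restart_policy:
        condition: on-failure
    volumes:
      - node-data:/root/.gaia
volumes:
  node-data:
```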

Endpoints

It should be easy for users to configure both HTTPS and WSS for their chain endpoints, and as we progress toward removing the traditional REST endpoints, we should make sure the gRPC gateway is well documented.
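For reference, a sketch of the port layout using the standard Cosmos SDK / Tendermint defaults; TLS and WSS termination is assumed to happen in a reverse proxy (nginx, Traefik, etc.) sitting in front of these ports:

```yaml
# Hypothetical endpoint node; the proxy in front of it is not shown.
services:
  node:
    image: registry.cosmos.example/cosmoshub/node-with-state:latest  # placeholder
    ports:
      - "26657:26657"   # Tendermint RPC, also serves the websocket (wss once proxied)
      - "1317:1317"     # REST / gRPC-gateway (app.toml: [api] enable = true)
      - "9090:9090"     # gRPC (app.toml: [grpc] enable = true)
```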

How to do it

I am fiddling with

https://github.com/ovrclk/cosmos-omnibus

and currently run some Cosmos infrastructure on Akash. I'd like to run much more of it there, but I often want a greater degree of control than it offers. Today I am going to see whether I can make some docker-compose.yml files that use omnibus to either state sync or ship truncated chain state inside the images.
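As a sketch of what one of those files might look like, using the cosmos-omnibus image with state sync; the image tag, environment variable names, and RPC servers below are assumptions to be checked against the omnibus README, not verified values:

```yaml
# Hypothetical docker-compose.yml for a state-synced Cosmos Hub node via omnibus.
services:
  cosmoshub:
    image: ghcr.io/ovrclk/cosmos-omnibus:v0.3.0-cosmoshub-v7   # placeholder tag
    environment:
      - MONIKER=relayer-backend-1                              # assumed variable name
      - CHAIN_JSON=https://raw.githubusercontent.com/cosmos/chain-registry/master/cosmoshub/chain.json
      - STATESYNC_RPC_SERVERS=rpc-1.example:26657,rpc-2.example:26657  # placeholders
    volumes:
      - cosmoshub-data:/root/.gaia
    ports:
      - "26656:26656"
      - "26657:26657"
volumes:
  cosmoshub-data:
```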

The eventual solution must be triggered when pull requests hit this repository, so that images stay up to date. Yet another reason to avoid closed-source "blockchains".
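A sketch of that trigger as a GitHub Actions workflow, assuming the repository's chainname/chain.json layout; the build script is a placeholder that does not exist in this repo:

```yaml
# .github/workflows/build-images.yml (hypothetical): rebuild chain images
# whenever chain metadata in this repository changes.
name: build-chain-images
on:
  push:
    branches: [master]
    paths: ["*/chain.json"]
  pull_request:
    paths: ["*/chain.json"]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - name: Build multiarch image for the changed chain
        run: ./scripts/build-image.sh   # placeholder script
```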

baabeetaa commented 1 year ago

https://github.com/notional-labs/cosmosia