
Cosmos CometBFT Setup with Horcrux Support

This repository secures cloud provider servers, installs and configures CometBFT based chains for validator, sentry and relayer node types as well as Horcrux using Ansible.

Design Philosophy

  1. Secure server setup
  2. Extendable to most CometBFT-based (formerly Tendermint) chains
  3. Support both mainnet and testnet
  4. Support Horcrux installation and node config updates
  5. Stable playbooks and roles; customizable variables
  6. Support essential functions (snapshot, state-sync, public RPC/API endpoints) through separate playbooks

TL;DR

Run the desired playbook with the following arguments:

# Node Setup
ansible-playbook setup.yml -e "target=<mainnet|testnet|horcrux_cluster>" -e "ssh_port=<non_standard_ssh_port>"

# Install/Configure Chain
ansible-playbook main.yml -e "target=<mainnet|testnet>" -e "chain=<chain>"

# Install/Configure Horcrux
ansible-playbook horcrux.yml -e "target=<horcrux_cluster|horcrux_cluster_testnet>"

# Configure Prometheus for Chain
ansible-playbook support_prometheus.yml -e "target=<mainnet|testnet|horcrux_cluster>" -e "chain=<chain>"

# Configure Tenderduty for Chain
ansible-playbook support_tenderduty.yml -e "target=<mainnet|testnet|horcrux_cluster>" -e "chain=<chain>"

Architecture

For every chain where we run a validator on mainnet, we run two sentry nodes connected to a 3/3 Horcrux cosigner cluster.

Leveraging Horcrux provides high availability while maintaining strong security and avoiding double signing through its consensus and failover-detection mechanisms. It lets multiple sentry nodes connect to the cosigner nodes, which reduces downtime and block-signing failures and increases the fault tolerance and resiliency of validator operations.
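As a rough illustration only (host names and the exact group layout here are placeholders; the authoritative structure is sample.inventory.yml), this topology maps to inventory groups along these lines:

# Hypothetical inventory layout for one mainnet chain (illustrative values)
mainnet:
  hosts:
    sentry-1:
      ansible_host: 203.0.113.10
      type: sentry
    sentry-2:
      ansible_host: 203.0.113.11
      type: sentry
horcrux_cluster:
  hosts:
    cosigner-1:
      ansible_host: 203.0.113.20
      type: horcrux
    cosigner-2:
      ansible_host: 203.0.113.21
      type: horcrux
    cosigner-3:
      ansible_host: 203.0.113.22
      type: horcrux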

Node Setup

Typically, a cloud provider hands you a machine with root access and an insecure default setup. This Ansible playbook is designed to address those issues. It is based on Ubuntu 22.04 but should be applicable to other Ubuntu images. To run this playbook, you need a user with sudo privileges; the playbook deliberately does not create one, as a security measure to avoid running as root. It performs the following:

  1. Set the hostname (based on inventory file)
  2. Update server: Simply update and upgrade all applications shipped with the OS.
  3. Install and configure essential software dependencies
  4. Install ufw
  5. Configure and enable the firewall
  6. Install fail2ban
  7. Install cosmovisor
  8. Optionally install node exporter (configurable in inventory)
  9. Optionally install promtail (configurable in inventory)
  10. Optionally install nginx (configurable in inventory)

Secure Node

  1. Disable the default SSH port 22 and set up the alternative port.
  2. Deny all incoming traffic.
  3. Enable the firewall, allowing the alternative SSH port only from the bastion/jumpbox IP (sketched after this list).
  4. Disable root account access.
  5. Disable password authentication.
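For illustration, the firewall portion of these steps could be expressed with the community.general.ufw module roughly as follows. This is only a sketch using assumed variable names (ssh_port, bastion_ip); the actual tasks live in this repo's roles.

# Sketch only: default-deny inbound, then allow the alternate SSH port from the bastion
- name: Deny all incoming traffic by default
  community.general.ufw:
    state: enabled
    policy: deny
    direction: incoming

- name: Allow the alternate SSH port from the bastion/jumpbox only
  community.general.ufw:
    rule: allow
    port: "{{ ssh_port }}"
    proto: tcp
    from_ip: "{{ bastion_ip }}"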

Variables

Look at the sample.inventory.yml file for an example of how to structure the inventory for your CometBFT clusters:

  1. target: Required. Whether mainnet or testnet.
  2. ansible_host: Required. The IP address of the server(s).
  3. ssh_port: Required. Alternate ssh port to configure on the server. This can be different per host. By default, it will apply the same port for all servers.
  4. server_hostname: Required. Sets the hostname to this value.
  5. bastion_ip: Required. Bastion/jumpbox IP allowed SSH access to the server. It can be an address range as well.

Rename sample.inventory.yml to inventory.yml and update the values accordingly; a minimal sketch of a host entry follows.
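The sketch below uses placeholder values; group and key names follow the variables listed above, and sample.inventory.yml remains the authoritative reference.

# Hypothetical inventory.yml entry for node setup (illustrative values)
mainnet:
  hosts:
    node-1:
      ansible_host: 203.0.113.10
      ssh_port: 2222
      server_hostname: node-1
      bastion_ip: 198.51.100.5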

Run node setup playbook

# Node Setup
ansible-playbook setup.yml -e "target=<mainnet|testnet|horcrux_cluster>" -e "ssh_port=<non_standard_ssh_port>"

Install/Configure Chain

As mentioned above, we run two sentry nodes connected to a 3/3 Horcrux cosigner cluster. However, this repo supports configuring chain nodes as validator, sentry, or relayer types, each with different settings described below. If you do not wish to use Horcrux, set the type to validator for the corresponding node.

Opinionated Configuration

We have 2 strong opinions about the node configuration:

  1. Each chain gets its own custom 3-digit port prefix. This prevents port collisions when you run multiple nodes on the same server. For example, you could configure Babylon with port prefix 109 and Osmosis with 110. Which prefix to use is up to you (see the hypothetical sketch after this list).
  2. Each node type has settings based on our experience. For example, the main (validator) node uses 100/0/ pruning, the sentry node 1000/100/, and the relayer 50000/100/. We will enforce these settings unless you fork the code.
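As a hypothetical illustration of the first point (the variable name custom_port_prefix is assumed here and is not necessarily the one this repo uses):

# Hypothetical: each chain gets its own prefix so ports never collide on a shared server
babylon-node:
  chain: babylon
  custom_port_prefix: 109   # e.g. the default RPC port 26657 might become 10957
osmosis-node:
  chain: osmosis
  custom_port_prefix: 110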

Variables

Look at the sample.inventory.yml file for an example of how to structure the inventory for your CometBFT clusters. All of these values can be set per mainnet/testnet, per host, per chain, or globally; a sketch of a host entry follows the list below.

  1. target: Required. Whether mainnet or testnet.
  2. ansible_host: Required. The IP address of the server.
  3. chain: Required. The chain network name to install/configure (should match file vars/<testnet/mainnet>).
  4. type: Required. It can be validator, sentry or relayer. Each is opinionated in its configuration settings.
  5. ansible_user: The sample file assumes ubuntu, but feel free to use another username. This user needs sudo privilege.
  6. ansible_port: The sample file assumes 22. If you ran the node setup playbook, it should match ssh_port.
  7. ansible_ssh_private_key_file: Path to ssh key file.
  8. var_file: It tells the program where to look for the variable file.
  9. user_dir: The user's home directory. In the sample inventory file this is computed from ansible_user, assuming a non-root user whose home directory is /home/{{ansible_user}}.
  10. path: This is to make sure that the ansible_user can access the go executable.
  11. node_name: This is your node name or moniker for the config.toml file.

There are additional variables under group_vars/all.yml for global configuration applied to all chains; a sketch follows the list below.

  1. node_exporter_version: Node exporter version to install.
  2. promtail_version: Promtail version to install.
  3. go_version: Go version to install.
  4. cosmovisor_version: Cosmovisor version to install.
  5. cosmovisor_service_name: systemd service name prefix for the chain's Cosmovisor service.
  6. node_exporter: Default is true. Change it to false if you do not want to install node_exporter. If true, enables the prometheus port in config.toml.
  7. promtail: Default is false. Change it to true if you want to install promtail.
  8. nginx: Default is false. Change it to true if you want to install nginx.
  9. log_monitor: Enter your monitor server IP if you install promtail.
  10. log_name: This is the server's name for the promtail service.
  11. pagerduty_key: This is the PagerDuty key if you use TenderDuty.
  12. enableapi: Default is false. Set to true if you want to enable the API endpoint.
  13. enablegrpc: Default is false. Set to true if you want to enable the grpc endpoint.
  14. publicrpc: Default is false. Set to true if you want to allow the rpc port on the server.
  15. external_address: IP address to set as an external address in config.toml.

Look at vars/mainnet|testnet/<chain>.yaml for chain specific variables.

Run install/configure playbook

# Install/Configure Chain
ansible-playbook main.yml -e "target=<mainnet|testnet>" -e "chain=<chain>"

Install/Configure Horcrux

This playbook will install Horcrux, a multi-party computation (MPC) signing service for CometBFT, on the servers defined in inventory.yml under horcrux_cluster.

Variables

  1. ansible_host: Required. The IP address of the server.
  2. type: Should always be set to horcrux.
  3. restart_horcrux: Defaults to true. Change to false if you do not want the horcrux service to restart after a config update.
  4. nodes: The priv_validator listen addresses of the chain's sentry nodes to add to the config.

There are additional variables under group_vars/all.yml for global configuration applied to all Horcrux cosigner nodes; a combined sketch follows the list below.

  1. horcrux_repo: Repo URL where the horcrux code resides.
  2. horcrux_version: Horcrux version to install.
  3. horcrux_cosigner_port: Defaults to 2222. Port cosigner nodes listen on.
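Combining the per-host and global variables, a cosigner entry might look roughly like this (addresses and ports are placeholders, and the exact shape of the nodes list is defined by sample.inventory.yml, not by this sketch):

# Hypothetical horcrux_cluster entry (illustrative values)
horcrux_cluster:
  hosts:
    cosigner-1:
      ansible_host: 203.0.113.20
      type: horcrux
      restart_horcrux: true
      nodes:
        - tcp://203.0.113.10:1234   # priv-val listen address of sentry-1 (placeholder port)
        - tcp://203.0.113.11:1234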

Run install/configure playbook

# Install/Configure Horcrux
ansible-playbook horcrux.yml -e "target=<horcrux_cluster|horcrux_cluster_testnet>"

Configure Prometheus for Chain

This playbook will configure a new Prometheus target, using information from the chain's vars file, on the servers defined in inventory.yml under telemetry.

Variables

  1. target: Required. Whether mainnet or testnet.
  2. chain: Required. The chain network name to install/configure (should match file vars/<testnet/mainnet>).
  3. var_file: It tells the program where to look for the variable file.
  4. cosmos_prom_file: The filename of the Prometheus targets file for the chains (a sketch of a target entry follows this list).
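For orientation, a Prometheus file_sd-style entry of the kind such a targets file usually holds looks like this; the labels shown are assumptions, and the actual fields are whatever this playbook's template writes:

# Illustrative target entry; 26660 is CometBFT's default Prometheus port
- targets:
    - "203.0.113.10:26660"
  labels:
    chain: juno
    network: mainnet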

Run install/configure playbook

# Configure Prometheus for Chain
ansible-playbook support_prometheus.yml -e "target=<mainnet|testnet>" -e "chain=<chain>"

Configure Tenderduty for Chain

This playbook will configure a new Tenderduty chain entry, using information from the chain's vars file, on the servers defined in inventory.yml under telemetry.

Variables

  1. target: Required. Whether mainnet or testnet.
  2. chain: Required. The chain network name to install/configure (should match file vars/<testnet/mainnet>).
  3. var_file: It tells the program where to look for the variable file.
  4. tender_config_file: The filename of the Tenderduty configuration file to update (a rough sketch of a chain entry follows this list).
  5. tender_url: The URL used to check Tenderduty's liveness and health after editing tender_config_file.
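For orientation only, the kind of per-chain entry Tenderduty expects looks roughly like this; the key names belong to Tenderduty's own config format, so consult its documentation rather than this sketch:

# Rough sketch of a Tenderduty chain entry (illustrative values)
chains:
  Juno:
    chain_id: juno-1
    valoper_address: junovaloper1yourvaloperaddress
    nodes:
      - url: tcp://203.0.113.10:26657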

Run install/configure playbook

# Configure Tenderduty for Chain
ansible-playbook support_tenderduty.yml -e "target=<mainnet|testnet>" -e "chain=<chain>"

Manual Steps

For more information, refer to the documentation.

Playbooks

main.yml: The main playbook to set up a node
node_alertmanager.yml: Install and configure Alertmanager
node_tenderduty.yml: Install Tenderduty
setup.yml: Secure the server with SSH config changes and firewall rules, and install dependencies
support_backup_node.yml: Install the snapshot, state-sync, resync, genesis, and prune scripts on a backup node
support_snapshot.yml: Install the snapshot script and a cron job
support_state_sync.yml: Install the state-sync script
support_resync.yml: Install a weekly scheduled state-sync and recovery script
support_genesis.yml: Install a script to upload genesis
support_prune.yml: Install a script to prune using cosmprund
support_public_endpoints.yml: Set up an Nginx reverse proxy for public RPC/API
support_seed.yml: Install a seed node with Tenderseed; you need a node_key.json.j2 file so the node_id stays consistent
support_price_feeder.yml: Install price feeders for selected chains (such as Umee, Kujira, etc.)
support_scripts.yml: Install scripts to make node operations easier
support_sync_snapshot.yml: Sync a node from a snapshot
support_remove_node.yml: Remove a node and clean up
support_update_min_gas.yml: Update the minimum gas price
horcrux.yml: Install a Horcrux cluster
support_horcrux_config.yml: Add additional nodes to the Horcrux config
support_chain_horcrux.yml: Update priv_validator_laddr with the Horcrux port
support_bastion_firewall.yml: Allow additional IPs to connect to the bastion
support_prometheus.yml: Configure Prometheus for a given chain
support_tenderduty.yml: Configure Tenderduty for a given chain

Selected Playbook Usage Examples

support_seed
ansible-playbook support_seed.yml -e "target=<mainnet|testnet>" -e "chain=<chain>" -e "seed=190c4496f3b46d339306182fe6a507d5487eacb5@65.108.131.174:36656"
support_scripts
ansible-playbook support_scripts.yml -e "target=<mainnet|testnet>"

Currently, we have 4 supported scripts. Their usage is documented below using Juno as an example:

./scripts/bank_balances/juno.sh
./scripts/bank_send/juno.sh ADDRESS 1000000ujuno
./scripts/distribution_withdrawal/juno.sh
./scripts/gov_vote/juno.sh 1 yes
support_horcrux_config
ansible-playbook support_horcrux_config.yml

Contribute

We believe we can always improve, so feel free to fork this repo and create a PR with your changes so that other people can benefit from them as well.

Acknowledgement

This would not have been possible without the help of the people listed below. Thank you very much for providing this framework and creating an environment of collaboration while promoting automation, reliability, and security:

Known Issues

Because this repo tries to accommodate as many Tendermint-based chains as possible, it cannot adapt to all edge cases. Here are some known issues and how to resolve them.

Axelar: Extra lines at the end of app.toml. Solution: delete the extra lines and manually adjust the settings those lines were meant to change.
Canto: The genesis file needs to be unwrapped from .result.genesis. Solution: unwrap the genesis with a jq command.
Injective: Extra lines at the end of app.toml. Solution: delete the extra lines and manually adjust the settings those lines were meant to change.
Kichain: Extra lines at the end of app.toml. Solution: delete the extra lines and manually adjust the settings those lines were meant to change.
Celestia testnet: Inconsistent config.toml variable naming convention. Solution: manually adjust the config.toml file.