cockroachdb / dcos-cockroachdb-service

Framework for running CockroachDB on Mesosphere DC/OS
Apache License 2.0

CockroachDB Service Guide

Overview

DC/OS CockroachDB is an automated service that makes it easy to deploy and manage CockroachDB on DC/OS.

CockroachDB is an open source distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.

For more details, check out our website, FAQ, architecture docs, or GitHub repository.

Features

Quick Start

  1. Install DC/OS on your cluster. See the documentation for instructions.

  2. If you are using open source DC/OS, install a CockroachDB cluster with the following command from the DC/OS CLI. If you are using Enterprise DC/OS, you may need to follow additional instructions; see the Installing and Customizing section for more information. You can also install CockroachDB from the DC/OS web interface.

    dcos package install cockroachdb
  3. The service will now deploy with a default configuration. You can monitor its deployment from the Services tab of the DC/OS web interface.

  4. Connect a client to CockroachDB.

    $ dcos cockroachdb endpoints
    [
      "http",
      "pg"
    ]
    $ dcos cockroachdb endpoints pg
    {
      "vips": ["pg.cockroachdb.l4lb.thisdcos.directory:26257"],
      "address": [
        "10.0.2.77:26257",
        "10.0.0.61:26257",
        "10.0.1.215:26257"
      ],
      "dns": [
        "cockroachdb-0-node-init.cockroachdb.autoip.dcos.thisdcos.directory:26257",
        "cockroachdb-1-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257",
        "cockroachdb-2-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257"
      ],
      "vip": "pg.cockroachdb.l4lb.thisdcos.directory:26257"
    }

  5. Open up a SQL shell to read and write data in your cluster by accessing the vip endpoint.

      $ dcos node ssh --master-proxy --leader
      $ docker run -it cockroachdb/cockroach sql --insecure --host=pg.cockroachdb.l4lb.thisdcos.directory
      # Welcome to the cockroach SQL interface.
      # All statements must be terminated by a semicolon.
      # To exit: CTRL + D.
      root@pg.cockroachdb.l4lb.thisdcos.directory:26257/> CREATE DATABASE bank;
      CREATE DATABASE
      root@pg.cockroachdb.l4lb.thisdcos.directory:26257/> CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
      CREATE TABLE
      root@pg.cockroachdb.l4lb.thisdcos.directory:26257/> INSERT INTO bank.accounts VALUES (1234, 10000.50);
      INSERT 1
      root@pg.cockroachdb.l4lb.thisdcos.directory:26257/> SELECT * FROM bank.accounts;
      +------+----------+
      |  id  | balance  |
      +------+----------+
      | 1234 | 10000.50 |
      +------+----------+
      (1 row)

Installing and Customizing

The default CockroachDB installation provides reasonable defaults for trying out the service, but you may require different configurations depending on the context of your deployment.

Prerequisites

Installation from the DC/OS CLI

To start a basic test cluster, run the following command on the DC/OS CLI. Enterprise DC/OS users must follow additional instructions; see the information about installing CockroachDB on Enterprise DC/OS.

dcos package install cockroachdb

You can specify a custom configuration in an options.json file and pass it to dcos package install using the --options parameter.

$ dcos package install cockroachdb --options=your-options.json

For more information about building the options.json file, see the DC/OS documentation for service configuration access.
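As a rough illustration, an options.json file could be generated with a short script like the one below. Only the virtual_network key under service is shown elsewhere in this guide. The name key and the node section with its cpus and mem keys are assumptions based on the setting names described in this guide, so compare them with the output of dcos package describe --config cockroachdb before relying on them.

```python
import json


def build_options(service_name="cockroachdb", cpus=1.0, mem_mb=4096,
                  virtual_network=False):
    """Return a dict shaped like an options.json file.

    Every key except service.virtual_network is an assumption; verify the
    names against `dcos package describe --config cockroachdb`.
    """
    return {
        "service": {
            "name": service_name,
            "virtual_network": virtual_network,
        },
        "node": {
            "cpus": cpus,   # 1.0 equates to one full CPU core
            "mem": mem_mb,  # memory in MB
        },
    }


def render_options(options):
    """Serialize the options as indented JSON text, ready to save to a file."""
    return json.dumps(options, indent=4) + "\n"
```

Saving the result of render_options(build_options(service_name="cockroachdb-dev")) to your-options.json would give you a file to pass with --options.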

Installation from the DC/OS Web Interface

You can install CockroachDB from the DC/OS web interface. If you install CockroachDB from the web interface, you must install the CockroachDB DC/OS CLI subcommands separately. From the DC/OS CLI, enter:

dcos package install cockroachdb --cli

Choose ADVANCED INSTALLATION to perform a custom installation.

Service Settings

Service Name

Each instance of CockroachDB in a given DC/OS cluster must be configured with a different service name. You can configure the service name in the service section of the advanced installation section of the DC/OS web interface. The default service name (used in many examples here) is cockroachdb.

Node Settings

Adjust the following settings to customize the amount of resources allocated to each node. CockroachDB's system requirements must be taken into consideration when adjusting these values. Reducing these values below those requirements may result in adverse performance and/or failures while using the service.

Each of the following settings can be customized under the node configuration section.

Node Count

Customize the Node Count setting (default 3) under the node configuration section. Consult the CockroachDB documentation for minimum node count requirements.

CPU

You can customize the amount of CPU allocated to each node. A value of 1.0 equates to one full CPU core on a machine. Change this value by editing the cpus value under the node configuration section. Setting this too low will result in throttled tasks.

Memory

You can customize the amount of RAM allocated to each node. Change this value by editing the mem value (in MB) under the node configuration section. Setting this too low will result in out of memory errors.

Ports

You can customize the ports exposed by the service via the service configuration. Customizing ports is optional and is only needed if you install multiple instances of the service and want them to be colocated on the same machines; in that case, you must ensure that no ports are shared between those instances.

There are two ports that can be customized: the pg port, which is used for inter-node communication and accepting client connections via the PostgreSQL wire protocol, and the http port which serves the CockroachDB Admin UI as well as some debug endpoints.

Storage Volumes

The service supports two volume types: ROOT volumes and MOUNT volumes.

Using MOUNT volumes requires additional configuration on each DC/OS agent system, so the service currently uses ROOT volumes by default. To ensure reliable and consistent performance in a production environment, you should configure MOUNT volumes on the machines that will run the service in your cluster and then configure the node Disk Type setting to use MOUNT volumes.

Placement Constraints

Placement constraints allow you to customize where the service is deployed in the DC/OS cluster. Placement constraints may be configured as a node parameter using the Placement constraint option.

Placement constraints support all Marathon operators with this syntax: field:OPERATOR[:parameter]. For example, if the reference lists [["hostname", "UNIQUE"]], use hostname:UNIQUE.

A common task is to specify a list of whitelisted systems to deploy to. To achieve this, use the following syntax for the placement constraint:

hostname:LIKE:10.0.0.159|10.0.1.202|10.0.3.3

You must include spare capacity in this list, so that if one of the whitelisted systems goes down, there is still enough room to repair your service without that system.
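The syntax above is simple enough to generate programmatically. The following sketch converts a Marathon-style constraint into the field:OPERATOR[:parameter] form and builds a hostname whitelist. The function names are hypothetical, and the spare-capacity check assumes one node per agent, which matches the default placement behavior described in this guide.

```python
def to_placement_rule(marathon_constraint):
    """Convert one Marathon constraint, e.g. ["hostname", "UNIQUE"],
    into the field:OPERATOR[:parameter] form used by this service."""
    if len(marathon_constraint) not in (2, 3):
        raise ValueError("expected [field, operator] or [field, operator, parameter]")
    return ":".join(str(part) for part in marathon_constraint)


def hostname_whitelist(hosts):
    """Build a hostname:LIKE rule that matches any of the given hosts."""
    if not hosts:
        raise ValueError("at least one host is required")
    return "hostname:LIKE:" + "|".join(hosts)


def has_spare_capacity(hosts, node_count):
    """Assumption: one node per agent, so spare capacity means the
    whitelist contains more hosts than there are nodes."""
    return len(set(hosts)) > node_count
```

For example, to_placement_rule(["hostname", "UNIQUE"]) gives hostname:UNIQUE, and passing the three IPs above to hostname_whitelist reproduces the rule shown earlier.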

For an example of updating placement constraints, see Managing below.

Overlay networks

CockroachDB supports deployment on the dcos overlay network, a virtual network on DC/OS that allows each node to have its own IP address and not use the ports resources on the agent. This can be specified by passing the following configuration during installation:

{
    "service": {
        "virtual_network": true
    }
}

By default, two nodes will not be placed on the same agent; however, multiple CockroachDB clusters can share an agent. As mentioned in the developer guide, once the service is deployed on the overlay network, it cannot be updated to use the host network.

CockroachDB Settings

Most CockroachDB settings are configured after the cluster has started, using the SET CLUSTER SETTING SQL command. For information on how to set cluster settings and which settings are available, please see CockroachDB's documentation.

Uninstalling

Follow these steps to uninstall the service.

  1. Uninstall the service. From the DC/OS CLI, enter dcos package uninstall with the name of your service instance, as in the example below.
  2. If you are running a DC/OS version older than 1.10, clean up the remaining reserved resources with the framework cleaner script, janitor.py (see the documentation for the framework cleaner script for more information). To uninstall an instance named cockroachdb (the default), run:
    $ MY_SERVICE_NAME=cockroachdb
    $ dcos package uninstall --app-id=$MY_SERVICE_NAME $MY_SERVICE_NAME
    $ dcos node ssh --master-proxy --leader "docker run mesosphere/janitor /janitor.py \
      -r $MY_SERVICE_NAME-role \
      -p $MY_SERVICE_NAME-principal \
      -z dcos-service-$MY_SERVICE_NAME"

Connecting Clients

CockroachDB clients can use the standard PostgreSQL wire protocol for all communication with the cluster, which means that existing PostgreSQL client drivers can be used. A list of client drivers and ORMs (Object-Relational Mappings) that have been tested to work can be found on the Cockroach Labs website. The CockroachDB binary also comes with an interactive SQL shell which you can access via the cockroach sql command on the binary.

Discovering Endpoints

One of the benefits of running containerized services is that they can be placed anywhere in the cluster. As a result, clients need a way to find the service. This is where service discovery comes in.

Once the service is running, you may view information about its endpoints via either of the following methods:

Returned endpoints will include the following:

In general, the .mesos endpoints will only work from within the same DC/OS cluster. From outside the cluster you can either use the direct IPs or set up a proxy service that acts as a frontend to your CockroachDB instance. For development and testing purposes, you can use DC/OS Tunnel to access services from outside the cluster, but this option is not suitable for production use.
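If you script against the endpoint information, the JSON is straightforward to parse. The sketch below assumes the output has the same shape as the dcos cockroachdb endpoints pg sample in the Quick Start section (vip, address, and dns keys); the function names are hypothetical.

```python
import json

# Sample taken from the Quick Start output of `dcos cockroachdb endpoints pg`.
SAMPLE_PG_ENDPOINT = """
{
  "vips": ["pg.cockroachdb.l4lb.thisdcos.directory:26257"],
  "address": ["10.0.2.77:26257", "10.0.0.61:26257", "10.0.1.215:26257"],
  "dns": [
    "cockroachdb-0-node-init.cockroachdb.autoip.dcos.thisdcos.directory:26257",
    "cockroachdb-1-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257",
    "cockroachdb-2-node-join.cockroachdb.autoip.dcos.thisdcos.directory:26257"
  ],
  "vip": "pg.cockroachdb.l4lb.thisdcos.directory:26257"
}
"""


def split_host_port(endpoint):
    """Split "host:port" into (host, port) with the port as an int."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def vip_host_port(endpoint_json):
    """Return the load-balanced VIP, suitable for clients inside the cluster."""
    return split_host_port(json.loads(endpoint_json)["vip"])


def direct_addresses(endpoint_json):
    """Return the direct IP endpoints, e.g. for use from outside the cluster."""
    return [split_host_port(a) for a in json.loads(endpoint_json)["address"]]
```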

Connecting Clients to Endpoints

To use a DC/OS CockroachDB cluster, all you need to do is connect to the HA-enabled VIP hostname from the above Discovering Endpoints section using any PostgreSQL client driver.

For example, to connect using CockroachDB's built-in SQL client, you can open up a shell by running:

dcos node ssh --master-proxy --leader
docker run -it cockroachdb/cockroach sql --insecure --host=pg.cockroachdb.l4lb.thisdcos.directory
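Most PostgreSQL client drivers accept a connection URL instead of individual flags. The helper below is a minimal sketch that builds one for the VIP used above. It assumes an insecure cluster (matching the --insecure flag in the example), the root user shown in the SQL shell prompt, and the standard PostgreSQL sslmode=disable parameter; the function name is hypothetical.

```python
from urllib.parse import quote


def connection_url(host, port=26257, database="", user="root"):
    """Build a PostgreSQL-style connection URL for an insecure cluster.

    sslmode=disable mirrors the --insecure flag; do not use it for a
    cluster that has TLS enabled.
    """
    return "postgresql://{user}@{host}:{port}/{db}?sslmode=disable".format(
        user=quote(user), host=host, port=port, db=quote(database)
    )
```

For example, connection_url("pg.cockroachdb.l4lb.thisdcos.directory", database="bank") produces a URL pointing at the bank database created in the Quick Start.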

Managing

Updating Configuration

You can make changes to the service after it has been launched. Configuration management is handled by the scheduler process, which in turn handles deploying CockroachDB itself.

Edit the runtime environment of the scheduler to make configuration changes. After making a change, the scheduler will be restarted and automatically deploy any detected changes to the service, one node at a time. For example, a given change will first be applied to cockroachdb-0, then cockroachdb-1, and so on.

Nodes are configured with a "Readiness check" to ensure that the underlying service appears to be in a healthy state before continuing with applying a given change to the next node in the sequence. However, this basic check is not foolproof and reasonable care should be taken to ensure that a given configuration change will not negatively affect the behavior of the service.
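The rollout behavior described above can be pictured with the following simplified sketch. It is not the scheduler's actual code: apply_change and is_ready are hypothetical callbacks standing in for relaunching a node and running its readiness check.

```python
def rolling_update(nodes, apply_change, is_ready):
    """Apply a change to nodes one at a time, in order.

    Stops at the first node that does not pass its readiness check and
    returns the nodes that were updated and passed the check.
    """
    healthy = []
    for node in nodes:
        apply_change(node)
        if not is_ready(node):
            # Do not touch the remaining nodes while this one is unhealthy.
            break
        healthy.append(node)
    return healthy
```

The point of the sketch is that an unhealthy node halts the sequence, so a bad change affects one node rather than the whole cluster. It also shows why the check is not foolproof: a change that passes the readiness check but misbehaves later will still be rolled out everywhere.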

Some changes, such as changing volume requirements, are not supported after initial deployment. See Limitations.

To make configuration changes via scheduler environment updates, perform the following steps:

  1. Visit your cluster's URL to access the DC/OS web interface.
  2. Navigate to Services and click on the service to be configured (default cockroachdb).
  3. Click Edit in the upper right. On DC/OS 1.9.x, the Edit button is in a menu made up of three dots.
  4. Navigate to Environment (or Environment variables) and search for the option to be updated.
  5. Update the option value and click Review and run (or Deploy changes).
  6. The Scheduler process will be restarted with the new configuration and will validate any detected changes.
  7. If the detected changes pass validation, the relaunched Scheduler will deploy the changes by sequentially relaunching affected tasks as described above.

To see a full listing of available options, run dcos package describe --config cockroachdb in the CLI, or browse the CockroachDB install dialog in the DC/OS web interface.

Adding a Node

The service deploys 3 nodes by default. You can customize this value at initial deployment or after the cluster is already running. Shrinking the cluster is not supported.

Modify the NODE_COUNT environment variable to update the node count. If you decrease this value, the scheduler will reject the configuration change until the value is restored to its original value or set to a larger one.

Resizing a Node

The CPU and Memory requirements of each node can be increased or decreased as follows:

Note: Volume requirements (type and/or size) cannot be changed after initial deployment.

Updating Placement Constraints

Placement constraints can be updated after initial deployment using the following procedure. See Service Settings above for more information on placement constraints.

Let's say cockroachdb-1 is deployed on 10.0.10.8, which is being decommissioned, so we should move away from it. Steps:

  1. Remove the decommissioned IP and add a new IP to the placement rule whitelist by editing NODE_PLACEMENT:

    hostname:LIKE:10.0.10.3|10.0.10.26|10.0.10.28|10.0.10.84|10.0.10.123
  2. Redeploy cockroachdb-1 from the decommissioned node to somewhere within the new whitelist: dcos cockroachdb pods replace cockroachdb-1
  3. Wait for cockroachdb-1 to be up and healthy before continuing with any other replacement operations.
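Editing the whitelist by hand is error-prone when it is long. The sketch below swaps one host for another in a hostname:LIKE rule. The function name is hypothetical, and the starting whitelist in the usage note is an assumed example rather than a value from this guide.

```python
def replace_host(rule, old_host, new_host):
    """Return a hostname:LIKE rule with old_host removed and new_host added."""
    prefix = "hostname:LIKE:"
    if not rule.startswith(prefix):
        raise ValueError("expected a hostname:LIKE rule")
    hosts = [h for h in rule[len(prefix):].split("|") if h != old_host]
    if new_host not in hosts:
        hosts.append(new_host)
    return prefix + "|".join(hosts)
```

Starting from an assumed rule of hostname:LIKE:10.0.10.3|10.0.10.8|10.0.10.26|10.0.10.28|10.0.10.84, calling replace_host(rule, "10.0.10.8", "10.0.10.123") yields the NODE_PLACEMENT value shown in step 1.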

Restarting a Node

This operation will restart a node while keeping it at its current location and with its current persistent volume data. This may be thought of as similar to restarting a system process, but it also deletes any data that is not on a persistent volume.

  1. Run dcos cockroachdb pods restart cockroachdb-<NUM>, e.g. cockroachdb-2.

Replacing a Node

This operation moves a node to a new system and discards the persistent volumes on the prior system, so the node's data is rebuilt on the new system. Perform this operation if a given system is about to be taken offline or has already been taken offline.

Note: Nodes are not moved automatically. You must perform the following steps manually to move nodes to new systems. You can build your own automation to perform node replacement automatically according to your own preferences.

  1. Run dcos cockroachdb pods replace cockroachdb-<NUM> to halt the current instance (if still running) and launch a new instance elsewhere.

For example, let's say cockroachdb-3's host system has died and cockroachdb-3 needs to be moved.

  1. Start cockroachdb-3 at a new location in the cluster by running:

    $ dcos cockroachdb pods replace cockroachdb-3

Disaster Recovery

Backing up and restoring data are critical capabilities for any stateful application that stores important data. The functionality described below lets you back up each database to S3 and restore it later.

The behavior included in this DC/OS framework uses standard SQL to dump and restore all of your data, but if you have a very large database and need faster backups, incremental backups, or a faster, distributed restore process, consider contacting Cockroach Labs about an enterprise license.

Backup

Backing up to S3

You can back up a CockroachDB cluster's data on a per-database basis using the dcos cockroachdb backup CLI command, specifying the database name and S3 bucket as arguments. For example, to back up the data from a database named bank to an S3 bucket named cockroachdb-backup, you would run:

dcos cockroachdb backup bank cockroachdb-backup

This will back up all tables contained within the database. For more details on how the data is being backed up, please see the documentation of the underlying cockroach dump command.

You can configure the communication with S3 using the following optional flags to the CLI command:

By default, the AWS access and secret keys will be pulled from your environment via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, respectively. You must either have these environment variables defined or specify the flags for the backup to work.

Make sure that you provision your nodes with enough disk space to perform a backup. The backups are stored on disk before being uploaded to S3, and will take up as much space as the data currently in the tables, so you'll need half of your total available space to be free to back up every database at once.
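The disk space rule works out as follows: a backup needs roughly as much free space as the data already occupies, so the data can fill at most half of the volume. The sketch below expresses that arithmetic. It assumes the volume holds little besides table data, and the function names are hypothetical.

```python
import shutil


def can_back_up_everything(total_bytes, used_bytes):
    """Return True if a full on-disk copy of the stored data would fit.

    The backup is staged on disk and is about as large as the data itself,
    so the used space must not exceed the remaining free space.
    """
    free_bytes = total_bytes - used_bytes
    return used_bytes <= free_bytes


def check_volume(path):
    """Apply the check to the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    return can_back_up_everything(usage.total, usage.used)
```

For instance, a 100 GB volume with 50 GB of data just fits a full backup, while 51 GB of data does not.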

Restore

Restoring from S3

Restoring cluster data is similar to backing it up. The dcos cockroachdb restore CLI command assumes that your data is stored in an S3 bucket in the format that the dcos cockroachdb backup command uses (or, alternatively, the format generated by running the cockroach dump command). The restore command is run like:

dcos cockroachdb restore [<flags>] <database> <s3-bucket> <s3-backup-dir>

And it takes the following optional flags:

By default, the AWS access and secret keys will be pulled from your environment via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, respectively. You must either have these environment variables defined or specify the flags for the restore to work.
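If you drive restores from a script, it can help to check the credentials before invoking the CLI. The following sketch checks for the two environment variables named above and assembles the restore command from its positional arguments only, since the optional flags are not listed here. The function names and the backup directory value in the usage note are hypothetical.

```python
import os


def missing_aws_credentials(environ=None):
    """Return the names of the required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    return [name for name in required if not env.get(name)]


def restore_command(database, s3_bucket, s3_backup_dir):
    """Build the argument list for
    `dcos cockroachdb restore <database> <s3-bucket> <s3-backup-dir>`."""
    return ["dcos", "cockroachdb", "restore", database, s3_bucket, s3_backup_dir]
```

A script could refuse to continue when missing_aws_credentials() is non-empty, then pass restore_command("bank", "cockroachdb-backup", "my-backup-dir") to subprocess.run.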

Troubleshooting

Accessing Logs

Logs for the scheduler and all service nodes can be viewed from the DC/OS web interface.

In all cases, logs are generally piped to files named stdout and/or stderr.

To view logs for a given node, perform the following steps:

  1. Visit your cluster's URL to access the DC/OS web interface.
  2. Navigate to Services and click on the service to be examined (default cockroachdb).
  3. In the list of tasks for the service, click on the task to be examined (scheduler is named after the service, nodes are cockroachdb-0-node-init or cockroachdb-#-node-join).
  4. In the task details, click on the Logs tab to go into the log viewer. By default, you will see stdout, but stderr is also useful. Use the pull-down in the upper right to select the file to be examined.

You can also access the logs via the Mesos UI:

  1. Visit /mesos on your cluster's URL to view the Mesos UI.
  2. Click the Frameworks tab in the upper left to get a list of services running in the cluster.
  3. Navigate into the correct framework for your needs. The scheduler runs under marathon with a task name matching the service name (default cockroachdb). Service nodes run under a framework whose name matches the service name (default cockroachdb).
  4. You should now see two lists of tasks. Active Tasks are tasks currently running, and Completed Tasks are tasks that have exited. Click the Sandbox link for the task you wish to examine.
  5. The Sandbox view will list files named stdout and stderr. Click the file names to view the files in the browser, or click Download to download them to your system for local examination. Note that very old tasks will have their Sandbox automatically deleted to limit disk space usage.

Limitations

Removing a Node

Removing a node is not supported at this time.

Updating Storage Volumes

Neither volume type nor volume size requirements may be changed after initial deployment.

Rack-aware Replication

Rack placement and awareness are not supported at this time.

Backup Storage

Storage of / restoring from backups in datastores other than S3 is not yet supported.

Enterprise Backup and Restore

The backup and restore functionality included in this DC/OS framework uses standard SQL to dump and restore all of your data, but if you have a very large database and need faster backups, incremental backups, or a faster, distributed restore process, consider contacting Cockroach Labs about an enterprise license.

Supported Versions

Build Instructions

Since this framework was migrated from a branch off dcos-commons, you'll need a copy of that repository to build it (until the build scripts are modified appropriately). Steps to build:

  1. Clone dcos-commons.
  2. Add the following two lines to dcos-commons/settings.gradle:

    include 'frameworks/cockroachdb'
    project(":frameworks/cockroachdb").name = "cockroachdb"
  3. Clone this repo into dcos-commons/frameworks/.
  4. Use dcos-commons/frameworks/cockroachdb/build.sh to build.

Publishing Instructions

To publish a new version of this package to the DC/OS Universe package manager, follow these steps:

  1. Follow the above build instructions to clone dcos-commons and add the cockroachdb framework to it.
  2. Run S3_BUCKET=<your S3 bucket name> ./frameworks/cockroachdb/build.sh aws from the root of your modified dcos-commons repo. This will build the package and push all the relevant artifacts to your S3 bucket. It doesn't matter what bucket you use -- it's only for temporary storage. We'll use an official one in a later step.
  3. Test the built package by following the instructions build.sh prints to run it in a DC/OS cluster you own.
  4. Fork the Universe repository.
  5. Run the following command to copy the release build and configuration into a release S3 bucket:
    GITHUB_TOKEN=<your-github-access-token> RELEASE_UNIVERSE_REPO=<your-fork-of-mesosphere/universe> S3_RELEASE_BUCKET=dcos-cockroachdb HTTP_RELEASE_SERVER=https://dcos-cockroachdb.s3.amazonaws.com MIN_DCOS_RELEASE=1.9 RELEASE_DIR_PATH=dcos/release ./tools/release_builder.py X.Y.Z-X.Y.Z <url-from-previous-step> Update CockroachDB package to vX.Y.Z

    The URL that you want to use from the output of the build.sh step is the S3 URL for the stub-universe-cockroachdb.json file, which should be towards the bottom of the output. For example, a-robinson ran the following command to update the CockroachDB package to v1.1.4:

    GITHUB_TOKEN=<omitted> RELEASE_UNIVERSE_REPO=a-robinson/universe S3_RELEASE_BUCKET=dcos-cockroachdb HTTP_RELEASE_SERVER=https://dcos-cockroachdb.s3.amazonaws.com MIN_DCOS_RELEASE=1.9 RELEASE_DIR_PATH=dcos/release ./tools/release_builder.py 1.1.4-1.1.4 https://alex-dcos-exhibitors3bucket-gx95x7buf8qx.s3.amazonaws.com/autodelete7d/cockroachdb/20180110-223034-FhU2xXR0jHocfp8S/stub-universe-cockroachdb.json Update CockroachDB package to v1.1.4
  6. Assuming all goes well, this will push a new branch to your fork of the mesosphere/universe repo. Find the branch that it pushed (which should be easy since it also opens a PR against your fork) and open a PR against the upstream repo using that branch.

These steps have only been tested against commit f58f3b609f466c8cfd351fa7c38655055a358663 of dcos-commons, so if you hit problems along the way, consider retrying at that SHA. It may just be an issue with my local python version or the version of the repo I'm at, but I also had to replace urllib.request.URLopener().retrieve(src_url, local_path) with urllib.request.urlretrieve(src_url, local_path) on line 293 of tools/release_builder.py to get the release upload process to work.
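For reference, the replacement call mentioned above looks like this in isolation. This is a standalone sketch, not an excerpt from tools/release_builder.py; only the src_url and local_path names come from the description above, and the wrapper function is hypothetical. Note that urllib.request.URLopener has been deprecated since Python 3.3, which may explain the failure on newer interpreters.

```python
import urllib.request


def download(src_url, local_path):
    """Fetch src_url and save it to local_path.

    Uses urllib.request.urlretrieve in place of
    urllib.request.URLopener().retrieve(src_url, local_path).
    """
    urllib.request.urlretrieve(src_url, local_path)
    return local_path
```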