Nerve is a utility for tracking the status of machines and services. It runs locally on the boxes that make up a distributed system and reports state information to a distributed key-value store. At Airbnb, we use Zookeeper as our key-value store. The combination of Nerve and Synapse makes service discovery in the cloud easy!
We already use Synapse to discover remote services. However, those services needed boilerplate code to register themselves in Zookeeper. Nerve simplifies underlying services, enables code reuse, and allows us to create a more composable system. It does so by factoring out the boilerplate into its own application, which independently handles monitoring and reporting.
Beyond those benefits, nerve also acts as a general watchdog on systems. The information it reports can be used to take action from a centralized automation center, such as scaling distributed systems up or down or alerting ops or engineering about downtime.
To download and run the nerve binary, first install a version of ruby. Then, install nerve with:
$ mkdir -p /opt/smartstack/nerve
# If you want to install specific versions of dependencies such as an older
# version of the aws-sdk, the docker-api, etc, gem install that here *before*
# gem installing nerve. This is also where you would gem install
# custom reporters.
# If you are on Ruby 2.X use --no-document instead of --no-ri --no-rdoc
$ gem install nerve --install-dir /opt/smartstack/nerve --no-ri --no-rdoc
This will download nerve and its dependencies into /opt/smartstack/nerve. You might wish to omit the --install-dir flag to use your system's default gem path; however, this will require you to run gem install nerve with root permissions. You can also install via bundler, but keep in mind you'll pick up Nerve's version of library dependencies and possibly not the ones you need for your infra/apps.
You can now run the nerve binary like:
export GEM_PATH=/opt/smartstack/nerve
/opt/smartstack/nerve/bin/nerve --help
Nerve depends on a single configuration file, in json format.
It is usually called nerve.conf.json.
An example config file is available in example/nerve.conf.json.
The config file is composed of the following top-level keys:
instance_id: the name nerve will submit when registering services; makes debugging easier
heartbeat_path: a path to a file on disk to touch as nerve makes progress. This allows you to work around https://github.com/zk-ruby/zk/issues/50 by restarting a stuck nerve.
services: the hash (from service name to config) of the services nerve will be monitoring
service_conf_dir: path to a directory in which each json file will be interpreted as a service with the basename of the file minus the .json extension
Each service that nerve will be monitoring is specified in the services hash.
The key is the name of the service, and the value is a configuration hash telling nerve how to monitor the service.
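
To make the layout concrete, here is a minimal sketch that builds such a config as a Ruby hash and prints it as json. Every host, port, path, and service name is a hypothetical placeholder, the tcp check type is assumed to be available, and services_from_dir is an illustrative helper showing the service_conf_dir naming rule, not nerve's own loader. The per-service keys used here are described in the sections that follow.

```ruby
require 'json'
require 'tmpdir'

# All hosts, ports, paths and names below are hypothetical placeholders.
config = {
  'instance_id' => 'web-01',
  'services' => {
    'my_service' => {
      'host' => '10.0.0.5',
      'port' => 3000,
      'reporter_type' => 'zookeeper',
      'zk_hosts' => ['localhost:2181'],
      'zk_path' => '/nerve/services/my_service',
      'checks' => [
        # 'tcp' is assumed to be one of the available check types.
        { 'type' => 'tcp', 'rise' => 3, 'fall' => 2 }
      ]
    }
  }
}

# Sketch of the service_conf_dir rule: every *.json file in the directory
# becomes a service named after the file's basename minus the extension.
def services_from_dir(dir)
  Dir.glob(File.join(dir, '*.json')).sort.each_with_object({}) do |path, services|
    services[File.basename(path, '.json')] = JSON.parse(File.read(path))
  end
end

# Demonstrate with a throwaway directory holding one service file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'other_service.json'),
             JSON.generate({ 'host' => '10.0.0.6', 'port' => 4000 }))
  config['services'].merge!(services_from_dir(dir))
end

puts JSON.pretty_generate(config)
```

Merging the directory entries into the services hash is a choice made for this illustration; it simply shows that a file named other_service.json ends up as a service called other_service.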
The configuration contains the following options:
host: the default host on which to make service checks; you should make this your public ip to ensure your service is publicly accessible
port: the default port for service checks; nerve will report the host:port combo via your chosen reporter
reporter_type: the mechanism used to report up/down information; depending on the reporter you choose, additional parameters may be required. Defaults to zookeeper
check_interval: the frequency with which service checks will be initiated; defaults to 500ms
check_mocked: whether or not the health check is mocked; when the value is true, the host check always returns healthy and the service is reported as up
checks: a list of checks that nerve will perform; if all of them pass, the service will be registered; otherwise, it will be un-registered
rate_limiting (optional): a hash containing the configuration for rate limiting (see 'Rate Limiting' below)
weight (optional): a positive integer weight value which can be used to affect the haproxy backend weighting in synapse.
haproxy_server_options (optional): a string containing any special haproxy server options for this service instance, for example to set a service instance as a backup.
labels (optional): an object containing user-defined key-value pairs that describe this service instance. For example, you could label service instances with datacenter information.
Rate limiting is configured in the rate_limiting hash. If enabled, rate limiting is done via the Token-Bucket algorithm.
That hash contains the following values:
shadow_mode (optional): shadow mode emits metrics/logs for rate limiting, but does not actually throttle requests (defaults to true). Set to false to throttle requests.
average_rate (optional): enforced average rate limit for reporting (defaults to infinity)
max_burst (optional): enforced maximum burst for reporting (defaults to infinity)
If you set your reporter_type to "zookeeper" you should also set these parameters:
zk_hosts: a list of the zookeeper hosts comprising the ensemble that nerve will submit registration to
zk_path: the path (or znode) where the registration will be created
use_path_encoding: optional flag to turn on the path encoding optimization; the canonical config data at host level (e.g. ip, port, az) is encoded using json and base64 and written as the zk child name, while the zk child data is still written for backward compatibility
node_type: the type of znode that nerve will register as. The available types are ephemeral_sequential, persistent_sequential, persistent, and ephemeral. If not specified, nerve will create the znode as ephemeral_sequential type by default
ttl_seconds: repeatedly 'touch' the created node at this interval in order to update the mtime. If nil (the default), it will not perform this periodic update.
Note: Etcd support is currently experimental!
If you set your reporter_type to "etcd" you should also set these parameters:
etcd_host: etcd host that nerve will submit registration to
etcd_port: port to connect to etcd.
etcd_path: the path where the registration will be created; nerve will create a node with a 30s ttl that is the registration as a child of this path, and then update it every few seconds
The core of nerve is a set of service checks. Each service can define a number of checks, and all of them must pass for the service to be registered. Although the exact parameters passed to each check are different, all take a number of common arguments:
type: (required) the kind of check; you can see available check types in the lib/nerve/service_watcher dir of this repo
name: (optional) a descriptive, human-readable name for the check; it will be auto-generated based on the other parameters if not specified
host: (optional) the host on which the check will be performed; defaults to the host of the service to which the check belongs
port: (optional) the port on which the check will be performed; like host, it defaults to the port of the service
timeout: (optional) maximum time the check can take; defaults to 100ms
rise: (optional) how many consecutive checks must pass before the check is considered passing; defaults to 1
fall: (optional) how many consecutive checks must fail before the check is considered failing; defaults to 1
If you would like to run a custom check but don't feel like trying to get it merged into this project, there is a mechanism for including external checks thanks to @bakins (airbnb/nerve#36).
Build your custom check as a separate gem and make sure to bundle install it on your system.
Ideally, you should name your gem "nerve-watcher-#{type}", as that is what nerve will require on boot.
However, if you have a custom name for your gem, you can specify that in the module argument to the check.
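
Two of the check behaviours described above can be pictured with a short sketch: how the gem name for a custom check is chosen, and how the rise and fall arguments turn a stream of individual results into a passing or failing state. This is an illustration only, not nerve's implementation. The watcher_gem_name helper, the RiseFall class, the redis check type, and the my-redis-check gem name are all hypothetical, and the sketch assumes a check starts out as failing.

```ruby
# Hypothetical helper, not part of nerve's API: the gem name to require for a
# check, using the explicit 'module' argument or the naming convention.
def watcher_gem_name(check)
  check['module'] || "nerve-watcher-#{check['type']}"
end

# Sketch of rise/fall counting. Assumption: a check starts out as failing.
class RiseFall
  attr_reader :up

  def initialize(rise: 1, fall: 1)
    @rise = rise
    @fall = fall
    @up = false
    @streak = 0 # consecutive results that disagree with the current state
  end

  # Records one check result (true = pass) and returns the resulting state.
  def record(passed)
    if passed == @up
      @streak = 0
    else
      @streak += 1
      if @streak >= (passed ? @rise : @fall)
        @up = passed
        @streak = 0
      end
    end
    @up
  end
end

puts watcher_gem_name({ 'type' => 'redis' })
puts watcher_gem_name({ 'type' => 'redis', 'module' => 'my-redis-check' })

state = RiseFall.new(rise: 3, fall: 2)
results = [true, true, true, false, true, false, false].map { |r| state.record(r) }
p results
```

With rise set to 3 and fall set to 2, a single failed result between passes does not flip the state; only two failures in a row do.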
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature