Tools for managing large consistent hashing Graphite clusters.
Buckminster (Bucky) Fuller once said,
If you want to teach people a new way of thinking, don’t bother trying to teach them. Instead, give them a tool, the use of which will lead to new ways of thinking.
Carbon molecules known as fullerenes were also named after Bucky due to their geodesic arrangement of atoms. So, this is my contribution to the Graphite naming scheme.
When working with large consistent hashing Graphite clusters, even simple maintenance can involve moving a lot of data around. Normally, one would reach for the Carbonate tools:
https://github.com/jssjr/carbonate
These are good tools and I highly recommend them.
However, when the terabytes of WSP files and the number of storage nodes stack up, you start to run into scaling problems. So I wanted to use the speed and concurrency of Go to build more efficient tools to help manage large consistent hashing clusters.
These are the tools included and their functionality:

- A whisper-fill compatible utility that is nearly an order of magnitude faster.
- A utility that converts .wsp files into sparse files.

The heavy lifting commands use a set of worker threads to do the IO work; the number of workers can be set at the command line with -w.
These tools assume the following are true:

- A REPLICATION_FACTOR of 1

These aren't set in stone, just what I was working with as I built the tools. I very much hope that some of these limitations will be solved with further development.
Each data storage node in the Graphite cluster needs a buckyd daemon running as the same user as the other Graphite tools. I use an Upstart job to keep mine running. The important bit is that you must pass the members of the consistent hash ring to the daemon as arguments.
$ cat /etc/init/buckyd.conf
description "Buckyd Daemon for Managing Graphite Clusters"
author "Jack Neely <jjneely@42lines.net>"
setuid graphite
exec /path/to/buckyd -node graphite010-g5 \
-sparse -hash carbon -b 192.168.1.1:5678 \
graphite010-g5:a graphite010-g5:b \
graphite011-g5:a graphite011-g5:b \
graphite012-g5:a graphite012-g5:b
Here -node is the name of this Graphite node in the hash ring (if different from what is derived from the host name). -b or -bind is the address to bind to. You can also specify -prefix, where your Whisper data store is, and -tmpdir, where the daemon can write temporary files. The -sparse option instructs buckyd to create sparse Whisper files that take less disk space. The -hash option chooses the hash ring algorithm.
The non-option arguments are the servers and instances that make up the hashring. Order is important. The hashring members can be specified in the following formats:
SERVER
SERVER:INSTANCE
SERVER:PORT:INSTANCE
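To illustrate how a metric name is mapped to a ring member, here is a minimal sketch in Go of a carbon-style consistent hash lookup. This is illustrative only: it assumes MD5-based ring positions and 100 replicas per node, and the node names are hypothetical. The authoritative algorithms live in carbon and in buckytools' own hashing code.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
)

// ringEntry pins one replica of a node at a position on the hash ring.
type ringEntry struct {
	pos  uint16
	node string
}

// position hashes a key to a 16-bit ring position using the first two
// bytes of its MD5 digest (carbon uses the first 4 hex digits similarly).
func position(key string) uint16 {
	sum := md5.Sum([]byte(key))
	return binary.BigEndian.Uint16(sum[:2])
}

// buildRing places `replicas` copies of each node on the ring and sorts
// the ring by position.
func buildRing(nodes []string, replicas int) []ringEntry {
	var ring []ringEntry
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			ring = append(ring, ringEntry{position(fmt.Sprintf("%s:%d", n, i)), n})
		}
	}
	sort.Slice(ring, func(a, b int) bool { return ring[a].pos < ring[b].pos })
	return ring
}

// lookup returns the node owning a metric: the first replica at or after
// the metric's ring position, wrapping around to the start of the ring.
func lookup(ring []ringEntry, metric string) string {
	p := position(metric)
	i := sort.Search(len(ring), func(i int) bool { return ring[i].pos >= p })
	if i == len(ring) {
		i = 0
	}
	return ring[i].node
}

func main() {
	// Hypothetical ring members; order matters when building the real ring.
	members := []string{"graphite010-g5:a", "graphite010-g5:b", "graphite011-g5:a"}
	ring := buildRing(members, 100)
	fmt.Println(lookup(ring, "carbon.agents.graphite010-g5.cpuUsage"))
}
```

The key property is that adding a node only claims slices of the ring from its neighbors, so only a fraction of metrics need to move during a rebalance.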
The daemon exposes a REST API that is documented in REST_API_NOTES.md. For access control, you can enable JWT API token validation with the flag -auth-jwt-secret-file /path/to/jwt/secret/file.
When token validation is enabled, the bucky tools need an API token signed with the JWT secret in order to talk to buckyd. Such a token can be produced with tools like jwt-cli or a simple Go program (see cmd/buckyd/metrics_test.go for reference).
The following parameters can be specified in the JWT token:
{
"namespaces": ["sys.*", "app.api.*"],
"ops": ["read", "update", "replace", "delete"]
}
A namespace can be a valid Graphite globbing query, or just a dot-separated prefix/namespace like "sys.*".
Operations can be any combination of the four values "read", "update", "replace", and "delete".
For root-level access (the token scope that should be used by the bucky tools to rebalance clusters), the parameters can be specified as:
{
"namespaces": ["*"],
"ops": ["*"]
}
An example of using jwt-cli to produce a valid buckyd API token:
jwt encode --secret="xxx" '{"namespaces": ["*"], "ops": ["*"]}'
NOTE: The payload needs to be valid JSON.
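The "simple go program" mentioned above can be approximated with nothing but the standard library, since an HS256 JWT is just two base64url-encoded JSON segments plus an HMAC-SHA256 signature. This is a hedged sketch, not code from buckytools; the secret and claims below are placeholders, and buckyd's exact claim validation should be checked against cmd/buckyd/metrics_test.go.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// signHS256 builds a JWT from the given claims using HMAC-SHA256.
func signHS256(claims map[string]interface{}, secret []byte) (string, error) {
	enc := base64.RawURLEncoding
	header, _ := json.Marshal(map[string]string{"alg": "HS256", "typ": "JWT"})
	payload, err := json.Marshal(claims)
	if err != nil {
		return "", err
	}
	// Signing input is header.payload; the signature covers both segments.
	signingInput := enc.EncodeToString(header) + "." + enc.EncodeToString(payload)
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(signingInput))
	return signingInput + "." + enc.EncodeToString(mac.Sum(nil)), nil
}

func main() {
	// Placeholder secret; claims mirror the root-scope example above.
	token, err := signHS256(map[string]interface{}{
		"namespaces": []string{"*"},
		"ops":        []string{"*"},
	}, []byte("xxx"))
	if err != nil {
		panic(err)
	}
	fmt.Println(token)
}
```

The printed token can then be written to a file and passed to the bucky tools.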
The bucky tool is self documenting. You can run:
bucky help
to see a list of modules and available flags and what arguments are needed. Detailed help is available by specifying a module name:
bucky help backfill
Most commands need a --host or -h flag to specify the initial Graphite host to connect to, from which the client will discover the entire hash ring. You can also set the BUCKYHOST environment variable rather than specifying this flag for each command.
Other common flags are:

-s  Operate only on the initial Graphite host.
-f  Request that the remote daemon refresh its cache of local metrics.
-j  Read JSON data from STDIN or dump JSON data to STDOUT rather than text.
-r  Regular expression mode.
-w  Number of worker threads.

Rebalance a cluster with newly added storage nodes. Check whether you need to use the -no-delete flag. The default behavior is to move metrics and delete the source after a successful copy.
$ bucky rebalance -h graphite010-g5:4242 \
-w 5 2>&1 | tee rebalance.log
Discover the exact storage used by a set of metrics:
$ export BUCKYHOST=-h graphite010-g5:4242
$ bucky du -r '^1min\.ipvs\.'
Make a backup of all of the metrics in the carbon namespace, using the pigz parallel gzip compression tool. (Normal gzip would otherwise bottleneck the process.)
$ bucky tar -w 5 -r '^carbon\.' | pigz > filename.tar
Backfill or rename metrics with a JSON hash of old name to new name. This does not delete the source metric. It is a copy/fill operation.
$ bucky backfill -w 5 foo.json
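A hypothetical foo.json is simply a JSON hash mapping each old metric name to its new name (the metric names below are made up for illustration):

```json
{
  "servers.web01.cpu.user": "sys.web01.cpu.user",
  "servers.web01.cpu.system": "sys.web01.cpu.system"
}
```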
Offload backfilling to data nodes. This should be more performant, as it saves a round trip by asking data nodes to copy data directly between each other.
$ bucky backfill -w 5 -h graphite010-g5:4242 -offload
Find inconsistent metrics or metrics that are in the wrong place in the cluster according to the hashring:
$ bucky inconsistent
Specify API token:
$ bucky list -h graphite010-g5:4242 -api-token-file /path/to/api.token
To build from the Go source, ensure the GOPATH environment variable is set to your Go workspace:

$ go get github.com/jjneely/buckytools
$ cd $GOPATH/src/github.com/jjneely/buckytools
$ go install ./...

The binaries will be placed in $GOPATH/bin.
This can also be built as a Debian/Ubuntu package. (Tested on Ubuntu Trusty and Xenial.) I use git-buildpackage to produce builds, which requires the golang Debian packages.
gbp buildpackage
The daemon makes no effort to remove possibly empty directories when deleting a metric. Doing so could race with carbon-cache.py creating a new metric in a would-be deleted directory: once carbon-cache.py closes the file handle to a file in a deleted directory, that file will also be deleted. The delete action must not cause harm to other metrics.
To prune old or empty directories from your Graphite whisper store use a cron job similar to this:
/usr/bin/find ${prefix}/storage/whisper -type d -empty -mtime +1 -delete
This checks that the directory has not been modified in more than 1 day which, in most cases, avoids race conditions.
The modify command supports two operations: resizing an archive, or updating the aggregation policy.

Resize mode lets you resize one archive at a time. It only changes the targeted archive and does not affect other archives. Use -index to specify the archive to resize and -retention to specify the new policy (in the same format as the whisper configuration). When resizing to a larger time range, the modify command upsamples data from lower-resolution archives.
Example:
$ bucky modify -index 1 -retention 1m:30d -f 100_olddata.wsp
Aggregation mode changes the aggregation policy. Beyond changing the policy itself, this command also tries to correct the data when changing from average to sum or from sum to average. For other types of changes, it only does a simple data copy.
Example:
$ bucky modify -f small.wsp.new -agg average
By default, both modes copy the original whisper file as a backup in the same location.
To further scale the speed at which this tool moves metric data from one location to another, it uses Snappy compression by default. This can be disabled with the -no-encoding flag. When using many workers, Snappy can double (or more) the throughput. The Snappy compression frame protocol also handles CRC checks for data integrity.
Contributions are welcome! Please make a GitHub pull request. Below are some low hanging fruit (and some more annoying issues) that need help.
- Write tests using the net/http/httptest package. Test that the buckyd daemon manipulates the on-disk Whisper files correctly.
- graphite-project/carbon's master branch contains this change:
  https://github.com/graphite-project/carbon/commit/024f9e67ca47619438951c59154c0dec0b
  This will cause a few metrics to be assigned a different position in the hash ring, and we need to account for the algorithm change somehow. Buckytools supports multiple hashing algorithms, so this could be added as another supported hashing type.