gostatsd


An implementation of Etsy's statsd in Go, based on original code from @kisielk.

The project provides a server, gostatsd, which works much like Etsy's version, as well as a library for developing customized servers.

Backends are pluggable and only need to support the backend interface.

Being written in Go, it can use all available CPU cores, which makes it easy to scale the server with load.

Building the server

Gostatsd currently targets Go 1.21.3. If you are compiling from source, please ensure you are running this version.

From the gostatsd directory run make build. The binary will be built in build/bin/<arch>/gostatsd.

You will need to install the Go build dependencies by running make setup in the gostatsd directory. This must be done before the first build, and again if the dependencies change. A protobuf installation is expected to be found in the tools/ directory. Managing this in a platform-agnostic way is difficult, but PRs are welcome. Hopefully it will be sufficient to use the generated protobuf files in the majority of cases.

If you are unable to build gostatsd please check your Go version, and try running make setup again before reporting a bug.

Running the server

gostatsd --help gives a complete description of available options and their defaults. You can use make run to run the server with just the stdout backend to display info on screen.

You can also run through docker by running make run-docker which will use docker-compose to run gostatsd with a graphite backend and a grafana dashboard.

While not generally tested on Windows, it should work. Maximum throughput is likely to be better on a Linux system, however.

The server listens for UDP packets by default. You can use unix sockets by providing an absolute path to the socket in the metrics-addr configuration option. The socket mode used in this case is SOCK_DGRAM. Note that unix sockets only work on Linux, and that the conn-per-reader configuration option is ignored when they are used.
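For example, a configuration selecting a unix datagram socket might look like this (the path shown is illustrative):

```
metrics-addr='/var/run/gostatsd/metrics.sock'
```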

Configuring the server mode

The server can currently run in two modes: standalone and forwarder. It is configured through the top level server-mode configuration setting. The default is standalone.

In standalone mode, raw metrics are processed and aggregated as normal, and aggregated data is submitted to configured backends (see below).

Standalone mode supports the following configuration options:

In forwarder mode, raw metrics are collected from a frontend, and instead of being aggregated they are sent via HTTP to another gostatsd server after passing through the processing pipeline (cloud provider, static tags, filtering, etc).

A forwarder server is intended to run on-host and collect metrics, forwarding them on to a central aggregation service. At present the central aggregation service can only scale vertically, but horizontal scaling through clustering is planned.

Aligned flushing is deliberately not supported in forwarder mode, as it would impact the central aggregation server due to all forwarder nodes transmitting at once, and because many forwarder flushes are expected to occur per central flush anyway.

Configuring forwarder mode requires a configuration file, with a section named http-transport. The protocol version spoken is not configurable per server (see HTTP.md for version guarantees). The configuration section allows the following configuration options:

The following settings from the previous section are also supported:
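Putting it together, a minimal forwarder configuration file might look like this (the http-transport options are omitted rather than guessed; consult gostatsd --help for the authoritative option names):

```
server-mode='forwarder'

[http-transport]
# transport options (destination, timeouts, etc) go here
```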

Running as a Lambda Extension (experimental feature)

Gostatsd can be run as a lambda extension in forwarder mode. The metrics are flushed at the end of each lambda invocation by default; the flush interval is ignored for custom metrics, while internal metrics are still flushed on a best-effort basis using the configured flush interval.

To support flushes based on the runtime function, a lambda telemetry server is started at the reserved lambda hostname sandbox on port 8083. This can be configured by setting the lambda-extension-telemetry-address configuration parameter. This will need to be done if port 8083 is not available within the lambda runtime.

The per-invocation flush can be disabled by setting lambda-extension-manual-flush to false; however, this is not recommended unless the lambda is constantly invoked. Since extensions are suspended once the user's lambda function returns, disabling it may lead to metric loss (for in-flight requests) or metric delay until the next invocation in lambdas that are sparsely invoked.
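For example, to move the telemetry server off port 8083, something like the following could be used (the address value is illustrative; only the option names come from this document):

```
lambda-extension-telemetry-address='sandbox:9999'
lambda-extension-manual-flush=true
```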

Configurable options:

All options specified in the previous section for the forwarder are also configurable, with the following caveats:

Metric expiry and persistence

After a metric has been sent to the server, the server will continue to send the metric to the configured backend until it expires, even if no additional metrics are sent from the client. The value sent depends on the metric type:

Setting an expiry interval of 0 will persist metrics forever. If metrics are not carefully controlled in such an environment, the server may run out of memory or overload the backend receiving the metrics. Setting a negative expiry interval will result in metrics not being persisted at all.

Each metric type has its own interval, which is configured using the following precedence (from highest to lowest): expiry-interval-<type> > expiry-interval > default (5 minutes).
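As an example, assuming the per-type options follow the expiry-interval-<type> pattern with standard statsd type names (the gauge instantiation here is an assumption), a configuration might look like this:

```
expiry-interval='10m'        # default for all metric types
expiry-interval-gauge='0'    # persist gauges forever
```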

Configuring HTTP servers

The service supports multiple HTTP servers, with different configurations for different requirements. All HTTP servers are named in the top-level http-servers setting, which should be a space-separated list of names. Each server is then configured by creating a section in the configuration file named http.<servername>. An HTTP server section has the following configuration options:

For example, to configure a server with a localhost only diagnostics endpoint, and a regular ingestion endpoint that can sit behind an ELB, the following configuration could be used:

backends='stdout'
http-servers='receiver profiler'

[http.receiver]
address='0.0.0.0:8080'
enable-ingestion=true

[http.profiler]
address='127.0.0.1:6060'
enable-expvar=true
enable-prof=true

There is currently no capability to run an HTTPS server, and no authentication (which is why you might want different addresses). You could also put a reverse proxy in front of the service. Documentation for the endpoints can be found in HTTP.md.

Configuring backends

Refer to backends for configuration options for the backends.

Cloud providers

Cloud providers are a way to automatically enrich metrics with metadata from a cloud vendor.

Refer to cloud providers for configuration options for the cloud providers.

Configuring timer sub-metrics

By default, timer metrics will result in aggregated metrics of the form (exact name varies by backend):

<base>.Count
<base>.CountPerSecond
<base>.Mean
<base>.Median
<base>.Lower
<base>.Upper
<base>.StdDev
<base>.Sum
<base>.SumSquares

In addition, the following aggregated metrics will be emitted for each configured percentile:

<base>.Count_XX
<base>.Mean_XX
<base>.Sum_XX
<base>.SumSquares_XX
<base>.Upper_XX - for positive only
<base>.Lower_-XX - for negative only
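To illustrate what the non-percentile aggregates represent, here is a standalone Go sketch that computes them for a set of timer samples. It is an illustration of the definitions only, not gostatsd's actual implementation (the even-count median handling in particular is simplified):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Aggregates mirrors the timer sub-metrics listed above
// (exact names vary by backend).
type Aggregates struct {
	Count      int
	Mean       float64
	Median     float64
	Lower      float64
	Upper      float64
	StdDev     float64
	Sum        float64
	SumSquares float64
}

// aggregate computes the statistics for one flush interval's samples.
func aggregate(samples []float64) Aggregates {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	var a Aggregates
	a.Count = len(sorted)
	for _, v := range sorted {
		a.Sum += v
		a.SumSquares += v * v
	}
	a.Mean = a.Sum / float64(a.Count)
	a.Lower = sorted[0]
	a.Upper = sorted[a.Count-1]
	a.Median = sorted[a.Count/2] // simplistic; even counts are often interpolated
	var variance float64
	for _, v := range sorted {
		variance += (v - a.Mean) * (v - a.Mean)
	}
	a.StdDev = math.Sqrt(variance / float64(a.Count))
	return a
}

func main() {
	fmt.Printf("%+v\n", aggregate([]float64{10, 20, 30, 40}))
}
```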

These can be controlled through the disabled-sub-metrics configuration section:

[disabled-sub-metrics]
# Regular metrics
count=false
count-per-second=false
mean=false
median=false
lower=false
upper=false
stddev=false
sum=false
sum-squares=false

# Percentile metrics
count-pct=false
mean-pct=false
sum-pct=false
sum-squares-pct=false
lower-pct=false
upper-pct=false

By default (for compatibility), they are all false and the metrics will be emitted.

Timer histograms (experimental feature)

Timer histograms, inspired by the Prometheus implementation, can be enabled on a per-time-series basis using the gsd_histogram meta tag, with a value containing the histogram bucketing definition (joined with _), e.g. gsd_histogram:-10_0_2.5_5_10_25_50.

It will:

For a timer with the gsd_histogram:-10_0_2.5_5_10_25_50 meta tag, the following time series will be generated:

Each time series will contain the total number of timer data points with a value less than or equal to the le value, e.g. the counter <base>.histogram with the tag le:5 will contain the number of all observations with a value no greater than 5. The counter <base>.histogram with the tag le:+Inf is equivalent to <base>.count and contains the total number of observations.
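The cumulative bucketing described above can be sketched as follows; this is an illustrative reimplementation, not gostatsd's code:

```go
package main

import "fmt"

// bucketCounts counts, for each "le" threshold, the timer samples with
// value <= threshold. The +Inf bucket equals the total sample count.
func bucketCounts(samples []float64, thresholds []float64) map[string]int {
	counts := make(map[string]int)
	for _, t := range thresholds {
		key := fmt.Sprintf("le:%g", t)
		counts[key] = 0
		for _, v := range samples {
			if v <= t {
				counts[key]++
			}
		}
	}
	counts["le:+Inf"] = len(samples)
	return counts
}

func main() {
	samples := []float64{-3, 1, 4, 7, 30}
	fmt.Println(bucketCounts(samples, []float64{-10, 0, 2.5, 5, 10, 25, 50}))
}
```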

All original timer tags are preserved and added to all the time series.

To limit cardinality, the timer-histogram-limit option can be specified to cap the number of buckets that will be created (the default is math.MaxUint32). A value of 0 does not disable the feature; instead, 0 buckets are emitted, which effectively drops metrics with gsd_histogram tags.

Incorrect meta tag values will be handled in a best-effort manner, i.e.

This is an experimental feature and it may be removed or changed in future versions.

Load testing

There is a tool under cmd/loader with support for a number of options which can be used to generate synthetic statsd load. There is also another load generation tool under cmd/tester which is deprecated and will be removed in a future release.

Help for the loader tool can be found through --help.

Sending metrics

The server listens for UDP packets on the address given by the --metrics-addr flag, aggregates them, then sends them to the backend servers given by the --backends flag (space separated list of backend names).

Currently supported backends are:

The format of each metric is:

<bucket name>:<value>|<type>\n

A single packet can contain multiple metrics, each ending with a newline.

Optionally, gostatsd supports sample rates (for simple counters, and for timer counters) and tags:

The tags format is either simple (a bare tag name) or key:value.
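For example, a counter with a sample rate and both tag forms might be sent as follows (the |@ sample-rate and |# tag separators shown follow the common DogStatsD-style convention; treat the exact syntax as an assumption and verify against the protocol documentation):

```
users.online:1|c|@0.5|#region:us-east,beta
```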

A simple way to test your installation or send metrics from a script is to use echo and the netcat utility nc:

echo 'abc.def.g:10|c' | nc -w1 -u localhost 8125
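The same thing can be done programmatically. This minimal Go sketch sends one metric over UDP; the address and metric name are placeholders:

```go
package main

import (
	"fmt"
	"net"
)

// sendMetric sends a single statsd metric line over UDP.
// Note: UDP is fire-and-forget, so a nil error does not guarantee delivery.
func sendMetric(addr, metric string) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = fmt.Fprintf(conn, "%s\n", metric)
	return err
}

func main() {
	if err := sendMetric("localhost:8125", "abc.def.g:10|c"); err != nil {
		fmt.Println("send failed:", err)
	}
}
```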

Monitoring

Many metrics for the internal processes are emitted. See METRICS.md for details. Go expvar is also exposed if the --profile flag is used.

Memory allocation for read buffers

By default gostatsd will batch read multiple packets to optimise read performance. The amount of memory allocated for these read buffers is determined by the config options:

max-readers * receive-batch-size * 64KB (max packet size)

The metric avg_packets_in_batch can be used to track the average number of datagrams received per batch, and the --receive-batch-size flag used to tune it. There may be some benefit to tuning the --max-readers flag as well.
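As a worked example, with hypothetical values of 8 readers and a batch size of 50, the formula above gives 25 MiB of read buffers:

```go
package main

import "fmt"

func main() {
	const maxReaders = 8             // --max-readers (example value)
	const receiveBatchSize = 50      // --receive-batch-size (example value)
	const maxPacketSize = 64 * 1024  // 64KB max packet size

	total := maxReaders * receiveBatchSize * maxPacketSize
	fmt.Printf("%d MiB\n", total/(1024*1024)) // 25 MiB
}
```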

Using the library

In your source code:

import "github.com/atlassian/gostatsd/pkg/statsd"

Note that this project uses Go modules for dependency management.

Documentation can be found via go doc github.com/atlassian/gostatsd/pkg/statsd or at https://godoc.org/github.com/atlassian/gostatsd/pkg/statsd

Versioning

Gostatsd uses semver versioning for both API and configuration settings, however it does not use it for packages.

This is due to gostatsd being an application first and a library second. Breaking API changes occur regularly, and the overhead of managing this is too burdensome.

Contributors

Pull requests, issues and comments welcome. For pull requests:

See the existing issues for things to start contributing.

For bigger changes, make sure you start a discussion first by creating an issue and explaining the intended change.

Atlassian requires contributors to sign a Contributor License Agreement, known as a CLA. This serves as a record stating that the contributor is entitled to contribute the code/documentation/translation to the project and is willing to have it used in distributions and derivative works (or is willing to transfer ownership).

Prior to accepting your contributions we ask that you please follow the appropriate link below to digitally sign the CLA. The Corporate CLA is for those who are contributing as a member of an organization and the individual CLA is for those contributing as an individual.

License

Copyright (c) 2012 Kamil Kisiel. Copyright (c) 2016-2020 Atlassian Pty Ltd and others.

Licensed under the MIT license. See LICENSE file.