Ansible role to set up downsampling with continuous queries on InfluxDB, optionally with backfilling and compaction of existing raw data.

DrPsychick.ansible_influx_downsampling

Configure InfluxDB for downsampling

Motivation:

InfluxDB uses a default retention policy that keeps data forever in 7-day shards, in RAW format (data points every 10 or 30 seconds, depending on your input configuration). Of course this is a good default, but once you have old data and want to introduce downsampling without losing it, it's a lot of manual work to set up all the queries etc.

So ... I have done this for you!

Two usage scenarios:

Honestly, the two use cases are not much different. The biggest difference is the time it takes to run through the playbook when you enable backfilling. Of course, if you work on existing data, don't forget to make a proper backup first!
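For example, on InfluxDB 1.5+ a portable backup can be taken before you run the playbook:

influxd backup -portable /path/to/backup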

Demo

See the result in action:

Watch the demo

This requires setting up multiple datasources in grafana (one for each aggregation level) and a "Datasource" variable, like so:

Setup database variable
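If you provision Grafana from files, a minimal sketch of such datasources could look like this (the datasource names, URL and database names are assumptions, adjust them to your aggregation levels):

# hypothetical /etc/grafana/provisioning/datasources/influxdb.yml
# one datasource per aggregation level; a dashboard variable of
# type "datasource" then switches between them
apiVersion: 1

datasources:
  - name: telegraf_raw
    type: influxdb
    access: proxy
    url: http://localhost:8086
    database: telegraf
  - name: telegraf_14d
    type: influxdb
    access: proxy
    url: http://localhost:8086
    database: telegraf_14d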

Preparation

As preparation you don't need much, except knowing exactly how you want to downsample your data, as you need to set up your configuration first. Once configured, you can run each level separately and/or repeatedly without any issue, as the role only creates/adds what is missing (ok, CQs are always recreated as there is no way to alter them). This way you can iterate your way towards a final setup and add compaction as the last step, once you're happy with the result.

Levels of downsampling
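For illustration, a multi-level setup could look roughly like this (the variable name and structure below are hypothetical, see examples/ for the actual format used by the role):

# hypothetical sketch of aggregation levels, loosely following the
# naming used in this README (telegraf -> telegraf_14d -> ...)
my_downsampling_levels:
  - { database: telegraf,     retention: 14d, resolution: raw } # raw input data
  - { database: telegraf_14d, retention: 14d, resolution: 1m }  # CQ aggregates raw -> 1m
  - { database: telegraf_90d, retention: 90d, resolution: 15m } # CQ aggregates 1m -> 15m
  - { database: telegraf_inf, retention: INF, resolution: 1h }  # CQ aggregates 15m -> 1h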

Setup

The easiest setup is to create a role in your own repository and add this:

roles/influx-setup/tasks/main.yml

- name: "Include definition from influxdb_{{vars_name}}.yml"
  include_vars: influxdb_{{vars_name}}.yml
    when: vars_name is defined

# you can add your own tasks here that will be executed prior to `ansible_influx_downsampling` role

roles/influx-setup/vars/influxdb_frank.yml --> take one from the examples directory (examples/) as a base for your own.

Now in your playbook, include both roles:

influx-setup.yml

- name: InfluxDB
  hosts: localhost
  roles:
    - { role: influx-setup, vars_name: "frank" }
    - { role: DrPsychick.ansible_influx_downsampling }
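
Then run the playbook as usual:

ansible-playbook influx-setup.yml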

Configuration

Now the trickiest part: you need to know what aggregation levels you want and define them. Moreover, especially if you don't use standard telegraf plugins for input, you need to define the aggregation queries for your measurements.

InfluxDB has an awesome feature for dynamic aggregation (SELECT MEAN(*) FROM :MEASUREMENT GROUP BY *), but it automatically prepends mean_ to every field in the measurement, which means you cannot reuse the dashboard you built on your RAW data.

The solution is simple, but requires work: you have to alias each aggregated column back to its original name, independent of the aggregation method you use.

Example:

SELECT MEAN(usage_user) FROM cpu; -- on RAW
SELECT MEAN(usage_user) AS usage_user INTO telegraf_14d.rp_14d.cpu FROM cpu; -- in aggregate queries

The good thing about defining the aggregation method individually is that you can mix and choose what is best for each column. And once it's defined, it can be reused for all aggregation levels. Additionally, you can add more aggregations (min, mean and max) and build a dashboard which includes them.

SELECT LAST(string_value) AS string_value -- for strings
SELECT MAX(users) AS users -- for metrics where you're interested in the MAX
...
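
For instance, keeping min, mean and max of one field in the same aggregated measurement could look like this (the measurement and field names below are only examples):

my_ansible_influx_queries:
  # keep the mean under the original column name, plus explicit min/max columns
  system: >
    SELECT MIN(load1) AS load1_min
    , MEAN(load1) AS load1
    , MAX(load1) AS load1_max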

The configuration goes into your vars file, or you can choose to set up global vars for it. If you use generic input plugins from telegraf or other typical sources, please add them to defaults/main.yml and send me a pull request, so others can benefit from them too.

Query definition or override:

my_ansible_influx_queries:
  # override default settings, use german performance counter names
  win_cpu: >
    SELECT mean("Benutzerzeit_(Percent)") AS "Benutzerzeit_(Percent)"
    , mean("DPC-Zeit_(Percent)") AS "DPC-Zeit_(Percent)"
    , mean("Interruptzeit_(Percent)") AS "Interruptzeit_(Percent)"
    , mean("Leerlaufzeit_(Percent)") AS "Leerlaufzeit_(Percent)"
    , mean("Privilegierte_Zeit_(Percent)") AS "Privilegierte_Zeit_(Percent)"
    , mean("Prozessorzeit_(Percent)") AS "Prozessorzeit_(Percent)"

A complete setup can be found in examples/full-5level-backfill-compact/

Run time options

Attention

If you enable backfill:

If you enable backfill and compact:

Results of my full setup can be found in examples/full-5level-backfill-compact/

Use Cases

History

Future Version:

Version 0.3: Complete incl. automatic compaction, tests and good examples.

Version 0.2: Fully working and tested. No deleting of data. Stats + CQ update.

Version 0.1: