Setting up continuous query and retention policy for downsampling with existing data in InfluxDB 1.x.
Note
I'm using InfluxDB 1.8 because starting from version 2.0 there is no ARM 32bit builds. Also, version 1.x is much simpler than 2.0.
However, things should be similar with InfluxDB 2.0 tasks.
The InfluxDB documentation includes a great guide for downsampling and retaining data. However, if you are not starting from an empty database, the guide is not enough.
I have all the data from the sensors in the database sensors. The data looks like this:
And I want to downsample the data into another database called sensors-history:
CREATE DATABASE sensors_history;
Note
You can just create another retention policy instead of creating a new database, and you just need to change the queries below from "sensors-history"."autogen" to "sensors"."oneyear" where oneyear is the name of the retention policy you just created.
Downsample all existing data
Before creating a retention policy on sensors, we first need to downsample all existing data into sensors-history:
USE sensors;
SELECT MEAN(*) INTO "sensors_history"."autogen".:MEASUREMENT
FROM /.*/ GROUP BY time(10m),*;
Note
I added newlines in the query for readability. If you are executing the queries from the influx REPL, you need to remove the newlines for each query.
The wildcard and the backreferencing syntax :MEASUREMENT can be found in the InfluxQL documentation. All it does is to select all data from all measurements (the regex /.*/ part), downsample them with the interval of 10m, and write to the corresponding :MEASUREMENTs in sensors-history database with the autogen retention policy, preserving all the tags (GROUP BY * part).
If you want more details, you can do this:
SELECT
MEAN("value") AS mean_value,
PERCENTILE("value",95) AS high_value,
PERCENTILE("value",5) AS low_value
INTO "sensors_history"."autogen".:MEASUREMENT
FROM /.*/
GROUP BY time(10m),*;
where I used the PERCENTILE function instead of MAX and MIN for a more robust aggregation. As a consequence, I have to use AS for aliasing, as the default query naming fails to distinguish the two percentile functions.
Setting up a retention policy
After checking that all data is downsampled into sensors-history, it's time to set up the retention policy and delete old data. Here I will just keep the data for one day:
where REPLICATION 1 is the default for single-node InfluxDB deployment.
Since I've set the new policy oneday as the default, the query will return an empty result if the retention policy is not specified in the query. In other words, it may look like you've lost all your data after setting a retention policy.
To make the query return recent 24h data, I need to move the recent data from old autogen to oneday:
SELECT * INTO "oneday".:MEASUREMENT
FROM "autogen"./.*/
WHERE time > now() - 1d GROUP BY *;
After checking everything is working fine, it's time to remove old data.
DELETE FROM /.*/ WHERE time < now() - 1d;
Creating the continuous query
The last step is to create the continuous query:
CREATE CONTINUOUS QUERY "cq_tidy_sensors" ON "sensors"
BEGIN
SELECT
MEAN("value") AS mean_value,
PERCENTILE("value",95) AS high_value,
PERCENTILE("value",5) AS low_value
INTO "sensors_history"."autogen".:MEASUREMENT
FROM "sensors"."oneday"./.*/
GROUP BY time(10m),*;
END
The SELECT query is similar to the query above, but it's better to specify the retention policy oneday, especially if you didn't make it the default.
The continuous query will automatically execute every 10m, selecting recent 10m of data from the sensors database to sensors-history. You can check your continuous query by running
SHOW CONTINUOUS QUERIES;
If everything goes fine, you will see something like this every 10m in your InfluxDB log (docker compose logs if using Docker Compose):
msg="Continuous query execution (start)" service=continuous_querier op_name=continuous_querier_execute op_event=start
msg="Executing continuous query" service=continuous_querier op_name=continuous_querier_execute name=cq_tidy_sensors db_instance=sensors start=2023-05-03T03:00:00.000000Z end=2023-05-03T03:10:00.000000Z
msg="Executing query" service=query query="SELECT mean(value) AS mean_value, percentile(value, 95) AS high_value, percentile(value, 5) AS low_value INTO sensors_history.autogen.:MEASUREMENT FROM sensors.oneday./.*/ WHERE time >= '2023-05-03T03:00:00Z' AND time < '2023-05-03T03:10:00Z' GROUP BY time(10m), *"
msg="Finished continuous query" service=continuous_querier op_name=continuous_querier_execute name=cq_tidy_sensors db_instance=sensors written=7 start=2023-05-03T03:00:00.000000Z end=2023-05-03T03:10:00.000000Z duration=1069ms
msg="Continuous query execution (end)"
View Post on Blog
The InfluxDB documentation includes a great guide for downsampling and retaining data. However, if you are not starting from an empty database, the guide is not enough.
I have all the data from the sensors in the database
sensors
. The data looks like this:And I want to downsample the data into another database called
sensors-history
:Downsample all existing data
Before creating a retention policy on
sensors
, we first need to downsample all existing data intosensors-history
:The wildcard and the backreferencing syntax
:MEASUREMENT
can be found in the InfluxQL documentation. All it does is to select all data from all measurements (the regex/.*/
part), downsample them with the interval of10m
, and write to the corresponding:MEASUREMENT
s insensors-history
database with theautogen
retention policy, preserving all the tags (GROUP BY *
part).If you want more details, you can do this:
where I used the
PERCENTILE
function instead ofMAX
andMIN
for a more robust aggregation. As a consequence, I have to useAS
for aliasing, as the default query naming fails to distinguish the two percentile functions.Setting up a retention policy
After checking that all data is downsampled into
sensors-history
, it's time to set up the retention policy and delete old data. Here I will just keep the data for one day:where
REPLICATION 1
is the default for single-node InfluxDB deployment.Since I've set the new policy
oneday
as the default, the query will return an empty result if the retention policy is not specified in the query. In other words, it may look like you've lost all your data after setting a retention policy.To make the query return recent
24h
data, I need to move the recent data from oldautogen
tooneday
:After checking everything is working fine, it's time to remove old data.
Creating the continuous query
The last step is to create the continuous query:
The
SELECT
query is similar to the query above, but it's better to specify the retention policyoneday
, especially if you didn't make it the default.The continuous query will automatically execute every
10m
, selecting recent10m
of data from thesensors
database tosensors-history
. You can check your continuous query by runningIf everything goes fine, you will see something like this every
10m
in your InfluxDB log (docker compose logs
if using Docker Compose):🥳 We have set up downsampling with existing data!