Seddryck / Tseesecake

Tseesecake is a lightweight proxy for time-series query engines, supporting multiple database engines and storage providers, with a SQL dialect dedicated to time series
https://seddryck.github.io/Tseesecake
Apache License 2.0
2 stars 1 forks source link

Implicit window aggregation for difference #69

Open Seddryck opened 11 months ago

Seddryck commented 11 months ago

70 is a prerequisite.

Following the same idea than for #68, it should be possible to easily define a query where we make a transformation of the time series to represent the delta with the previous element.

SELECT
    Instant,
    Producer,
    AGGREGATE(Produced)
FROM
    WindEnergy
WHERE
        Instant < '2023-01-01'
BUCKET BY 
    DAY
GROUP BY
    Producer

is returning the following table

Instant Producer Produced
2022-12-28 Green Power Inc. 30
2023-12-29 Green Power Inc. 24.491114
2023-12-30 Green Power Inc. 30
2023-12-31 Green Power Inc. 29.702520
2022-12-28 Future Energy 14.816336
2023-12-29 Future Energy 17.009873
2023-12-30 Future Energy 17.833747
2023-12-31 Future Energy 16.787512

Then the following SQL

SELECT
    Instant,
    Producer,
    VARIATION(Produced)
FROM
    WindEnergy
WHERE
         Instant < '2023-01-01'
BUCKET BY 
    DAY
GROUP BY
    Producer
Instant Producer Produced
2022-12-28 Green Power Inc. NULL
2023-12-29 Green Power Inc. -4.185732
2023-12-30 Green Power Inc. 4.185732
2023-12-31 Green Power Inc. 0
2022-12-28 Future Energy NULL
2023-12-29 Future Energy -0.693749
2023-12-30 Future Energy 6.936893
2023-12-31 Future Energy -5.635899

The query translated to the dialect supported by DuckDB should be

SELECT
    DAY(Instant),
    Producer, 
    MAX(Produced) - LAG(MAX(Produced)) OVER (PARTITION BY Producer ORDER BY DAY(Instant) ASC) as Producedx 
FROM
    WindEnergy
WHERE
    Instant < '2023-01-01'
GROUP BY 
    DAY(Instant), producer

In the CTE, the aggregation MAX is the default aggregation of the measurement 'Produced'