Open zhang2014 opened 4 years ago
As a workaround it is possible to use WITH FILL STEP dateDiff('second', now()), now() + INTERVAL 1 HOUR)
to convert interval to seconds (not sure if there is a better way), but of course it would be more convenient to use time interval expressions directly.
This would be a very useful feature for filling gaps in time series data. My use case looks a lot like @zhang2014 's use case:
WITH
INTERVAL 5 minute as granule,
toDateTime('2021-02-19 09:25:00') as from_date,
toDateTime('2021-02-19 09:50:00') as to_date
SELECT subject_id, toStartOfInterval(start_date, granule) as date, sum(count) as value
FROM series_aggregation_five_mins
WHERE project_id = '1ec4fdbb-6783-4dba-b1ab-c7ac4f0859b8'
AND metric_id = 'd2e5123e-6a32-4e4f-98fc-3f2518236b97'
AND start_date >= from_date
AND start_date < to_date
GROUP BY subject_id, date
ORDER BY subject_id ASC, date ASC WITH FILL FROM from_date TO to_date STEP granule
Note that using @stas-sl workaround can work, but not when you start using STEP sizes of 1 day or more: because of DST changes a day is not always a fixed amount of seconds.
An additional issue I have is when filling with a double GROUP BY as in my use case above. The subject_id
is filled with zero UUIDs. Instead, I would like the 'current' subject_id to extrapolate to the filled date.
Current response:
┌───────────────────────────subject_id─┬────────────────date─┬─value─┐
│ d5e8bab7-3a1b-4003-8381-07a825428559 │ 2021-02-19 09:30:00 │ 89 │
│ 00000000-0000-0000-0000-000000000000 │ 2021-02-19 09:35:00 │ 0 │
│ d5e8bab7-3a1b-4003-8381-07a825428559 │ 2021-02-19 09:40:00 │ 38 │
└──────────────────────────────────────┴─────────────────────┴───────┘
Desired response:
┌───────────────────────────subject_id─┬────────────────date─┬─value─┐
│ d5e8bab7-3a1b-4003-8381-07a825428559 │ 2021-02-19 09:30:00 │ 89 │
│ d5e8bab7-3a1b-4003-8381-07a825428559 │ 2021-02-19 09:35:00 │ 0 │
│ d5e8bab7-3a1b-4003-8381-07a825428559 │ 2021-02-19 09:40:00 │ 38 │
└──────────────────────────────────────┴─────────────────────┴───────┘
@PHameete you can probably do that with some array functions, arrayFill
is what you need. Here's a practical example: https://blog.tinybird.co/2021/05/18/tips-7-filling-null-values-on-clickhouse/
Cheers for the pointer I'll give that a try! My current solution is to generate zeroes beforehand for cases where this is relevant (sums and counts) and to then use UNION ALL to merge with the query without fills.
The original remark still holds: using time intervals as step size would be very valuable, particularly for days and larger intervals as then the workaround above will not work due to DST.
DST, yeah, step is the problem, one day has 23 hours :(, one 25.
select toDateTime('2021-11-01 00:00:00', 'America/Los_Angeles') + interval number day as d from numbers(10)
order by d
with fill from toDateTime('2021-11-01 00:00:00', 'America/Los_Angeles') to toDateTime('2021-11-12 00:00:00', 'America/Los_Angeles') step 86400;
┌───────────────────d─┐
│ 2021-11-01 00:00:00 │
│ 2021-11-02 00:00:00 │
│ 2021-11-03 00:00:00 │
│ 2021-11-04 00:00:00 │
│ 2021-11-05 00:00:00 │
│ 2021-11-06 00:00:00 │
│ 2021-11-07 00:00:00 │
│ 2021-11-07 23:00:00 │
│ 2021-11-08 00:00:00 │
│ 2021-11-08 23:00:00 │
│ 2021-11-09 00:00:00 │
│ 2021-11-09 23:00:00 │
│ 2021-11-10 00:00:00 │
└─────────────────────┘
┌───────────────────d─┐
│ 2021-11-10 23:00:00 │
│ 2021-11-11 23:00:00 │
└─────────────────────┘
Hi @den-crane thanks for the reply. I think supporting day or larger periods as step for fills can be a very useful feature because it saves a bit of effort and query complexity.
How I do this now is as follows. For example if I want 60 times a 3 day bucket with the first bucket '2019-01-01 00:00:00':
SELECT arrayJoin(arrayMap(x -> toTimeZone(date_add(DAY, x * 3, toDateTime('2019-01-01 00:00:00')), 'UTC'), range(60))) as bucket_date, 0 as value
This works with DST as well because the date_add function handles this properly.
Then the result of this zeroes SELECT
is UNION ALL
-ed with the result of my normal query and aggregated (for example sum'ed if I am getting counts with zero fills).
Use case
Additional context
DataTypeInterval
ingaps fill step
with expression
andgaps fill expression