Open rmoff opened 5 years ago
+1 from a user in an offline conversation at a meetup in Munich.
Feedback was: "Have the ability to specify either the desired time zone or a time offset for windowed aggregations"
+1 from an offline user conversation, where the goal is to do windowed aggregations where windows start at midnight Eastern time.
This should be addressed in Kafka Streams first IMHO -- the feature request exists there, too. Afterwards, KSQL can expose it to users.
Agreed @mjsax
Looks like there's a workaround in Kafka Streams using custom `TimestampExtractor`s, so it is an option to support this in KSQL, though it'd be somewhat hacky.
That's super hacky and I would not recommend it. Why not just work on https://issues.apache.org/jira/browse/KAFKA-7911 directly? Also note, that there is https://github.com/confluentinc/kafka-streams-examples/blob/5.3.1-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java that we could leverage.
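The linked DailyTimeWindows example aligns daily windows to local midnight rather than to an epoch boundary. A rough sketch of that core calculation in plain java.time (the class and method names here are invented for illustration and are not the Confluent example's API):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Sketch of a DailyTimeWindows-style calculation: for a record timestamp,
// find the enclosing [local midnight, next local midnight) window.
public class LocalDayWindow {

    // Start of the local day containing `epochMillis`, as epoch millis.
    public static long windowStartMillis(long epochMillis, ZoneId zone) {
        ZonedDateTime zdt = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return zdt.toLocalDate().atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // Exclusive end of that window: the next local midnight. On DST days the
    // window is 23 or 25 hours long, unlike a fixed 24h epoch-aligned window.
    public static long windowEndMillis(long epochMillis, ZoneId zone) {
        ZonedDateTime zdt = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return zdt.toLocalDate().plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId eastern = ZoneId.of("America/New_York");
        // 2019-06-15T03:30:00Z is still 23:30 on June 14 Eastern (UTC-4 in June),
        // so it belongs to the window that began at midnight Eastern on June 14.
        long ts = Instant.parse("2019-06-15T03:30:00Z").toEpochMilli();
        System.out.println(Instant.ofEpochMilli(windowStartMillis(ts, eastern)));
    }
}
```

The key point is that the window boundary is derived from the local calendar date in the target zone, not from `epochMillis % windowSize`.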
Thanks for the update @mjsax . I agree those are both much better options.
We should also support `MONTH` and `YEAR` for window sizes - it sounds like these would fall into this same bucket of work?
This would be greatly needed. For now I'm using a non-windowed table that groups by the date, normalized to the first of the month (like 2023-xx-01), so all dates of a given month collapse into one row. It works, BUT I have no control over retention on the ksqlDB side and it grows indefinitely...
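The month-bucketing workaround above boils down to deriving a grouping key like 2023-xx-01 from each event timestamp. A minimal sketch of that key derivation in plain Java (`MonthKey` and `monthKey` are hypothetical names; in practice this would live in a ksqlDB expression or a custom UDF):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Collapse every event timestamp to the first day of its (local) month,
// e.g. an event on 2023-07-19 maps to the key "2023-07-01".
public class MonthKey {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    public static String monthKey(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone)
                .toLocalDate().withDayOfMonth(1).format(FMT);
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2023-07-19T12:00:00Z").toEpochMilli();
        System.out.println(monthKey(ts, ZoneId.of("UTC"))); // 2023-07-01
    }
}
```

Because the key is derived from the local date, an event just after midnight UTC on the 1st can still be assigned to the previous month in, say, Eastern time, which is exactly the timezone sensitivity this issue asks for.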
Per a StackOverflow question: if calculating aggregates with KSQL at day-level and above (day/week/month/etc.), it would be useful to be able to do these based on the local timezone.
KSQL windows are based on the epoch (number of milliseconds since Jan 01 1970, UTC).
For aggregates at a second/minute/hour level this doesn't really matter, but for an organisation wanting to report on "yesterday" etc., the window would need to include events within the 24-hour period defined by the specified timezone, not necessarily UTC.
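To make the mismatch concrete, here is a small java.time sketch (class and method names invented for illustration) showing that the same event belongs to different calendar days depending on the zone, which is why epoch-aligned daily windows misassign events near local midnight:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// The same event lands in different "days" depending on the zone, so a
// UTC/epoch-aligned daily window can put it in the wrong local day.
public class DayBucketDemo {
    public static LocalDate dayIn(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2019-06-15T02:00:00Z").toEpochMilli();
        System.out.println(dayIn(ts, ZoneId.of("UTC")));              // 2019-06-15
        System.out.println(dayIn(ts, ZoneId.of("America/New_York"))); // 2019-06-14
    }
}
```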
One idea is if KSQL supported a `DATEADD` function, where the event timestamp could be shifted by the required hours to match the local timezone relative to UTC, and that shifted value then used as the timestamp for aggregations. However, this feels like a bit of a hack - open to better suggestions for doing this properly.
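Assuming the shift were done upstream (e.g. in a custom `TimestampExtractor`) rather than via a `DATEADD` function, the core of the hack might look like this sketch (`shiftedTimestamp` is a hypothetical helper, not an existing KSQL or Kafka Streams API):

```java
import java.time.Instant;
import java.time.ZoneId;

// The "shift the timestamp" hack: add the zone's UTC offset to the event time
// so that standard epoch-aligned 24h windows coincide with local midnight.
public class TimestampShift {
    public static long shiftedTimestamp(long epochMillis, ZoneId zone) {
        Instant instant = Instant.ofEpochMilli(epochMillis);
        // Offset at the event's own instant. Around a DST transition, two
        // events in the same local day can get different offsets, which is
        // part of why this approach is considered hacky.
        int offsetSeconds = zone.getRules().getOffset(instant).getTotalSeconds();
        return epochMillis + offsetSeconds * 1000L;
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2019-06-15T03:30:00Z").toEpochMilli();
        // Eastern is UTC-4 in June, so the shifted instant falls on June 14,
        // moving the event into the "previous" epoch-aligned day bucket.
        System.out.println(Instant.ofEpochMilli(shiftedTimestamp(ts, ZoneId.of("America/New_York"))));
    }
}
```

The DST caveat in the comment is the reason the zone-aware window calculation (as in the DailyTimeWindows example mentioned earlier) is the cleaner fix.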