Open rmoff opened 5 years ago
+1 from a user in an offline conversation at a meetup in Munich.
Feedback was: "Have the ability to specify either the desired time zone or a time offset for windowed aggregations"
+1 from an offline user conversation, where the goal is to do windowed aggregations where windows start at midnight Eastern time.
This should be addressed in Kafka Streams first IMHO -- the feature request exists there, too. Afterwards, KSQL can expose it to users.
Agreed @mjsax
Looks like there's a workaround in Kafka Streams using custom `TimestampExtractor`s, so it is an option to support this in KSQL, though it'd be somewhat hacky.
That's super hacky and I would not recommend it. Why not just work on https://issues.apache.org/jira/browse/KAFKA-7911 directly? Also note, that there is https://github.com/confluentinc/kafka-streams-examples/blob/5.3.1-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java that we could leverage.
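The linked DailyTimeWindows example aligns daily windows to local midnight rather than to an epoch boundary. A rough sketch of that core calculation in plain java.time (the class and method names here are invented for illustration and are not the Confluent example's API):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Sketch of a DailyTimeWindows-style calculation: for a record timestamp,
// find the enclosing [local midnight, next local midnight) window.
public class LocalDayWindow {

    // Start of the local day containing `epochMillis`, as epoch millis.
    public static long windowStartMillis(long epochMillis, ZoneId zone) {
        ZonedDateTime zdt = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return zdt.toLocalDate().atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // Exclusive end of that window: the next local midnight. On DST days the
    // window is 23 or 25 hours long, unlike a fixed 24h epoch-aligned window.
    public static long windowEndMillis(long epochMillis, ZoneId zone) {
        ZonedDateTime zdt = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return zdt.toLocalDate().plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId eastern = ZoneId.of("America/New_York");
        // 2019-06-15T03:30:00Z is still 23:30 on June 14 Eastern (UTC-4 in June),
        // so it belongs to the window that began at midnight Eastern on June 14.
        long ts = Instant.parse("2019-06-15T03:30:00Z").toEpochMilli();
        System.out.println(Instant.ofEpochMilli(windowStartMillis(ts, eastern)));
    }
}
```

The key point is that the window boundary is derived from the local calendar date in the target zone, not from `epochMillis % windowSize`.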
Thanks for the update @mjsax . I agree those are both much better options.
We should also support `MONTH` and `YEAR` for window sizes - it sounds like these would fall into this same bucket of work?
This would be greatly needed. For now I'm using a non-windowed table that groups by the date, normalized to the first of the month (like 2023-xx-01), so all dates of a given month collapse into one row. It works, BUT I have no control over retention on the ksqlDB side and it grows indefinitely...
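The month-bucketing workaround above boils down to deriving a grouping key like 2023-xx-01 from each event timestamp. A minimal sketch of that key derivation in plain Java (`MonthKey` and `monthKey` are hypothetical names; in practice this would live in a ksqlDB expression or a custom UDF):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Collapse every event timestamp to the first day of its (local) month,
// e.g. an event on 2023-07-19 maps to the key "2023-07-01".
public class MonthKey {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    public static String monthKey(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone)
                .toLocalDate().withDayOfMonth(1).format(FMT);
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2023-07-19T12:00:00Z").toEpochMilli();
        System.out.println(monthKey(ts, ZoneId.of("UTC"))); // 2023-07-01
    }
}
```

Because the key is derived from the local date, an event just after midnight UTC on the 1st can still be assigned to the previous month in, say, Eastern time, which is exactly the timezone sensitivity this issue asks for.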
Per a StackOverflow question: if calculating aggregates with KSQL at day-level and above (day/week/month/etc.), it would be useful to be able to do these based on the local timezone.
KSQL windows are based on the epoch (number of milliseconds since Jan 01 1970, UTC).
For aggregates at a second/minute/hour level this doesn't really matter, but for an organisation wanting to report on "yesterday" etc., the window would need to include events within the 24-hour period defined by the specified timezone, not necessarily UTC.
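To make the mismatch concrete, here is a small java.time sketch (class and method names invented for illustration) showing that the same event belongs to different calendar days depending on the zone, which is why epoch-aligned daily windows misassign events near local midnight:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// The same event lands in different "days" depending on the zone, so a
// UTC/epoch-aligned daily window can put it in the wrong local day.
public class DayBucketDemo {
    public static LocalDate dayIn(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2019-06-15T02:00:00Z").toEpochMilli();
        System.out.println(dayIn(ts, ZoneId.of("UTC")));              // 2019-06-15
        System.out.println(dayIn(ts, ZoneId.of("America/New_York"))); // 2019-06-14
    }
}
```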
One idea is if KSQL supported a `DATEADD` function, where the event timestamp could be shifted by the required hours to match the local timezone relative to UTC, and that shifted value then used as the timestamp for aggregations. However, this feels like a bit of a hack - open to better suggestions for doing this properly.
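Assuming the shift were done upstream (e.g. in a custom `TimestampExtractor`) rather than via a `DATEADD` function, the core of the hack might look like this sketch (`shiftedTimestamp` is a hypothetical helper, not an existing KSQL or Kafka Streams API):

```java
import java.time.Instant;
import java.time.ZoneId;

// The "shift the timestamp" hack: add the zone's UTC offset to the event time
// so that standard epoch-aligned 24h windows coincide with local midnight.
public class TimestampShift {
    public static long shiftedTimestamp(long epochMillis, ZoneId zone) {
        Instant instant = Instant.ofEpochMilli(epochMillis);
        // Offset at the event's own instant. Around a DST transition, two
        // events in the same local day can get different offsets, which is
        // part of why this approach is considered hacky.
        int offsetSeconds = zone.getRules().getOffset(instant).getTotalSeconds();
        return epochMillis + offsetSeconds * 1000L;
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2019-06-15T03:30:00Z").toEpochMilli();
        // Eastern is UTC-4 in June, so the shifted instant falls on June 14,
        // moving the event into the "previous" epoch-aligned day bucket.
        System.out.println(Instant.ofEpochMilli(shiftedTimestamp(ts, ZoneId.of("America/New_York"))));
    }
}
```

The DST caveat in the comment is the reason the zone-aware window calculation (as in the DailyTimeWindows example mentioned earlier) is the cleaner fix.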