Open big-andy-coates opened 4 years ago
While it should be possible to make `WINDOWSTART` and `WINDOWEND` available within:

The following places cannot so easily be supported:

- `min(windowStart)`: Streams passes the underlying key, not the `Windowed` key, to the `Aggregator` on windowed aggregations. (The `aggregate` calls on `TimeWindowedKStream` and `SessionWindowedKStream` take an aggregator with `K`, not `Windowed<K>`.) Fixing this requires a Streams change.
- `filter` method on `SessionWindowedKStream` and `TimeWindowedKStream` that allows access to the window bounds. This requires a Streams change.

Partial fix (for non-aggregate select expressions) available in https://github.com/confluentinc/ksql/pull/4450
Removing 0.7 release tags as the remaining work is not needed for 0.7 and, likely, requires a Streams change.
To document the related Streams work:
Personally, I feel we first need to finish up:
So my personal plan is to finish up KIP-478 and then tackle KIP-300.
@big-andy-coates , do you think this captures the relevant needs, or do we need to create a different AK Jira issue?
Hey @vvcephei, I'm not sure to be honest.
KIP-300 seems to me to be fixing the limitation that it's not possible to build a windowed table from a windowed changelog. So I don't see how that's related.
KAFKA-7777 also doesn't seem to be on-topic.
What's needed is:

- The `Aggregator.apply` call: the key passed is currently the unwindowed key; what is needed is the windowed key. For example, `KStreamSessionWindowAggregate` actually creates the windowed key in the line after the aggregate call. But this `sessionKey` is exactly what we need to be passed to the aggregate call.
- The second thing we'd ideally need is a `filter` call available on `KGroupedStream` and its table equivalent. However, we can hack around this if we need to, though it's not pretty.
It doesn't seem any of the listed items above address either of these items, but I may be missing something.
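The limitation described in this comment can be illustrated with a toy model. The sketch below is plain Python, not the Kafka Streams API: `Windowed`, `tumbling_aggregate` and the `pass_windowed_key` switch are hypothetical names introduced only for illustration, and tumbling windows are assumed for simplicity. It shows why an aggregator that receives only the plain key cannot compute anything from the window bounds, while one that receives the windowed key can.

```python
from collections import namedtuple

# Hypothetical stand-in for a windowed key: the plain key plus window bounds.
Windowed = namedtuple("Windowed", ["key", "start", "end"])


def tumbling_aggregate(records, size_ms, initializer, aggregator,
                       pass_windowed_key=False):
    """Toy tumbling-window aggregation over (key, timestamp_ms, value) records.

    With pass_windowed_key=False the aggregator sees only the plain key,
    which models the behaviour described in the thread. With True it sees
    the windowed key, which models the requested change.
    """
    state = {}
    for key, ts, value in records:
        start = ts - (ts % size_ms)
        windowed = Windowed(key, start, start + size_ms)
        agg_key = windowed if pass_windowed_key else key
        current = state[windowed] if windowed in state else initializer()
        state[windowed] = aggregator(agg_key, value, current)
    return state


def earliest_window_start(key, value, agg):
    """Aggregator that needs the window bounds, like min(windowStart)."""
    if not isinstance(key, Windowed):
        return agg  # plain key: the bounds are simply not visible here
    return key.start if agg is None else min(agg, key.start)


records = [("m1", 1_000, 4), ("m1", 12_000, 2)]
without_bounds = tumbling_aggregate(records, 10_000, lambda: None,
                                    earliest_window_start)
with_bounds = tumbling_aggregate(records, 10_000, lambda: None,
                                 earliest_window_start,
                                 pass_windowed_key=True)
print(without_bounds)
print(with_bounds)
```

In this model the state store is keyed by the windowed key in both cases; only the argument handed to the aggregator differs, which mirrors the point that the windowed key already exists internally but is not passed on.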
Aha! Thanks for the clarification, @big-andy-coates . That is indeed different than what I was thinking.
Do you want to create a Kafka Jira ticket to track this desire?
So after the upgrade we aren't able to use the UDAFs as they have been removed, and also the column names aren't accessible in GROUP BY, WHERE, or HAVING. Is there a workaround for this? Also, it seems like the fix won't be part of 0.11?
Hi @muneebshahid,
The UDAFs, when they existed, wouldn't have been usable in GROUP BY or WHERE, and I can't think of a use-case for using the old windowStart() and windowEnd() UDAFs in the HAVING clause, if that was supported.
Are you saying you were using the UDAFs in the HAVING clause and have now lost this functionality? Sorry if that is the case. Can you explain your use-case and provide example SQL so we can understand what you're trying to achieve?
Thanks.
Hey @big-andy-coates thank you for the response. And yes we are using them in "HAVING" clause.
Our use case is that we are tracking changes in the last x minutes. For this, a hopping window was the most suitable one. And since the messages mostly arrive in order, we are mainly interested in the very first window (the window that extends furthest into the past from the current moment).
For example, have a look at this script (ts is the timestamp):
SELECT
TIMESTAMPTOSTRING(MAX(ts), 'HH:mm:ss') AS MTS,
TIMESTAMPTOSTRING(WINDOWSTART(), 'HH:mm:ss') AS WS,
TIMESTAMPTOSTRING(WINDOWEND(),'HH:mm:ss') AS WE,
SUM(faults) as total
FROM
my
WINDOW HOPPING (SIZE 10 SECONDS, ADVANCE BY 2 SECONDS)
GROUP BY machine
then these five windows are emitted for a message.
|   | MTS      | WS       | WE       | Total |
|---|----------|----------|----------|-------|
| 1 | 08:19:00 | 08:18:52 | 08:19:02 | 4     |
| 2 | 08:19:00 | 08:18:54 | 08:19:04 | 4     |
| 3 | 08:19:00 | 08:18:56 | 08:19:06 | 4     |
| 4 | 08:19:00 | 08:18:58 | 08:19:08 | 4     |
| 5 | 08:19:00 | 08:19:00 | 08:19:10 | 4     |
To retain just the first one we are using
HAVING
WINDOWEND() <= MTS + 2*1000;
The 2*1000 accounts for the window advance (2 seconds, in milliseconds).
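The window arithmetic in this comment can be checked with a short sketch. It is plain Python rather than ksqlDB, and the helper names (`hopping_windows`, `keep_earliest`, `fmt`) are hypothetical. It assumes windows are aligned to multiples of the advance and cover `[start, end)`, that timestamps are milliseconds since midnight, and that the HAVING condition compares against `MAX(ts)` as a number.

```python
def hopping_windows(ts_ms, size_ms, advance_ms):
    """Return the (start, end) bounds of every hopping window containing ts_ms.

    Assumes windows are aligned to multiples of advance_ms and cover
    the half-open interval [start, end).
    """
    # Latest aligned window start that is not after the timestamp.
    start = ts_ms - (ts_ms % advance_ms)
    windows = []
    # A window still contains ts_ms as long as start > ts_ms - size_ms.
    while start > ts_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


def keep_earliest(windows, max_ts_ms, advance_ms):
    """Mimic HAVING WINDOWEND() <= MAX(ts) + advance: keep the oldest window."""
    return [(s, e) for (s, e) in windows if e <= max_ts_ms + advance_ms]


def fmt(ms):
    """Format milliseconds since midnight as HH:mm:ss."""
    seconds = ms // 1000
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


# The message from the comment: 08:19:00 as milliseconds since midnight.
ts = (8 * 3600 + 19 * 60) * 1000
windows = hopping_windows(ts, size_ms=10_000, advance_ms=2_000)
for start, end in windows:
    print(fmt(start), fmt(end))
print("kept:", [(fmt(s), fmt(e)) for s, e in keep_earliest(windows, ts, 2_000)])
```

Because window ends are spaced one advance apart and every containing window ends after the message timestamp, exactly one window satisfies the condition: the one that reaches furthest into the past.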
One way for us would be to handle this outside ksql, but that means a lot of useless messages will be consumed, particularly for larger window sizes which then need to be filtered out.
If you have some suggestions then please let me know. Thank you.
Thanks for details of the use-case @muneebshahid.
Sorry this functionality has been temporarily lost. I can appreciate this must be frustrating. Please be assured we are working towards reinstating such functionality with our work towards ksqlDB supporting structured keys.
Hey @big-andy-coates , could you create an AK streams ticket for the needed changes so that we would not forget about it?
+1 from community post requiring `HAVING` support for window bounds: https://stackoverflow.com/questions/54231314/how-to-only-keep-the-latest-window-in-ksql
Hi all,
We are experiencing the same restriction in the Financial Services domain. Is there any update on the issue? Should we expect support for this in the near term?
Thanks.
Any update on when this feature will be added?
Also looking for update on this feature.
This is not currently planned on the short-term roadmap.
Any update on when this feature will be added?
KSQL currently lets you take a non-windowed stream and perform a windowed group by:
Which is essentially grouping by not just `something`, but also implicitly by the window bounds.

This might be more correctly written with a Tumbling table function:

Where the Tumbling table function returns one row for each row in `S`, with the addition of the `windowstart` and `windowend` columns. (Note: Hopping and session table functions are also possible, though in the case of the latter the table function would also emit retractions.)

In a correct SQL model `windowstart` and `windowend` would therefore be available as fields within the selection, e.g.

This would allow us to do away with the `windowStart()` and `windowEnd()` UDAFs!

Unfortunately, this is not currently possible. Using the window bounds columns results in an unknown column error.
Nor are the columns available to UDAFs, e.g.
QTT test:
Results in error: