confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Use a deterministic name that is consistent across KSQL servers for aggregation state stores #925

Closed rodesai closed 6 years ago

rodesai commented 6 years ago

Currently, state stores for windowed aggregations (and their underlying topics) are named "KSQL_Agg_Query_" + System.currentTimeMillis(). The name is determined when the streams topology for the query is built, so it will most likely differ every time the topology is built. As a result, different KSQL servers (or even one server across reboots) won't share a changelog topic, which would cause us to lose data after a rebalance.
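To make the problem concrete, here is a minimal sketch, not the actual KSQL code. The class `AggStoreNames`, its methods, and the `"windowed"` suffix are hypothetical names chosen for illustration; only the `"KSQL_Agg_Query_"` prefix and the use of the clock come from the description above. The clock is passed in so the two topology builds can be simulated.

```java
// Hypothetical helper for illustration; not part of the KSQL code base.
final class AggStoreNames {
    static final String PREFIX = "KSQL_Agg_Query_";

    // Behaviour as described above: the suffix is the clock value read
    // while the topology is being built.
    static String timestampedName(java.util.function.LongSupplier clock) {
        return PREFIX + clock.getAsLong();
    }

    // Assumed alternative: the suffix is a value every server agrees on.
    static String stableName(String agreedSuffix) {
        return PREFIX + agreedSuffix;
    }

    public static void main(String[] args) {
        // Two topology builds at different times (two servers, or one restart).
        String first = timestampedName(() -> 1_000L);
        String second = timestampedName(() -> 2_000L);
        System.out.println(first.equals(second)); // false: names diverge

        // With an agreed suffix, both builds produce the same name.
        String a = stableName("windowed");
        String b = stableName("windowed");
        System.out.println(a.equals(b)); // true
    }
}
```

Any two builds that read the clock at different moments end up with different store names, and therefore different changelog topics.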

big-andy-coates commented 6 years ago

I'm looking at what ACLs are needed on the Kafka cluster for KSQL, and having non-deterministic topic names means you have to give KSQL produce and consume rights on every topic in the cluster - not ideal!

So I'm a big +1 on this - ideally for GA

big-andy-coates commented 6 years ago

Once this is fixed we should update the integration test SecureIntegrationTest.shouldRunQueryWithChangeLogsOnKafkaClusterWithCorrectAcls() and the docs.

miguno commented 6 years ago

Question: as a short-term workaround, would it be possible to use wildcards for ACLs, assuming that KSQL at least uses a deterministic topic prefix?
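A rough sketch of what such a prefix-based rule would mean, assuming internal topics share a known prefix. This is plain Java that models the idea and is not the Kafka ACL API; `TopicPrefixRule` and the `"ksql_"` prefix are hypothetical.

```java
// Hypothetical model of a prefix-based topic rule; this is not the Kafka ACL API.
final class TopicPrefixRule {
    private final String prefix;

    TopicPrefixRule(String prefix) {
        this.prefix = prefix;
    }

    // A topic is covered when its name starts with the configured prefix,
    // regardless of any non-deterministic suffix that follows.
    boolean allows(String topic) {
        return topic.startsWith(prefix);
    }

    public static void main(String[] args) {
        TopicPrefixRule rule = new TopicPrefixRule("ksql_"); // assumed prefix
        System.out.println(rule.allows("ksql_KSQL_Agg_Query_1526000000000")); // true
        System.out.println(rule.allows("payments"));                          // false
    }
}
```

A rule like this would cover topics whose suffix changes between builds while still excluding unrelated topics in the cluster.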

guozhangwang commented 6 years ago

@rodesai to differentiate between servers, could we use server IDs as a suffix?

rodesai commented 6 years ago

@guozhangwang the intention here is to differentiate between queries, not servers. Generating consistent unique query IDs across servers is a bigger problem that I'll open another issue for. For now, let's use this issue to track omitting the timestamp from windowed aggregates, as we already do for non-windowed aggregates.