FundingCircle / jackdaw

A Clojure library for the Apache Kafka distributed streaming platform.
https://fundingcircle.github.io/jackdaw/
BSD 3-Clause "New" or "Revised" License
369 stars 80 forks source link

Support for naming windowed join topics and other internal topics #178

Open kidpollo opened 5 years ago

kidpollo commented 5 years ago

The current stable version of Kafka streams (2.3) supports naming some internal topics per KIP's.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Repartition+Topics+for+Joins+and+Grouping https://cwiki.apache.org/confluence/display/KAFKA/KIP+230%3A+Name+Windowing+Joins

It seems that it is possible (in 2.3) to name some internal topics now but not windowed join topics. We actually discovered that in trunk it is implemented but not released yet. https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamImpl.java#L1109

Being able to name joins is important for long lived JOIN windows so that changes in the topology dont change the internal topic name and ignore the join history upon topology restart.

In the meant time we are gettin by a custom build 2.2 of kafka with support for Named Joins. https://github.com/FundingCircle/kafka/pull/5/files#diff-5142e1d4a6410459d6bf6df98828e5afR920-R921

And a patch to join impl functions:

(defn join-windowed
  "Combines the values of two streams that share the same key using a windowed
  inner join. Adds the `join-name` parameter, which is used to name the internal
  storage topics. Requires patched version of the kafka streams jar."
  [this-kstream other-kstream value-joiner-fn windows
   {key-serde :key-serde this-value-serde :value-serde}
   {other-value-serde :value-serde}
   join-name]
  (clj-kstream
   (.join (kstream* this-kstream)
          (kstream* other-kstream)
          (value-joiner value-joiner-fn)
          windows
          (Joined/with key-serde this-value-serde other-value-serde join-name))))

(defn left-join-windowed
  "Combines the values of two streams that share the same key using a windowed
  left join. Adds the `join-name` parameter, which is used to name the internal
  storage topics. Requires patched version of the kafka streams jar."
  [this-kstream other-kstream value-joiner-fn windows
   {key-serde :key-serde this-value-serde :value-serde}
   {other-value-serde :value-serde}
   join-name]
  (clj-kstream
   (.leftJoin (kstream* this-kstream)
              (kstream* other-kstream)
              (value-joiner value-joiner-fn)
              windows
              (Joined/with key-serde this-value-serde other-value-serde join-name))))

Supporting naming of windowed joins seems quite critical as explained above but looking into supporting other internal topic custom naming support should also be looked at.

cddr commented 5 years ago

Good summary of the issue. Thanks @kidpollo

Can we make sure the functions proposed above maintain the existing behavior if called without a join-name?

kidpollo commented 5 years ago

Yes they do I believe @99-not-out tested this by building trunk

cddr commented 5 years ago

Yes they do I believe @99-not-out tested this by building trunk

Not sure I made myself clear. What I mean is that users of jackdaw should continue to be able to call e.g. the join-windowed function with only 6 args (as opposed to the 7 args required by the definition above).