confluentinc / kafka-tutorials

Tutorials and Recipes for Apache Kafka
https://developer.confluent.io/tutorials
Apache License 2.0

Aggregations: element frequency (KStreams) #19

Open tlberglund opened 5 years ago

jwfbean commented 5 years ago

How distinct should this example be from wordcount? Or should it basically be wordcount in a recipe form?

tlberglund commented 5 years ago

I mean, it's word count, which already exists in the streams examples repo, which gives us a good head start. Thanks to Hadoop, word count is a trite example, but making it fun depends upon finding fun text. :)

jwfbean commented 5 years ago

Kafka wrote a text about messages. http://johnstoniatexts.x10host.com/kafka/imperialmessagehtml.html

jwfbean commented 5 years ago

But to do word count you also need to flat map the text and split it, so that knocks out #25 too, doesn't it?
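For reference, the flat-map-and-split step can be sketched in plain Java. This is only an in-memory analogue using java.util.stream, not Kafka Streams code. In the KStreams DSL the same shape would be flatMapValues, then groupBy, then count. The class name, the method name, and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory sketch; not part of the tutorial code base.
class WordCountSketch {

    // Flat map each line into lowercase words, then count occurrences per word.
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                // One line in, many words out: this is the "flat map and split" step.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // split() can yield an empty leading token; drop it.
                .filter(word -> !word.isEmpty())
                // Group by the word itself and count each group (TreeMap for sorted output).
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample text.
        List<String> lines = List.of("The messenger set out at once", "the message is for you");
        wordCounts(lines).forEach((word, count) -> System.out.println(word + " = " + count));
    }
}
```

The point of the sketch is that the counting itself is one line; most of the work is turning a line of text into individual elements, which is the part that overlaps with a separate flat-map recipe.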

tlberglund commented 5 years ago

Yes, but I think we want to avoid combining recipes. I just wrote a couple of them that implement #23, but #23 should still be its own thing, so when somebody is trying to figure that out, they can see it and it alone, and not have to tease it out from other stuff they are trying to figure out. 😁

I suspect we could find some kind of data other than words to produce to the topic, where element frequency is still an interesting question. That will mean producing a couple of dozen records of test data (at least), but it gets you out of flatmapping.
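A rough sketch of what that could look like, again as plain in-memory Java and not Kafka Streams code: each record already carries exactly one element, so there is nothing to flat map, and the frequency is a running count per element. The genre values, the class name, and the method name are hypothetical. The list of updates only approximates how a count's changes would appear downstream in a streaming setting, ignoring caching.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; not part of the tutorial code base.
class ElementFrequencySketch {

    // Returns one "element=count" update per input record, in arrival order.
    static List<String> countUpdates(List<String> elements) {
        Map<String, Long> counts = new HashMap<>();
        List<String> updates = new ArrayList<>();
        for (String element : elements) {
            // Increment the running count for this element (starts at 1 if unseen).
            long updated = counts.merge(element, 1L, Long::sum);
            updates.add(element + "=" + updated);
        }
        return updates;
    }

    public static void main(String[] args) {
        // Made-up test data: one element per record, so no splitting is needed.
        List<String> genres = List.of("drama", "comedy", "drama", "horror", "comedy", "drama");
        countUpdates(genres).forEach(System.out::println);
    }
}
```

With data shaped like this, the recipe only has to show grouping and counting, which keeps it separate from the flat-map recipe.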

rspurgeon commented 4 years ago

@jwfbean, @colinhicks, was this recipe completed? Is this use case covered by this KSQL recipe: https://kafka-tutorials.confluent.io/create-stateful-aggregation-count/ksql.html? If so, could it be copied over to the kstreams flavor?

colinhicks commented 4 years ago

@rspurgeon, there is a KStreams flavor of the aggregation count tutorial you linked to, merged here: https://github.com/confluentinc/kafka-tutorials/pull/168

My sense is this issue is different, in that it points toward the use of flatMapValues in KStreams.