How to run spark cep with kafka topic

Samsung / spark-cep

Spark CEP is an extension of Spark Streaming to support SQL-based query processing

Apache License 2.0


How to run spark cep with kafka topic #1

Open innovativestreet opened 8 years ago

innovativestreet commented 8 years ago

Hello,

I am trying to run a spark-cep job. I have built the jar and am trying to submit it with spark-submit, using the query below.

SELECT COUNT(DISTINCT t.duid) FROM stream_test OVER (WINDOW '300' SECONDS, SLIDE '5' SECONDS) AS t

It says that a query can be made against a Kafka topic (stream_test in your case). Kafka does not store any information about columns, so how does the query select distinct duid values?

Please suggest how to run this.

Thanks, Janesh Mishra

rbkim commented 8 years ago

Hello, Janesh

Thank you for your interest in Spark CEP. The Spark CEP architecture is based on Spark SQL and stores the aggregated information in RDDs, Spark's in-memory data structure. To use a Kafka topic as a Hive table, you should first execute a DDL query that includes the Kafka configuration. That DDL is what declares the columns (such as duid) for the topic, since Kafka itself carries no schema.

Regards, Robert
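
To make the idea in the reply above concrete, here is a minimal sketch in plain Python of how a declared schema lets a column query run over schemaless Kafka messages. It is not Spark CEP code and does not use its DDL syntax, which the thread does not show. The JSON message format, the `ts` timestamp column, the `SCHEMA` tuple, and all function names are assumptions made for illustration. Only `duid` and the 300-second window with a 5-second slide come from the query in the question.

```python
import json

# Hypothetical schema: the column names a DDL statement would declare for the topic.
# Kafka stores only bytes; the schema lives on the query-engine side.
SCHEMA = ("duid", "ts")


def to_row(message: bytes) -> dict:
    """Decode one Kafka message value (assumed to be JSON) into a row with the declared columns."""
    record = json.loads(message.decode("utf-8"))
    return {column: record.get(column) for column in SCHEMA}


def count_distinct_duid(rows, window_end, window_seconds=300):
    """COUNT(DISTINCT duid) over rows whose ts lies in (window_end - window_seconds, window_end]."""
    window_start = window_end - window_seconds
    return len({
        row["duid"]
        for row in rows
        if row["duid"] is not None and window_start < row["ts"] <= window_end
    })


def sliding_counts(rows, first_end, last_end, window_seconds=300, slide_seconds=5):
    """Re-evaluate the windowed count every slide_seconds, like WINDOW '300' SLIDE '5'."""
    return [
        (end, count_distinct_duid(rows, end, window_seconds))
        for end in range(first_end, last_end + 1, slide_seconds)
    ]


# Made-up message values standing in for what a consumer would read from stream_test.
messages = [
    b'{"duid": "a", "ts": 1}',
    b'{"duid": "b", "ts": 3}',
    b'{"duid": "a", "ts": 7}',
    b'{"duid": "c", "ts": 8}',
]
rows = [to_row(m) for m in messages]
results = sliding_counts(rows, first_end=5, last_end=305)
```

Each entry in `results` pairs a window end time with the distinct `duid` count for that window. In a real deployment the decoding step and the windowing would be handled by Spark CEP and Spark Streaming once the DDL has been executed; the sketch only shows why the column names must come from the DDL rather than from Kafka.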