Open innovativestreet opened 8 years ago
Hello, Janesh
Thank you for your interest in Spark CEP. Spark CEP's architecture is based on Spark SQL, and it stores aggregated information in RDDs, Spark's in-memory data structure. To use a Kafka topic as a Hive table, you need to execute a DDL query that includes the Kafka configuration.
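To illustrate the idea (this is only a rough sketch, not Spark CEP's actual code): Kafka itself only carries raw bytes, so the column names have to come from a table definition, and each message is parsed into a row before the windowed query runs. The example below is plain Python with no Spark or Kafka dependency. It assumes the message values are JSON, that each record carries a `ts` field in seconds, and that messages arrive in timestamp order. `SCHEMA`, `parse_message`, and `sliding_distinct_counts` are hypothetical names made up for this example.

```python
import json
from collections import deque

# Hypothetical column list, standing in for what the DDL statement would declare.
SCHEMA = ("duid", "ts")


def parse_message(raw: bytes) -> dict:
    """Decode one message value (assumed to be JSON) into a row with the declared columns."""
    record = json.loads(raw.decode("utf-8"))
    return {col: record.get(col) for col in SCHEMA}


def sliding_distinct_counts(messages, window=300, slide=5):
    """Yield (window_end, distinct duid count) once per `slide` seconds of event time.

    Each result covers rows with window_end - window <= ts < window_end.
    Assumes messages arrive ordered by `ts`.
    """
    buffer = deque()  # rows that may still fall inside a window, oldest first
    next_emit = None
    for raw in messages:
        row = parse_message(raw)
        ts = row["ts"]
        if next_emit is None:
            next_emit = ts + slide  # first window closes one slide after the first row
        while ts >= next_emit:
            # Drop rows that are too old for the window ending at next_emit.
            while buffer and buffer[0]["ts"] < next_emit - window:
                buffer.popleft()
            yield next_emit, len({r["duid"] for r in buffer})
            next_emit += slide
        buffer.append(row)


if __name__ == "__main__":
    sample = [
        json.dumps({"duid": d, "ts": t}).encode("utf-8")
        for d, t in [("a", 0), ("b", 1), ("a", 2), ("c", 6), ("a", 11)]
    ]
    for window_end, count in sliding_distinct_counts(sample):
        print(window_end, count)
```

So the distinct count on duid works because the rows already have a schema by the time the query sees them; the Kafka topic is only the transport.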
Regards, Robert
Hello,
I am trying to run a spark-cep job. I have built the jar and am trying to submit it with spark-submit, using the query below.
SELECT COUNT(DISTINCT t.duid) FROM stream_test OVER (WINDOW '300' SECONDS, SLIDE '5' SECONDS) AS t
It says that a query can be run against a Kafka topic (stream_test in your case). Kafka does not store any information about columns, so how does the query find the distinct duid values?
Please suggest how to run this.
Thanks, Janesh Mishra