GoogleCloudPlatform / dataproc-pubsub-spark-streaming

Apache License 2.0

In this tutorial, you learn how to deploy an Apache Spark streaming application on Cloud Dataproc and process messages from Cloud Pub/Sub in near real time. The system you build generates thousands of random tweets, identifies trending hashtags over a sliding window, saves the results in Cloud Datastore, and displays them on a web page.
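
To make the "trending hashtags over a sliding window" idea concrete, here is a minimal plain-Python sketch. It is not the code in this repository, which uses Spark DStreams; it only illustrates the counting logic. The names `extract_hashtags` and `SlidingHashtagWindow` are hypothetical, and measuring the window in batches (rather than in seconds) is an assumption made to keep the example short.

```python
import re
from collections import Counter, deque

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in one tweet."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


class SlidingHashtagWindow:
    """Hypothetical helper: counts hashtags over the last N batches."""

    def __init__(self, window_batches):
        self.window_batches = window_batches
        self.batches = deque()   # per-batch counts, oldest first
        self.counts = Counter()  # running totals for the whole window

    def add_batch(self, tweets):
        """Add a new batch of tweets and expire the oldest batch if needed."""
        batch = Counter(tag for t in tweets for tag in extract_hashtags(t))
        self.batches.append(batch)
        self.counts.update(batch)
        if len(self.batches) > self.window_batches:
            expired = self.batches.popleft()
            self.counts.subtract(expired)
            # Unary plus drops entries whose count fell to zero.
            self.counts = +self.counts

    def top(self, n):
        """Return the n most frequent hashtags in the current window."""
        return self.counts.most_common(n)
```

Each call to `add_batch` slides the window forward by one batch, so `top(n)` always reflects only the most recent `window_batches` batches. Keeping per-batch counts lets the oldest batch be subtracted rather than recounting the whole window.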

Please refer to the related article for all the steps to follow in this tutorial:

https://cloud.google.com/solutions/using-apache-spark-dstreams-with-dataproc-and-pubsub

Contents of this repository:

Running the tests

To run the tests:

cd spark
mvn test