felipegutierrez / explore-flink

This project uses Apache Flink as a stream engine that consumes data from the File system or Kafka brokers and exposes metrics using Prometheus and Grafana, everything deployed on Kubernetes (minikube).
43 stars 21 forks source link

Create a bash script to deploy stream applications on the cluster #8

Closed felipegutierrez closed 5 years ago

felipegutierrez commented 5 years ago

Create a generic script to deploy stream applications and use different arguments as parameters. The stream application must receive the following parameters:

felipegutierrez commented 5 years ago

I have prepared a script which one can copy and paste the commands on the terminal. It shows a real command to use and the description of each parameter to use on the deployment of this application. One can execute the script bash explore-flink/conf/launchApp.sh to check any update version of it.

Launching the Flink Standalone cluster:
   /home/flink/flink-1.9.0/bin/start-cluster.sh
Launching the Flink + Mesos cluster:
   /home/flink/flink-1.9.0/bin/mesos-appmaster.sh

Launching a Flink Stream application >>

   /home/flink/flink-1.9.0/bin/flink run -c org.sense.flink.App /home/flink/explore-flink/target/explore-flink.jar -app 30 -source 130.239.48.136 -sink 130.239.48.136 -offlineData [true|false] -frequencyPull [seconds] -frequencyWindow [seconds] -parallelism [int] -disableOperatorChaining [true|false] -output [file|mqtt]
   /home/flink/flink-1.9.0/bin/flink run -c org.sense.flink.App /home/flink/explore-flink/target/explore-flink.jar -app 30 -source 130.239.48.136 -sink 130.239.48.136 -offlineData true -frequencyPull 1 -frequencyWindow 30 -parallelism 4 -disableOperatorChaining true -output file

description of each parameter:
   -app : which application to deploy. If you don't pass any parameter the jar file will output all applications available.
   -source,-sink: IP of the source and sink nodes. It means that you can see the output of the application on the machines that hold these IPs.
   -offlineData: TRUE means to use offline data. FALSE means to download data from Valencia open data portal (http://gobiernoabierto.valencia.es/en/data/).
   -frequencyPull: frequency of pooling data from the data source in seconds. Online data is downloaded from Valencia open data porta and stored on the local machine. So it is still possible to increate the polling frequency of this data. Usually if you want a very high frequency of polling you could set this parameter for 1 second and increase the polling frequency of synthetic data using the 'SideInput operator' which is listenning to a Mqtt broker:
      mosquitto_pub -h 127.0.0.1 -t topic-synthetic-frequency-pull -m "AIR_POLLUTION 500"
      mosquitto_pub -h 127.0.0.1 -t topic-synthetic-frequency-pull -m "TRAFFIC_JAM 1000"
      mosquitto_pub -h 127.0.0.1 -t topic-synthetic-frequency-pull -m "NOISE 600"
   -frequencyWindow: frequency to compute the window in seconds.
   -parallelism: degree of parallelism to deploy the application on the cluster. It means the redundante operators will be created in order to guarantee fault tolerance.
   -disableOperatorChaining: FALSE is the default. TRUE disables fusion optimization for all operators which means that operators will be allocated on different threads (https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#configuring-taskmanager-processing-slots).
   -output: 'file' means that the output will be generated in the Flink TaskManagers log files. 'mqtt' means that you have to subscribe to a mqtt channel according to the message showed when the application is deployed.

Consuming a Flink Stream application output >>
   mosquitto_sub -h 130.239.48.136 -t topic-valencia-data-cpu-intensive

Listing all Flink Stream applications
   /home/flink/flink-1.9.0/bin/flink list

Canceling a Flink Stream application
   /home/flink/flink-1.9.0/bin/flink cancel APP_ID

Sending parameters to change the frequency of synthetic item generators
   mosquitto_pub -h 127.0.0.1 -t topic-synthetic-frequency-pull -m "AIR_POLLUTION 500"
   mosquitto_pub -h 127.0.0.1 -t topic-synthetic-frequency-pull -m "TRAFFIC_JAM 1000"
   mosquitto_pub -h 127.0.0.1 -t topic-synthetic-frequency-pull -m "NOISE 600"