felipegutierrez / explore-flink

This project uses Apache Flink as a stream engine that consumes data from the File system or Kafka brokers and exposes metrics using Prometheus and Grafana, everything deployed on Kubernetes (minikube).

make the stream application last for a given time #13

Closed felipegutierrez closed 5 years ago

felipegutierrez commented 5 years ago

We need to set a parameter when launching the stream application so that it runs for a specific amount of time. Say we want it to run for 20 minutes: we would just pass that duration as an argument at launch.
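A minimal sketch of the time-limit idea, written in plain Python rather than against the Flink API. The names `run_for`, `max_seconds`, and `clock` are hypothetical and only serve the illustration; the project's actual launch parameter is not shown in this thread.

```python
import time
from typing import Callable, Iterable, Iterator


def run_for(source: Iterable[str], max_seconds: float,
            clock: Callable[[], float] = time.monotonic) -> Iterator[str]:
    """Yield records from `source` until `max_seconds` have elapsed.

    `clock` is injectable so the time limit can be tested without waiting.
    """
    deadline = clock() + max_seconds
    for record in source:
        if clock() >= deadline:
            break  # time budget used up: stop consuming
        yield record


if __name__ == "__main__":
    # Hypothetical usage: process a small in-memory source for at most 2 seconds.
    for rec in run_for(["a", "b", "c"], max_seconds=2.0):
        print(rec)
```

Note that this sketch only checks the deadline when a record arrives, so a source that blocks while waiting for data would need a timeout of its own.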

abelpc commented 5 years ago

Not exactly. We need a workload that makes the stream application complete after 20 minutes, without any parameter. This can be done by creating an input dataset that takes 20 minutes to process when streamed to the application. Note that if you optimize the application code, this time may stay the same, decrease, or increase.
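To make the sizing concrete, here is a rough sketch of the arithmetic, assuming the application processes records at a roughly constant rate. The function names and the 500 records/second figure are hypothetical; the real throughput would have to be measured.

```python
import math


def records_needed(duration_minutes: float, records_per_second: float) -> int:
    """Number of records required to keep the application busy for the duration."""
    return math.ceil(duration_minutes * 60 * records_per_second)


def estimated_minutes(num_records: int, records_per_second: float) -> float:
    """Expected processing time for a fixed dataset at a given throughput."""
    return num_records / records_per_second / 60


if __name__ == "__main__":
    # Assumed throughput of 500 records/s for a 20-minute run.
    n = records_needed(20, 500)
    print(n)
    # If an optimization doubled the throughput, the same dataset would finish sooner.
    print(estimated_minutes(n, 1000))
```

This is why the run time of a fixed dataset can shift after code optimizations: the dataset size stays constant while the throughput changes.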

felipegutierrez commented 5 years ago

I am not sure about this. Stream applications are by nature meant to run indefinitely. The workload can be finite, but the stream application will keep listening for new data on the source. So I guess we need a finite workload, and we raise a flag when the workload finishes. Does that make sense?

abelpc commented 5 years ago

My understanding is that stream applications do not necessarily run "infinitely" (as in "forever"), but rather that they process data dynamically. Think of a sensor (the source) that is not generating any data, perhaps because it is off or because there is nothing to detect. In that case the stream application would be idling, i.e., not effectively running, because there is no data to process.

So yes, we need a finite dataset as the workload, because our experiments will be finite. The dataset should have an "END_OF_DATASET" tuple at the end, which signals the end of the streaming data to the application and makes it idle.
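The end-marker idea can be sketched as follows, in plain Python rather than Flink code. The function name `consume_until_end` is hypothetical, and records are assumed to be plain strings for simplicity.

```python
from typing import Iterable, Iterator

# Marker record appended as the last tuple of the finite dataset.
END_OF_DATASET = "END_OF_DATASET"


def consume_until_end(stream: Iterable[str]) -> Iterator[str]:
    """Yield records until the END_OF_DATASET marker is seen, then stop."""
    for record in stream:
        if record == END_OF_DATASET:
            return  # workload finished: nothing after the marker is processed
        yield record


if __name__ == "__main__":
    dataset = ["reading-1", "reading-2", END_OF_DATASET]
    for rec in consume_until_end(dataset):
        print(rec)
```

The point where the marker is detected is also where a "workload finished" flag could be raised for the experiment.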

felipegutierrez commented 5 years ago

I implemented what is described in this issue, taking the discussion above into account to better understand the requirement. Run the script with bash conf/launchApp.sh to see all the instructions for using the producers and consumers. It also shows examples of how to launch the applications.