A Spark application is required as the workload for the benchmark.
This is currently the blocker for the benchmark suite.
Since this Spark experiment application will be used for benchmarking, it needs at least the following features:
A source input generator that keeps generating random text lines until some constraint is reached, e.g. the experiment duration or a maximum number of text lines.
Configurable experiment parameters, e.g. text line size, experiment duration, whether to include malicious code, etc.
Must contain two kinds of workloads:
Normal Spark workloads (e.g. word count).
Malicious Spark workloads, which contain both the normal workloads and the security attack code.
Must output the start and end timestamps for each text line to a sink (either a file sink or a database sink), so that latency and throughput can be measured after the experiment completes.
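To make the requirements above concrete, here is a minimal sketch of the generator and the timestamped word-count workload in plain Python. All names and parameters (`generate_lines`, `word_count_with_timestamps`, `max_lines`, `line_size`, `max_seconds`) are hypothetical illustrations, not part of any existing application, and a real implementation would express the word count as a Spark job and write the timestamp records to an actual file or database sink.

```python
import random
import string
import time

def generate_lines(max_lines, line_size, max_seconds=None):
    """Yield random text lines until a constraint is reached
    (a maximum line count, or optionally an elapsed-time limit).
    Hypothetical sketch; parameter names are illustrative."""
    start = time.monotonic()
    for _ in range(max_lines):
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            break
        # Build a line of random five-letter "words" roughly line_size long.
        yield " ".join(
            "".join(random.choices(string.ascii_lowercase, k=5))
            for _ in range(line_size // 6 + 1)
        )

def word_count_with_timestamps(lines):
    """Run the 'normal' word-count workload over each line, recording
    a start and end timestamp per line so latency and throughput can
    be computed after the experiment."""
    counts = {}
    records = []  # stand-in for the file/database sink
    for i, line in enumerate(lines):
        t_start = time.time()
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
        records.append((i, t_start, time.time()))
    return counts, records
```

After a run, per-line latency is simply `end - start` for each sink record, and throughput is the number of records divided by the total elapsed time.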
Update: we don't think we need this type of experiment application anymore, since we are evaluating the end-to-end performance of the YARN cluster. We can just use a simple example Spark application instead.