Closed: nastra closed this pull request 8 years ago.
@EnigmaCurry / @aboudreault can you guys review, please? Also, is benchmark.py
the right place for the Spark download / build / execute code to live?
1) It would be helpful to add a hook for selecting which branch of spark-cassandra-stress to build/run. This would make it easier to update and add patches to the tool during testing. Default could be master.
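A minimal sketch of what such a hook could look like in benchmark.py. The option name `--stress-branch`, the helper `build_clone_command`, and the repository URL are illustrative assumptions, not part of this PR:

```python
# Hypothetical sketch: a --stress-branch option defaulting to master, used
# to pick which branch of spark-cassandra-stress gets cloned and built.
import argparse

def build_clone_command(branch="master",
                        repo="https://github.com/datastax/spark-cassandra-stress.git"):
    """Return the git command that checks out only the requested branch."""
    return ["git", "clone", "--branch", branch, "--single-branch", repo]

parser = argparse.ArgumentParser()
parser.add_argument("--stress-branch", default="master",
                    help="branch of spark-cassandra-stress to build/run")
# Simulated CLI invocation for illustration:
args = parser.parse_args(["--stress-branch", "my-patched-branch"])
print(build_clone_command(args.stress_branch))
```

Defaulting to master keeps current behavior, while a test run can point at a patched branch without code changes.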
2) I would also turn on Spark-specific metric collection, which will grab all the Codahale metrics exposed by Spark. To set this up, copy dse/resources/spark/conf/metrics.properties.template to dse/resources/spark/conf/metrics.properties and enable the appropriate sink. In the past I've used the CsvSink, but we may be able to leverage the GraphiteSink; some exploration may be needed to get that working. In the case of Spark Streaming, it would be nice to be able to grab snapshots of the metrics reported in the Spark Streaming UI tab, but we may be able to rebuild those views from the raw data collected through the sinks.
Link to spark monitoring: http://spark.apache.org/docs/latest/monitoring.html
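For reference, enabling the CsvSink in metrics.properties looks roughly like the following. The property names come from Spark's metrics.properties.template; the period and output directory are just example values:

```
# Enable the CSV sink for all metric instances (master, worker, driver, executor)
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=10
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics
```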
3) I would also recommend enabling event logging so that, in case an error occurs, we can access the Spark UI info of previously failed jobs. To do this we add the following to spark-defaults.conf:
spark.eventLog.enabled true
spark.eventLog.dir /path/to/existingEventLogDir
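As a rough sketch, the test harness could apply these two settings itself. The config path and log directory below are placeholders, not values from this PR:

```python
# Hypothetical helper: append the two event-log settings above to
# spark-defaults.conf and make sure the log directory exists, since Spark
# expects the event-log directory to be present before jobs start.
import os

def enable_event_logging(conf_path, log_dir):
    os.makedirs(log_dir, exist_ok=True)
    with open(conf_path, "a") as conf:
        conf.write("spark.eventLog.enabled true\n")
        conf.write("spark.eventLog.dir {}\n".format(log_dir))

enable_event_logging("/tmp/spark-defaults.conf", "/tmp/spark-events")
```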
@rocco408 thanks for the feedback. It's probably worth implementing all those suggestions in separate PRs.
LGTM, only a minor comment, and it needs a rebase.
@aboudreault rebased and resolved the merge conflict.
@rocco408 I will keep your suggestions on my plate and will come back to you once I find some time to implement them
This PR will add support for running `spark-cassandra-stress` if `dse` is selected in the product dropdown. For `spark-cassandra-stress`, the user needs to specify one particular node. On this node we will then download / build / execute `spark-cassandra-stress`.
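The download / build / execute flow on the selected node could be sketched as the command sequence below. The repository URL, the build tool, and the run invocation are all assumptions for illustration; the actual steps live in benchmark.py:

```python
# Illustrative sketch only: the three commands the harness might run on the
# chosen node. Build tool (sbt) and spark-submit arguments are assumptions.
def stress_commands(branch="master", workdir="/tmp/spark-cassandra-stress"):
    repo = "https://github.com/datastax/spark-cassandra-stress.git"
    return [
        ["git", "clone", "--branch", branch, repo, workdir],  # download
        ["sbt", "assembly"],                                  # build (assumed)
        ["dse", "spark-submit", "..."],                       # execute (elided args)
    ]

for cmd in stress_commands():
    print(" ".join(cmd))
```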