benchflow / analysers

Spark scripts utilised to analyse data and compute performance metrics

Define tests for the analysers #56

Open Cerfoglg opened 8 years ago

Cerfoglg commented 8 years ago

Root Issue: https://github.com/benchflow/benchflow/issues/23

Note: This issue is subject to change after feedback, or addition of new functions to test

Overview

To make sure the metrics are computed correctly, we need to write tests for them. The complication is that much of the code uses Spark: the standard Python testing modules cannot be used for the Spark-based functions, because that code has to be submitted to a Spark master in order to run, and the testing modules do not work in that scenario.

The metrics that are computed with numpy and scipy do not rely on Spark, so they can be tested with the default Python testing modules (see http://docs.python-guide.org/en/latest/writing/tests/), writing Python scripts that can be launched like any regular Python application.

For the functions that use Spark, the tests have to be written as Spark scripts: they need to be launched by submitting them to a Spark master with spark-submit, and they use plain Python assert statements to check that the return values match what we expect.

In both cases, to run this on Travis CI we can start a container with Python/Spark installed, run the tests inside the container, and have the container exit with an error if a test fails, or exit normally if all tests pass.

To test the functionality that involves Cassandra, both pulling data from it and saving data to it, we will also need to run a container with Cassandra, load the database schema onto it (or use the benchflow Cassandra image that already loads the schema on startup), load some test data to analyse, run our analysers, and check that the resulting data is present and correct.

Python tests

All metrics computed with the numpy and scipy libraries go through the common function computeMetrics, which takes a list as input and returns a dictionary with the metric names as keys and their values as values. Several tests should be run with different inputs for which we know the expected metric results:

In general, the numbers used in the tests should be representative of those we expect to see when computing the actual metrics.
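As a rough illustration, a plain-Python test for computeMetrics could look like the sketch below. The import path (`commons`) and the metric key name (`"mean"`) are assumptions and need to be adapted to the actual module.

```python
# Minimal unittest sketch for computeMetrics, runnable like any regular
# Python script; no Spark master is needed for these metrics.
import unittest

from commons import computeMetrics  # hypothetical import path


class ComputeMetricsTest(unittest.TestCase):
    def test_known_input(self):
        data = [10, 20, 30, 40, 50]
        metrics = computeMetrics(data)
        # The result is expected to be a dictionary of metric name -> value.
        self.assertIsInstance(metrics, dict)
        # "mean" is an assumed key name; replace it with the real ones.
        self.assertAlmostEqual(metrics["mean"], 30.0)

    def test_single_element(self):
        metrics = computeMetrics([42])
        self.assertAlmostEqual(metrics["mean"], 42.0)


if __name__ == "__main__":
    unittest.main()
```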

Spark tests

Of the metrics we compute, the mode is computed via Spark, so the computeMode function should be tested with a Spark script, analogously to the Python tests for the other metrics. computeMode takes an RDD and returns a tuple containing the list of modes and the frequency of the mode. The tests to perform mirror those of the Python tests for the other metrics:
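A minimal sketch of such a Spark test script, to be launched with spark-submit, is shown below. The import path is an assumption, and the RDD is assumed to hold plain numeric values.

```python
# Spark test sketch for computeMode; submit with spark-submit and rely on
# plain assert statements, as described above.
from pyspark import SparkConf, SparkContext

from commons import computeMode  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("computeMode test"))

    # 3 appears most often, so the expected mode is [3] with frequency 3.
    rdd = sc.parallelize([1, 2, 3, 3, 3, 4, 4])
    modes, frequency = computeMode(rdd)
    assert modes == [3], "unexpected modes: %s" % modes
    assert frequency == 3, "unexpected frequency: %s" % frequency

    # Multimodal input: both 1 and 2 appear twice.
    rdd = sc.parallelize([1, 1, 2, 2, 3])
    modes, frequency = computeMode(rdd)
    assert sorted(modes) == [1, 2]
    assert frequency == 2

    sc.stop()
```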

The experiments, besides the metrics computed in the same way as for the trials (for example the number of process instances), also compute metrics using Spark, such as the min and max of the various trial metrics, the weighted mean across all trials, and the best/worst/average trials. These are computed with the computeExperimentMetrics function, which takes an RDD and returns a dictionary with the metric names as keys and their values as values. To test this function, we would need the following tests:
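A heavily hedged sketch of such a test follows. The import path, the shape of the input rows, and the aggregated key names are all assumptions that must be replaced with the real ones from the analysers.

```python
# Spark test sketch for computeExperimentMetrics, submitted via spark-submit.
from pyspark import SparkConf, SparkContext

from commons import computeExperimentMetrics  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("computeExperimentMetrics test"))

    # Two mock trials with a single assumed metric each.
    trials = sc.parallelize([
        {"trial_id": "t1", "duration_mean": 10.0},
        {"trial_id": "t2", "duration_mean": 20.0},
    ])
    metrics = computeExperimentMetrics(trials)

    assert isinstance(metrics, dict)
    # Assumed key names for the aggregated min/max; replace with the real ones.
    assert metrics["duration_mean_min"] == 10.0
    assert metrics["duration_mean_max"] == 20.0

    sc.stop()
```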

The simpler trial scripts that don't use the above functions also have their own functions to compute their metrics.

For the database size trial analyser we have databaseSize, which takes an RDD and returns the total size of all the databases. Similar tests should be run:
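For illustration, a test for databaseSize could look like the sketch below, assuming the RDD rows are dictionaries carrying a numeric size field that the function sums; the import path and field names are assumptions.

```python
# Spark test sketch for databaseSize, submitted via spark-submit.
from pyspark import SparkConf, SparkContext

from databaseSizeAnalyser import databaseSize  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("databaseSize test"))

    # Three mock databases whose sizes should add up to 600.
    rdd = sc.parallelize([
        {"database_name": "db1", "size": 100},
        {"database_name": "db2", "size": 200},
        {"database_name": "db3", "size": 300},
    ])
    assert databaseSize(rdd) == 600

    # A single database should be returned unchanged.
    rdd = sc.parallelize([{"database_name": "db1", "size": 42}])
    assert databaseSize(rdd) == 42

    sc.stop()
```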

The io script uses maxIOValues, which takes an RDD and returns the list of queries for Cassandra (containing the computed metrics). The tests:
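A hedged sketch of a test for maxIOValues is shown below. The import path, the field names of the input rows, and the exact shape of the returned queries are assumptions, so the content checks are kept deliberately loose.

```python
# Spark test sketch for maxIOValues, submitted via spark-submit.
from pyspark import SparkConf, SparkContext

from ioAnalyser import maxIOValues  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("maxIOValues test"))

    # Two samples from the same (assumed) device with known maxima:
    # reads peak at 30, writes peak at 5.
    rdd = sc.parallelize([
        {"device": "sda", "reads": 10, "writes": 5},
        {"device": "sda", "reads": 30, "writes": 2},
    ])
    queries = maxIOValues(rdd)

    # We expect a non-empty list of queries ready to be saved to Cassandra;
    # once the exact query shape is known, the test should also assert that
    # the computed maxima (30 reads, 5 writes) appear in the query values.
    assert isinstance(queries, list)
    assert len(queries) > 0

    sc.stop()
```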

The throughput analyser has a computeThroughput function, which takes an RDD and returns the throughput and the time delta over which it was measured. The tests:
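A possible test for computeThroughput is sketched below, under the assumptions that the rows carry start/end timestamps, that the delta is expressed in seconds, and that the throughput is the number of rows divided by that delta; the import path and field names are also assumptions.

```python
# Spark test sketch for computeThroughput, submitted via spark-submit.
from datetime import datetime

from pyspark import SparkConf, SparkContext

from throughputAnalyser import computeThroughput  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("computeThroughput test"))

    # Ten processes completed over a 5 second window -> throughput of 2/s.
    rdd = sc.parallelize([
        {"start_time": datetime(2016, 1, 1, 0, 0, 0),
         "end_time": datetime(2016, 1, 1, 0, 0, 5)}
        for _ in range(10)
    ])
    throughput, delta = computeThroughput(rdd)
    assert delta == 5
    assert abs(throughput - 2.0) < 1e-6

    sc.stop()
```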

The cpu analysers for both trials and experiments also have separate functions for per-core metrics, which work the same way as the overall ones. The tests should be the same, while also taking into account the possibilities of:

Besides computing metrics, one major function is cutNInitialProcesses, which takes an RDD and an integer representing the number of processes to ignore, and returns a list of dictionaries (the data rows) with that many initial processes removed. Testing this should cover the following cases:
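A sketch of such a test follows, assuming the rows are dictionaries with a start time field used to order the processes; the import path and field names are assumptions.

```python
# Spark test sketch for cutNInitialProcesses, submitted via spark-submit.
from datetime import datetime

from pyspark import SparkConf, SparkContext

from commons import cutNInitialProcesses  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("cutNInitialProcesses test"))

    # Five mock process rows starting one second apart.
    rows = [
        {"process_definition_id": "p%d" % i,
         "start_time": datetime(2016, 1, 1, 0, 0, i)}
        for i in range(5)
    ]
    rdd = sc.parallelize(rows)

    # Cutting the first 2 processes should leave the 3 that started last.
    remaining = cutNInitialProcesses(rdd, 2)
    assert len(remaining) == 3

    # Cutting 0 processes should keep every row.
    assert len(cutNInitialProcesses(rdd, 0)) == 5

    sc.stop()
```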

To test the functionality of the analysers, which includes reading data from Cassandra and storing the computed metrics back to it, an instance of Cassandra running our benchflow model is required. The database can be run inside a Docker container, initialised with the benchflow schema and with some mock data in the environment_data and process tables. The analyser scripts are then run, and once they finish we launch another script that reads from Cassandra and checks whether the expected data is indeed present. If so, that script exits without errors and the test has passed; if either the analysers or the verification script fail and return an error, the test has failed.
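A hedged sketch of the verification script, using the DataStax Python driver, is shown below. The keyspace, table, and column names are assumptions based on the description above and must be adapted to the actual benchflow schema.

```python
# Verification script run after the analysers: exits non-zero if the expected
# metrics are missing, so the CI container reports a failure.
import sys

from cassandra.cluster import Cluster


def main():
    cluster = Cluster(["cassandra"])        # hostname of the Cassandra container
    session = cluster.connect("benchflow")  # assumed keyspace name

    # Check that the analysers actually wrote metrics for the mock trial;
    # table and column names are assumptions.
    rows = list(session.execute(
        "SELECT * FROM trial_metrics WHERE trial_id = %s", ("mock_trial_1",)
    ))
    if not rows:
        print("No metrics found for the mock trial")
        sys.exit(1)

    cluster.shutdown()


if __name__ == "__main__":
    main()
```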

When testing against Cassandra we have to make sure to check, after running the analysers:

Note that testing for dropped queries or other Cassandra issues can be difficult, as they are hard to predict and do not occur consistently. Stress testing would be a way to spot them, but that is outside the scope of the CI testing discussed here.

VincenzoFerme commented 8 years ago

@Cerfoglg good proposal, go for it. Some notes:

Cerfoglg commented 8 years ago

@VincenzoFerme Updated the issue with more information. I've written several of these down already, but they are still not complete. I also still need to make them work in CI with Travis.

VincenzoFerme commented 8 years ago

We keep this issue open so that we can use it as a starting point when working on tests again.

VincenzoFerme commented 8 years ago

@Cerfoglg update the test with the latest changes we introduced.