benchflow / analysers

Spark scripts utilised to analyse data and compute performance metrics

Define tests for the analysers #56

Open Cerfoglg opened 8 years ago

Cerfoglg commented 8 years ago

Root Issue: https://github.com/benchflow/benchflow/issues/23

Note: This issue is subject to change after feedback, or addition of new functions to test

Overview

To make sure the metrics are computed correctly, we need to write tests for them. The complication is that much of the code uses Spark: the standard Python testing modules cannot be used for the Spark-based functions, because that code has to be submitted to a Spark master in order to run, and the testing modules do not work in that scenario.

The metrics that are computed with numpy and scipy do not rely on Spark, so they can be tested with the default Python testing modules (see http://docs.python-guide.org/en/latest/writing/tests/), writing Python scripts that can be launched like any regular Python application.

For the functions that use Spark, the tests have to be written as Spark scripts: they need to be launched by submitting them to a Spark master with spark-submit, and they use plain Python assert statements to check that the return values match what we expect.

In both cases, to run this on Travis CI we can start a container with Python/Spark installed, run the tests inside the container, and have the container exit with an error if a test fails, or exit normally if all tests pass.

To test the functionality that involves Cassandra, both pulling data from it and saving data to it, we will also need to run a container with Cassandra, load the database schema onto it (or use the benchflow Cassandra image that already loads the schema on startup), load some test data to analyse, run our analysers, and check that the resulting data is present and correct.

Python tests

All metrics computed with the numpy and scipy libraries go through the common function computeMetrics, which takes a list as input and returns a dictionary with the metric names as keys and their values as values. Several tests should be run with different inputs for which we know the expected metric results:

In general, the numbers used in the tests should be representative of those we expect to see when computing the actual metrics.
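As a rough illustration, a plain-Python test for computeMetrics could look like the sketch below. The import path (`commons`) and the metric key name (`"mean"`) are assumptions and need to be adapted to the actual module.

```python
# Minimal unittest sketch for computeMetrics, runnable like any regular
# Python script; no Spark master is needed for these metrics.
import unittest

from commons import computeMetrics  # hypothetical import path


class ComputeMetricsTest(unittest.TestCase):
    def test_known_input(self):
        data = [10, 20, 30, 40, 50]
        metrics = computeMetrics(data)
        # The result is expected to be a dictionary of metric name -> value.
        self.assertIsInstance(metrics, dict)
        # "mean" is an assumed key name; replace it with the real ones.
        self.assertAlmostEqual(metrics["mean"], 30.0)

    def test_single_element(self):
        metrics = computeMetrics([42])
        self.assertAlmostEqual(metrics["mean"], 42.0)


if __name__ == "__main__":
    unittest.main()
```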

Spark tests

Of the metrics we compute, the mode is computed via Spark, so the computeMode function should be tested with a Spark script, analogously to the Python tests for the other metrics. computeMode takes an RDD and returns a tuple containing the list of modes and the frequency of the mode. The tests to perform mirror those of the Python tests for the other metrics:
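A minimal sketch of such a Spark test script, to be launched with spark-submit, is shown below. The import path is an assumption, and the RDD is assumed to hold plain numeric values.

```python
# Spark test sketch for computeMode; submit with spark-submit and rely on
# plain assert statements, as described above.
from pyspark import SparkConf, SparkContext

from commons import computeMode  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("computeMode test"))

    # 3 appears most often, so the expected mode is [3] with frequency 3.
    rdd = sc.parallelize([1, 2, 3, 3, 3, 4, 4])
    modes, frequency = computeMode(rdd)
    assert modes == [3], "unexpected modes: %s" % modes
    assert frequency == 3, "unexpected frequency: %s" % frequency

    # Multimodal input: both 1 and 2 appear twice.
    rdd = sc.parallelize([1, 1, 2, 2, 3])
    modes, frequency = computeMode(rdd)
    assert sorted(modes) == [1, 2]
    assert frequency == 2

    sc.stop()
```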

The experiments, besides the metrics computed in the same way as for the trials (for example the number of process instances), also compute metrics using Spark, such as the min and max of the various trial metrics, the weighted mean across all trials, and the best/worst/average trials. These are computed with the computeExperimentMetrics function, which takes an RDD and returns a dictionary with the metric names as keys and their values as values. To test this function, we would need the following tests:
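A heavily hedged sketch of such a test follows. The import path, the shape of the input rows, and the aggregated key names are all assumptions that must be replaced with the real ones from the analysers.

```python
# Spark test sketch for computeExperimentMetrics, submitted via spark-submit.
from pyspark import SparkConf, SparkContext

from commons import computeExperimentMetrics  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("computeExperimentMetrics test"))

    # Two mock trials with a single assumed metric each.
    trials = sc.parallelize([
        {"trial_id": "t1", "duration_mean": 10.0},
        {"trial_id": "t2", "duration_mean": 20.0},
    ])
    metrics = computeExperimentMetrics(trials)

    assert isinstance(metrics, dict)
    # Assumed key names for the aggregated min/max; replace with the real ones.
    assert metrics["duration_mean_min"] == 10.0
    assert metrics["duration_mean_max"] == 20.0

    sc.stop()
```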

The simpler trial scripts that don't use the above functions also have their own functions to compute their metrics.

For the database size trial analyser we have databaseSize, which takes an RDD and returns the total size of all the databases. Similar tests should be run:
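For illustration, a test for databaseSize could look like the sketch below, assuming the RDD rows are dictionaries carrying a numeric size field that the function sums; the import path and field names are assumptions.

```python
# Spark test sketch for databaseSize, submitted via spark-submit.
from pyspark import SparkConf, SparkContext

from databaseSizeAnalyser import databaseSize  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("databaseSize test"))

    # Three mock databases whose sizes should add up to 600.
    rdd = sc.parallelize([
        {"database_name": "db1", "size": 100},
        {"database_name": "db2", "size": 200},
        {"database_name": "db3", "size": 300},
    ])
    assert databaseSize(rdd) == 600

    # A single database should be returned unchanged.
    rdd = sc.parallelize([{"database_name": "db1", "size": 42}])
    assert databaseSize(rdd) == 42

    sc.stop()
```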

The io script uses maxIOValues, which takes an RDD and returns the list of queries for Cassandra (containing the computed metrics). The tests:
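A hedged sketch of a test for maxIOValues is shown below. The import path, the field names of the input rows, and the exact shape of the returned queries are assumptions, so the content checks are kept deliberately loose.

```python
# Spark test sketch for maxIOValues, submitted via spark-submit.
from pyspark import SparkConf, SparkContext

from ioAnalyser import maxIOValues  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("maxIOValues test"))

    # Two samples from the same (assumed) device with known maxima:
    # reads peak at 30, writes peak at 5.
    rdd = sc.parallelize([
        {"device": "sda", "reads": 10, "writes": 5},
        {"device": "sda", "reads": 30, "writes": 2},
    ])
    queries = maxIOValues(rdd)

    # We expect a non-empty list of queries ready to be saved to Cassandra;
    # once the exact query shape is known, the test should also assert that
    # the computed maxima (30 reads, 5 writes) appear in the query values.
    assert isinstance(queries, list)
    assert len(queries) > 0

    sc.stop()
```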

The throughput analyser has a computeThroughput function, which takes an RDD and returns the throughput and the time delta over which it was measured. The tests:
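A possible test for computeThroughput is sketched below, under the assumptions that the rows carry start/end timestamps, that the delta is expressed in seconds, and that the throughput is the number of rows divided by that delta; the import path and field names are also assumptions.

```python
# Spark test sketch for computeThroughput, submitted via spark-submit.
from datetime import datetime

from pyspark import SparkConf, SparkContext

from throughputAnalyser import computeThroughput  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("computeThroughput test"))

    # Ten processes completed over a 5 second window -> throughput of 2/s.
    rdd = sc.parallelize([
        {"start_time": datetime(2016, 1, 1, 0, 0, 0),
         "end_time": datetime(2016, 1, 1, 0, 0, 5)}
        for _ in range(10)
    ])
    throughput, delta = computeThroughput(rdd)
    assert delta == 5
    assert abs(throughput - 2.0) < 1e-6

    sc.stop()
```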

The cpu analysers for both trials and experiments also have separate functions for per-core metrics, which work the same way as the overall ones. The tests should be the same, while also taking into account the possibilities of:

Besides computing metrics, one major function is cutNInitialProcesses, which takes an RDD and an integer representing the number of processes to ignore, and returns a list of dictionaries (the data rows) with that many initial processes removed. Testing this should cover the following cases:
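A sketch of such a test follows, assuming the rows are dictionaries with a start time field used to order the processes; the import path and field names are assumptions.

```python
# Spark test sketch for cutNInitialProcesses, submitted via spark-submit.
from datetime import datetime

from pyspark import SparkConf, SparkContext

from commons import cutNInitialProcesses  # hypothetical import path


if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("cutNInitialProcesses test"))

    # Five mock process rows starting one second apart.
    rows = [
        {"process_definition_id": "p%d" % i,
         "start_time": datetime(2016, 1, 1, 0, 0, i)}
        for i in range(5)
    ]
    rdd = sc.parallelize(rows)

    # Cutting the first 2 processes should leave the 3 that started last.
    remaining = cutNInitialProcesses(rdd, 2)
    assert len(remaining) == 3

    # Cutting 0 processes should keep every row.
    assert len(cutNInitialProcesses(rdd, 0)) == 5

    sc.stop()
```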

To test the functionality of the analysers, which includes reading data from Cassandra and storing the computed metrics back to it, an instance of Cassandra running our benchflow model is required. The database can be run inside a Docker container, initialised with the benchflow schema and with some mock data in the environment_data and process tables. The analyser scripts are then run, and once they finish we launch another script that reads from Cassandra and checks whether the expected data is indeed present. If so, that script exits without errors and the test has passed; if either the analysers or the verification script fail and return an error, the test has failed.
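A hedged sketch of the verification script, using the DataStax Python driver, is shown below. The keyspace, table, and column names are assumptions based on the description above and must be adapted to the actual benchflow schema.

```python
# Verification script run after the analysers: exits non-zero if the expected
# metrics are missing, so the CI container reports a failure.
import sys

from cassandra.cluster import Cluster


def main():
    cluster = Cluster(["cassandra"])        # hostname of the Cassandra container
    session = cluster.connect("benchflow")  # assumed keyspace name

    # Check that the analysers actually wrote metrics for the mock trial;
    # table and column names are assumptions.
    rows = list(session.execute(
        "SELECT * FROM trial_metrics WHERE trial_id = %s", ("mock_trial_1",)
    ))
    if not rows:
        print("No metrics found for the mock trial")
        sys.exit(1)

    cluster.shutdown()


if __name__ == "__main__":
    main()
```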

When testing against Cassandra we have to make sure to check, after running the analysers:

Note that testing for dropped queries or other Cassandra issues can be difficult, as they are hard to predict and do not occur consistently. Stress testing would be a way to spot them, but that is outside the scope of the CI testing discussed here.

VincenzoFerme commented 8 years ago

@Cerfoglg good proposal, go for it. Some notes:

Cerfoglg commented 8 years ago

@VincenzoFerme Updated the issue with more information. I've written several of these down already, but they are still not complete. I also still need to make them work in CI with Travis.

VincenzoFerme commented 8 years ago

We keep this issue open so that we can use it as a starting point when working on tests again.

VincenzoFerme commented 8 years ago

@Cerfoglg update the test with the latest changes we introduced.