benchflow / analysers

Spark scripts utilised to analyse data and compute performance metrics

Experiment Level Metrics and Statistics #22

Open VincenzoFerme opened 8 years ago

VincenzoFerme commented 8 years ago

In the following I describe the experiment level metrics and statistics we should implement as Spark scripts. They are open for discussion and extension in this thread.

Some background on the type of data we have

We perform multiple trials of the same experiment, making sure that the environment in which we execute the experiment is stable across the trials and that the initial conditions are always the same. This means the behaviour is fairly stable across the different runs, and hence the performance measures are quite similar.

Metrics and Statistics

ToDos

  1. Update the Cassandra schema to accommodate the metrics and statistics defined above
  2. Implement the metrics and statistics defined above. Use Spark wherever possible, or rely on a solid statistics library for the rest (e.g., http://pandas.pydata.org, http://www.scipy.org, http://www.numpy.org). It is important to refactor the current code before proceeding with the implementation.
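To illustrate the kind of per-trial and experiment-level aggregation discussed here, below is a minimal sketch using pandas (one of the fallback libraries named above). The column names, the sample data, and the choice of statistics are illustrative assumptions, not the actual Cassandra schema or the final metric set:

```python
import pandas as pd

# Hypothetical per-trial measurements: a trial id and a response-time sample.
# Column names are illustrative, not the real schema.
samples = pd.DataFrame({
    "trial": ["t1", "t1", "t2", "t2", "t3", "t3"],
    "response_time_ms": [100.0, 110.0, 105.0, 95.0, 102.0, 108.0],
})

# Per-trial statistics (mean, median, 95th percentile).
per_trial = samples.groupby("trial")["response_time_ms"].agg(
    mean="mean",
    median="median",
    p95=lambda s: s.quantile(0.95),
)

# Experiment-level statistics aggregate across the trials. Averaging the
# per-trial means is reasonable here because the trials run under the same
# initial conditions, so their distributions are comparable.
experiment = {
    "mean_of_means": per_trial["mean"].mean(),
    "std_of_means": per_trial["mean"].std(),
    "min": samples["response_time_ms"].min(),
    "max": samples["response_time_ms"].max(),
}
```

The same groupby-then-aggregate shape maps directly onto Spark (`groupBy(...).agg(...)` on a DataFrame) when the data volume requires it.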
Cerfoglg commented 8 years ago

@VincenzoFerme What's described here is all implemented, except for https://github.com/benchflow/analysers/issues/83

VincenzoFerme commented 8 years ago

@Cerfoglg please document here the final set of metrics. For example, the integral and the efficiency are missing, as well as the following ones:

Start from your thesis.

@ivanchikj how and why did we define the aggregate metrics at experiment level for the efficiency?

ivanchikj commented 8 years ago

For the CPU efficiency at experiment level we have defined the aggregate metrics as follows (T1, T2, and T3 are the trials):

We apply the weighted average for CPU and RAM.
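A weighted average across the trials can be sketched as follows. The thread does not show the actual weights, so weighting each trial's mean by its number of samples is an assumption here, as are the example values:

```python
import numpy as np

# Hypothetical per-trial aggregates for trials T1, T2, T3.
# Weighting by sample count is an assumption; the thread does not
# state which weights are actually used.
cpu_means = np.array([0.72, 0.75, 0.70])  # per-trial mean CPU efficiency
weights = np.array([120, 100, 130])       # samples collected per trial

# Experiment-level weighted average: sum(w_i * x_i) / sum(w_i)
weighted_cpu = np.average(cpu_means, weights=weights)
```

The same computation applies to RAM by swapping in the per-trial RAM means.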