GlooperLabs / GraphTime

A Package for Dynamic Graphical Model Estimation. Future versions in R coming soon
BSD 3-Clause "New" or "Revised" License

Create set of evaluation methods for estimate performance #2

Open AlexGibberd opened 7 years ago

AlexGibberd commented 7 years ago

In addition to performing the grid-search, an end user will need tools to actually assess the performance at each point in the grid.

For this there are several options available.

Bayesian Information Criterion (BIC, or other in-sample complexity measure)

Such measures count the number of "active" parameters in a model in order to estimate its degrees of freedom (dof). The idea is that models with more degrees of freedom have more flexibility and can potentially overfit the data, so following Occam's razor we prefer a model that is simple yet performs reasonably well. For standard models the estimation of dof is well understood; for GFGL, however, it is not clear what estimator to use.
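As a starting point, here is a minimal sketch of a BIC computation for a single Gaussian graphical model estimate, using the naive dof approximation of counting non-zero entries in the precision matrix. The function name `gaussian_bic` is hypothetical (not part of GraphTime), and for GFGL this edge-count is only a rough proxy for the true degrees of freedom:

```python
import numpy as np

def gaussian_bic(theta, S, n):
    """Approximate BIC for a Gaussian graphical model estimate.

    theta : estimated precision matrix (p x p)
    S     : empirical covariance of the n samples
    n     : number of samples

    Degrees of freedom are approximated (naively) by counting
    the non-zero entries in the upper triangle of theta (edges)
    plus the p diagonal terms.
    """
    p = theta.shape[0]
    # Gaussian log-likelihood up to a constant:
    # (n/2) * (log det(theta) - tr(S @ theta))
    sign, logdet = np.linalg.slogdet(theta)
    log_lik = 0.5 * n * (logdet - np.trace(S @ theta))
    dof = p + np.count_nonzero(np.triu(theta, k=1))
    return -2.0 * log_lik + dof * np.log(n)
```

In a grid-search one would evaluate this at every (lambda1, lambda2) point and prefer the minimiser, but as noted above it is an open question whether this dof count is appropriate for GFGL.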

I will experiment with the MATLAB code to find a reasonable measure to use; if I can get a prototype working, I will list it here.

General solution statistics

In the absence of a reliable BIC or any ground-truth, perhaps the best we can do is simply let users see what different solutions look like. This is the approach I took in the original paper (GFGL paper in the papers folder under GraphTime), where I plot properties such as the sparsity of the model for a variety of lambda parameters; see Figures 2 and 5 in that paper for examples. The idea is that by looking at how properties like the number of non-zero parameters change over time, one can get an idea of when the dependency structure might be significantly changing.
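A sparsity trace of this kind is easy to sketch. Assuming the dynamic estimate is stored as an array of per-time-point precision matrices (the helper name `sparsity_trace` is hypothetical, not GraphTime API):

```python
import numpy as np

def sparsity_trace(thetas):
    """Number of non-zero off-diagonal parameters at each time step.

    thetas : array of shape (T, p, p), one estimated precision
             matrix per time point.

    Returns a length-T integer array; jumps in this trace suggest
    time points where the dependency structure may be changing.
    """
    return np.array([np.count_nonzero(np.triu(theta, k=1))
                     for theta in thetas])
```

Plotting this trace for several lambda values side by side would reproduce the kind of summary shown in the paper's figures.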

F-Score/ROC/precision/recall

These options can only be used if the ground-truth structure is known and supplied, i.e. in simulated examples. In the dynamic setting, one may record these statistics at each estimated time-point in the graph, as well as summarising them into totals for the model. For an example of this kind of measure see (page 20) https://projecteuclid.org/download/pdfview_1/euclid.ejs/1351865118
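A per-time-point version of these metrics can be sketched as follows, comparing the recovered edge set against the ground-truth one (the function `edge_prf` and the tolerance threshold are illustrative assumptions, not the referenced paper's exact procedure):

```python
import numpy as np

def edge_prf(theta_est, theta_true, tol=1e-8):
    """Precision, recall and F1 for edge recovery at one time point.

    Edges are the non-zero off-diagonal entries (upper triangle)
    of the estimated vs. ground-truth precision matrices; entries
    below `tol` in absolute value are treated as zero.
    """
    est = np.abs(np.triu(theta_est, k=1)) > tol
    true = np.abs(np.triu(theta_true, k=1)) > tol
    tp = np.sum(est & true)    # edges correctly recovered
    fp = np.sum(est & ~true)   # spurious edges
    fn = np.sum(~est & true)   # missed edges
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Summing the tp/fp/fn counts over all time points before forming the ratios would give the model-level totals mentioned above.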

aleximmer commented 7 years ago

Since we use generated data with known changepoints right now, I will implement the classification measures used on page 20. Afterwards, we can use these metrics to visualize over a grid of values along the lines of #1. Warm Start is then even simpler to do.

aleximmer commented 7 years ago

Ticked classification metrics; implemented on a new branch.