Reorganize Evaluation Scripts

alexlafleur commented 7 years ago

To reuse evaluation scripts for various purposes, we need to do some changes:

- We want to use Variable_holder as an object (instead of a static class) on which we set all parameters used within the evaluation scripts. This object is passed into the script.
- We want to pass in the error function that should be used within the evaluation.
  - Signature: err: ndarray x SpectralMM --> number
  - Inputs are the transition matrix and the model; the output is the error value.
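A minimal sketch of what this could look like, not the actual refactoring. The field names on the holder, the `StubModel` class (a stand-in for SpectralMM that only exposes `trans`), and the `evaluate` helper are all assumptions made for illustration; nested lists replace ndarray so the sketch has no dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]  # stand-in for numpy.ndarray in this sketch


@dataclass
class StubModel:
    """Hypothetical stand-in for SpectralMM; only exposes a transition matrix."""
    trans: Matrix


@dataclass
class VariableHolder:
    """Hypothetical instance-based parameter holder (field names are illustrative)."""
    num_runs: int = 8
    num_trajectories: int = 1


# err: ndarray x SpectralMM --> number
ErrorFunction = Callable[[Matrix, StubModel], float]


def max_abs_error(estimate: Matrix, model: StubModel) -> float:
    """One possible error function: largest entrywise deviation from model.trans."""
    return max(
        abs(e - t)
        for est_row, true_row in zip(estimate, model.trans)
        for e, t in zip(est_row, true_row)
    )


def evaluate(params: VariableHolder, err: ErrorFunction,
             estimates: List[Matrix], model: StubModel) -> float:
    """Average the injected error function over the first num_runs estimates."""
    used = estimates[:params.num_runs]
    return sum(err(est, model) for est in used) / len(used)
```

Because both the parameters and the error function are passed in, the same `evaluate` routine can be reused with a different error function (for example one based on timescales) without touching the script itself.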

alexlafleur commented 7 years ago

Okay, I am mostly done with this except for the num_trajectories evaluation. There, I get an exception again because num_estimations becomes <= 0. Nevertheless, I am now adapting the evaluation scripts so that the new, generalized script is called. Then we can remove obsolete and duplicate scripts. We should take a look at the num_trajectories case later, when we need it.

greenTara commented 7 years ago

Yes let's not spend time on that problem now.

On Tue, Nov 29, 2016 at 8:36 AM, Alexandra La Fleur < notifications@github.com> wrote:

Okay, I am mostly done with this except for the num_trajectories evaluation. There, I get an exception again because num_estimations becomes <= 0. Nevertheless, I adapt the evaluation scripts now, so that the new, generalized script is called. Then, we can remove obsolete and duplicate code scripts. We should take a look at the num_trajectories case later when we need it.


alexlafleur commented 7 years ago

I adapted the script for the Bayes error calculation, which is based on eta and scalewindow. The first plot shows the old script's error results (8 runs): [plot: dependence_bayes_error_mm_delta 0]

The second shows the new, adapted script's error results (8 runs): [plot: dependence_bayes_error_mm_delta 0_new]

Do you think the results can be considered the same? Since we are simulating and sampling in each run, it is possible that we get very different results. Do you think the difference in colour is because the number of runs (=8) is too small, or do you think something is going wrong?

greenTara commented 7 years ago

We do want to have the possibility of setting the seed of the random number generator, and that would solve questions like this. But probably it would take another change of code to do that. How difficult do you think it would be to set that up?
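As a rough illustration of what seeding would buy us (a sketch only; `simulate_run` and `run_experiment` are hypothetical names, and the standard-library generator stands in for whatever generator the simulation actually uses):

```python
import random


def simulate_run(num_samples, rng):
    """Hypothetical stand-in for one simulation run: draws num_samples values."""
    return [rng.random() for _ in range(num_samples)]


def run_experiment(seed, num_runs=8, num_samples=100):
    """Return the mean of each run, using a generator seeded once up front."""
    rng = random.Random(seed)  # every random draw below flows from this seed
    results = []
    for _ in range(num_runs):
        samples = simulate_run(num_samples, rng)
        results.append(sum(samples) / len(samples))
    return results
```

With the same seed, the old and the new script would draw identical samples, so any remaining difference between their plots would point to a real change in behaviour rather than to sampling noise.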

greenTara commented 7 years ago

I do think it is most likely caused by the small number of runs. Another check would be to increase the number of runs and see if things get closer.
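A toy sketch of why more runs should bring the two plots closer, assuming each heatmap cell is an average over independent runs. Uniform random numbers stand in for the per-run error values, and the function names are hypothetical.

```python
import random
import statistics


def averaged_result(num_runs, rng):
    """Average of num_runs independent noisy values (stand-in for one heatmap cell)."""
    return statistics.mean([rng.random() for _ in range(num_runs)])


def spread_of_average(num_runs, repeats=2000, seed=0):
    """Standard deviation of the averaged result across many repeated experiments."""
    rng = random.Random(seed)
    averages = [averaged_result(num_runs, rng) for _ in range(repeats)]
    return statistics.pstdev(averages)
```

The spread of an average over n independent runs shrinks roughly like 1/sqrt(n), so going from 8 to 16 runs should reduce the cell-to-cell noise by a factor of about 1.4, not eliminate it.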

alexlafleur commented 7 years ago

I think it would not be a difficult change. However, I would prefer to complete this issue first before making another change, since many code fragments depend on it. Once we have cleaner code, it will also be easier to make that change. Maybe open a new issue for it, so we keep it in mind?

alexlafleur commented 7 years ago

For numruns = 16 it came a little closer, I think.

[plot: dependence_bayes_error_mm_delta 0_16]

[plot: dependence_bayes_error_mm_delta 0_16_new]

greenTara commented 7 years ago

yes, let's open a new issue for the random number seed setting

greenTara commented 7 years ago

Looks closer for 16 runs. That is sufficient confirmation for me.

greenTara commented 7 years ago

Are you also able to evaluate for the other error functions (e.g. in timescales)?

alexlafleur commented 7 years ago

I did not work on that yet, because I wanted to make sure that the existing evaluations we have are still working with the adapted function. But I will work on that afterwards!

alexlafleur commented 7 years ago

QMM delta=0.5 numruns=1

The first plot is from the (adapted) code we have had so far. [plot: dependence_bayes_error_qmm_delta 0 5]

The second is from the generalized evaluation script. [plot: dependence_bayes_error_qmm_delta_new 0 5]

Question: There is this strange behaviour in the upper right corner of both diagrams. Do you have an idea where this is coming from?

- I already checked that the parameters are all the same for each of the heatmap cells in both scripts.
- I realized that the model transition matrix was NOT the same for the same k in both scripts. Is that expected? (We are calling scaled_model.eval(k).trans)

For example:

Old script:

    [[ 0.90801428 0.01608841 0.03673236 0.03916495]
     [ 0.00679835 0.92116698 0.03284496 0.03918971]
     [ 0.00929744 0.01594065 0.93848448 0.03627744]
     [ 0.00470431 0.01707525 0.03502241 0.94319803]]

New script:

    [[ 0.9246952  0.01960245 0.02749806 0.02820429]
     [ 0.02489531 0.93263314 0.01485096 0.02762059]
     [ 0.01867239 0.02027082 0.93238009 0.0286767 ]
     [ 0.01459067 0.0190751  0.03075666 0.93557757]]
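A small sketch for quantifying how far apart the two printed matrices are. The numbers are copied from the output above; the helper names are hypothetical, and nested lists are used instead of ndarray to keep the sketch dependency-free.

```python
OLD_TRANS = [
    [0.90801428, 0.01608841, 0.03673236, 0.03916495],
    [0.00679835, 0.92116698, 0.03284496, 0.03918971],
    [0.00929744, 0.01594065, 0.93848448, 0.03627744],
    [0.00470431, 0.01707525, 0.03502241, 0.94319803],
]

NEW_TRANS = [
    [0.9246952, 0.01960245, 0.02749806, 0.02820429],
    [0.02489531, 0.93263314, 0.01485096, 0.02762059],
    [0.01867239, 0.02027082, 0.93238009, 0.0286767],
    [0.01459067, 0.0190751, 0.03075666, 0.93557757],
]


def is_row_stochastic(matrix, tol=1e-6):
    """True if all entries are non-negative and every row sums to 1 within tol."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def max_abs_difference(a, b):
    """Largest entrywise absolute difference between two equally shaped matrices."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


if __name__ == "__main__":
    print(is_row_stochastic(OLD_TRANS), is_row_stochastic(NEW_TRANS))
    print(max_abs_difference(OLD_TRANS, NEW_TRANS))
```

Both matrices are valid transition matrices with the same dominant-diagonal structure, and the entries differ on the order of 0.01 to 0.02, so the two scripts appear to be evaluating different (but similar) models for the same k.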

greenTara commented 7 years ago

There were some problems with evaluating the transition matrix at the right time value (#36). Has that been fixed? If not, we cannot expect these scripts to give reasonable output.

greenTara commented 7 years ago

Also, is the seed for the random number generator being set? Since there is only one run, we would expect different transition matrices in the two cases unless the seed is set.

greenTara commented 7 years ago

Another point to check - we should be using one simulation for the entire heatmap (when numruns = 1) - has that change been fully incorporated into the nonstationary case?

greenTara commented 7 years ago

The behavior in the last column is not necessarily wrong. With non-zero delta, we will start to see the effect of the non-stationary behavior, which in some cases is the opposite of the stationary behavior. For example, in the stationary case, making window_size and shift bigger will always reduce the error, because more samples are included in the calculation. But in the nonstationary case, these samples are taken from different models, so increasing window_size and shift can make the error worse when there is a larger degree of drift within the window.
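The trade-off can be illustrated with a deliberately simple toy model, which is an assumption for illustration and not the estimator used in the scripts: estimate the current success probability of a Bernoulli process from the last window_size samples, where the probability drifts linearly over time. The mean squared error then splits exactly into a variance term that shrinks with the window and a squared-bias term that grows with it.

```python
def window_mse(window_size, drift, p_now=0.5):
    """Exact mean squared error of a windowed-average estimate of p_now.

    Toy model: the sample taken i steps in the past is Bernoulli with
    success probability p_now - drift * i. drift = 0 is the stationary case.
    """
    ps = [p_now - drift * i for i in range(window_size)]
    # Bias: the window average targets the mean of the past probabilities.
    bias = sum(ps) / window_size - p_now
    # Variance of the average of independent Bernoulli samples.
    variance = sum(p * (1 - p) for p in ps) / window_size ** 2
    return bias ** 2 + variance


if __name__ == "__main__":
    for w in (8, 32, 128):
        print(w, window_mse(w, drift=0.0), window_mse(w, drift=0.002))
```

In the stationary case the error keeps falling as the window grows. With drift, it first falls and then rises again once the bias from outdated samples outweighs the gain from having more of them.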

greenTara commented 7 years ago

What surprises me is that the third column behaves differently from the other two columns. I had expected that the parameters had been coordinated in such a way that this would not happen. For this reason, I suspect that issue #36 is coming into play.

alexlafleur commented 7 years ago

Oh no, I have not taken a look at #36 yet, so I'll do that now.

alexlafleur commented 7 years ago

Completed and tested transformations are:

- stationary Bayes error calculation for taumeta vs (eta, scalewindow)
- non-stationary Bayes error calculation for taumeta vs (eta, scalewindow)
- comparison of naive and Bayes error calculation for taumeta vs (eta, scalewindow) for MM and QMM with delta=0

Question: Is there anything else that is urgent and needs to be transformed NOW?

Postponed transformations are:

- statconc evaluation
- timescaledisp evaluation

greenTara commented 7 years ago

As far as I can tell, this takes care of the heatmap requirements, other than #18, which has its own issue. There is a remaining issue about the conversion from arrays to lists, see #24. Once that is resolved, we should generate these heatmaps for large num_trajectories (say 256) and, in the MM case, large num_runs (also 256); for QMM, just num_runs = 1. Then we upload them to sharelatex and can close this issue.

The other kind of plot is the one where time is on the horizontal axis (e.g. the deciles, #31). We need to look at those scripts to see what needs to be done to finalize them, but this issue doesn't need to remain open for that.

greenTara commented 7 years ago

Here's a list of the taumeta heatmaps that we need:

MM only:

- Taumeta heatmaps: comparison of naïve and Bayes error (num_trajs = 1, num_runs = 64, numsims = 64).
- Taumeta heatmaps: comparison of naïve and Bayes performance (num_trajs = 256, num_runs = 1, numsims = 1).

MM and QMM (with setting of random seed; num_trajs = 1, num_runs = 1, numsims = 1):

- Taumeta heatmaps: comparison of error for MM and QMM with delta = 0. For testing only, not for the paper.

QMM only (with setting of random seed for reproducibility; num_trajs = 1, num_runs = 1, numsims = 64 or however large is reasonable):

- Taumeta heatmaps: comparison of error for naïve and Bayes, delta = 1.

greenTara commented 7 years ago

The generation of individual plots is being carried on in separate issues, one for each plot.

ag-csw / LDStreamHMMLearn

Reorganize Evaluation Scripts #35
