ag-csw / LDStreamHMMLearn


Display percentiles along trajectory #31

Closed greenTara closed 7 years ago

greenTara commented 7 years ago

It is important that we show the distribution of errors as well as the average error, which is what the heatmap currently shows. Since the distribution is represented by a set of numbers rather than a single number, it does not work well as a heatmap. So let's select a single set of parameters (the mid values) and create a plot that traces the deciles of the error (min = 0%, 10%, 20%, ..., 90%, max = 100%) on the vertical axis against the time value (index along the trajectory) on the horizontal axis. Title: Distribution of Transition Matrix Error Along Trajectory (Bayes)

and another plot for Naive. Labels: Time, Error. All the deciles can appear on the same plot, with different colors and point symbols. Connect the points with lines.

Use the same number of runs as for the average error plot (we are still investigating how many are necessary), and sample over many trajectories and many models from the family.

Let's do this first for the stationary Markov model, since it can be completed more quickly and we get the format worked out.

Afterwards, we will create the same plot for a non-stationary model, averaged over many runs, but just using a single model sampled from the family.

alexlafleur commented 7 years ago

I wrote a new function that uses the mid values for the parameters. Now I have num_estimations error and time values. I found the function np.percentile:

np.percentile(var, np.arange(0, 100, 10)) # computes deciles

Is that what you want? Do we need that for both the error and the performance values? What should the plot look like in the end?
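As a small illustration of the call above (a sketch only; the `errors` array is a made-up stand-in for the real error values): `np.arange(0, 100, 10)` stops at 90, so the maximum would be left out, while `np.arange(0, 101, 10)` also covers 100%.

```python
import numpy as np

# Hypothetical stand-in for the error values of one time point across 64 runs.
rng = np.random.default_rng(0)
errors = rng.random(64)

# np.arange(0, 100, 10) yields 0, 10, ..., 90: the maximum (100%) is missing.
q_without_max = np.arange(0, 100, 10)

# np.arange(0, 101, 10) yields 0, 10, ..., 100: minimum, nine deciles, maximum.
q_with_max = np.arange(0, 101, 10)

deciles = np.percentile(errors, q_with_max)
```

With the second range, the first and last entries of `deciles` are the minimum and maximum of the sample.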

greenTara commented 7 years ago

Let's ignore performance for now. Deciles is indeed what we want.

The plot should have points for each decile value, for each time value. The points for the same decile at different times should be connected by lines.
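A minimal sketch of such a plot, assuming the errors are stored as an array of shape (num_runs, num_estimations). That layout, the function names, and the output path are assumptions for illustration, not the project's actual code.

```python
import numpy as np


def deciles_over_time(errors):
    """Return the 0%, 10%, ..., 100% percentiles for each time index.

    errors: array of shape (num_runs, num_estimations) (assumed layout).
    Result: array of shape (11, num_estimations).
    """
    return np.percentile(errors, np.arange(0, 101, 10), axis=0)


def plot_deciles(errors, title, path):
    """Draw one line per decile, with its own marker, over the time index."""
    # Imported here so the numeric helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes to file only
    import matplotlib.pyplot as plt

    deciles = deciles_over_time(errors)
    k = np.arange(deciles.shape[1])  # index along the trajectory
    markers = "ovs^Dx+*<>p"          # one point symbol per decile line
    fig, ax = plt.subplots()
    for i, row in enumerate(deciles):
        ax.plot(k, row, marker=markers[i], label=f"{10 * i}%")
    ax.set_xlabel("Time")
    ax.set_ylabel("Error")
    ax.set_title(title)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

A call could look like `plot_deciles(errors, "Distribution of Transition Matrix Error Along Trajectory (Bayes)", "decile_bayes.pdf")`.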

greenTara commented 7 years ago

And yes - the number of different time values is num_estimations. Those indices go along the horizontal axis.

greenTara commented 7 years ago

Please add parameters to this function so that the parameters can be specified, but set the mid values as the defaults.

alexlafleur commented 7 years ago

[Plot: decile]

Is this what you expect to get as an output plot? The different colors show the decile values of the error at each time point ("k" in num_estimations).

I ran this evaluation with 64 runs.

alexlafleur commented 7 years ago

Naive: [Plot: decile_naive]

Bayes: [Plot: decile_bayes]

alexlafleur commented 7 years ago

I realized that the simulated data was created with a different taumeta (max_taumeta) than the one used inside the evaluation (mid_taumeta). After changing both to mid_taumeta, I got the following plots (numruns=8 for now): decile_bayes.pdf decile_naive.pdf

greenTara commented 7 years ago

The plots are good. The labels and scaling need some work.

Vertical: Transition Matrix Error
Horizontal: Time

To have the horizontal axis actually be "time", we need to do a transformation...

I believe we have a method somewhere to do this transformation, because we need it for calculating mu (but that is only in the nonstationary case, so we might need to duplicate that function into the stationary class.)

greenTara commented 7 years ago

Let's set the lower limit of the vertical axis to zero, since that is the theoretical minimum value of the error.

greenTara commented 7 years ago

current_time = window + k*shift - 1 # most recent time
estimation_time = (formula - see issue #36) # time at center of mass of weight distribution

So estimation_time is what should be plotted on the horizontal axis.
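For orientation, here is a sketch of the current_time part only. The estimation_time formula lives in issue #36 and is not reproduced here. The window, shift, and number of estimations are illustrative values, and the function name is made up.

```python
def current_time(k, window, shift):
    """Most recent time index covered by estimation k: window + k*shift - 1."""
    return window + k * shift - 1


# Illustrative values (assumptions for this example).
window_size, shift, num_estimations = 1024, 64, 18

times = [current_time(k, window_size, shift) for k in range(num_estimations)]
```

The first entry is the last index of the initial window, and each further estimation advances by one shift.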

greenTara commented 7 years ago

Let's also coordinate these plots with the (improved) prediction in issue #15

I have an improved formula for the expectation of the error in the MM case:

err_bayes[k]/err_naive = math.sqrt( (1+math.pow(r,2*k+1))/((1+r) ))

Also

w = self.window_size
err_naive = c/math.sqrt(w * num_trajs)

where c is a function of eta, scale_window, taumeta, etc.
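To make the scaling in the err_naive expression concrete, here is a small sketch. The constant c is not specified in this thread, so the value used below is purely hypothetical. Only the ratio between the two results is meaningful.

```python
import math


def err_naive(c, window_size, num_trajs):
    """Expected naive error: c / sqrt(window_size * num_trajs)."""
    return c / math.sqrt(window_size * num_trajs)


c = 1.0  # hypothetical constant; the real c depends on eta, scale_window, taumeta, etc.

one_traj = err_naive(c, window_size=1024, num_trajs=1)
four_trajs = err_naive(c, window_size=1024, num_trajs=4)
```

Because of the square root, using four times as many trajectories halves the expected error.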

alexlafleur commented 7 years ago

Deciles_Bayes_MM.pdf

alexlafleur commented 7 years ago

Deciles_Bayes_MM.pdf

I still have to change the legend, but is the overall look of the plot expected?

greenTara commented 7 years ago

Just a thought on these plots - we should check the shape of the error matrices - they may need to be flattened to do the statistics correctly.

alexlafleur commented 7 years ago

It's an ndarray of shape (53,)

alexlafleur commented 7 years ago

I realized that we used the complete trajectory (num_trajs=1) in the above plots (where the error is decreasing over time). I reran the script with this change and it looks like this: Deciles_Bayes_MM.pdf

On the other hand, when I run it with the reshaped trajectories (num_trajs=4), it looks like this: Deciles_Bayes_MM.pdf

Maybe that's the issue here? Another factor might be that we set y_min to 0, which compresses the data so that the differences are less obvious.

alexlafleur commented 7 years ago

Plot with shape AVG_ERRORS_ND (256, 53): Deciles_Bayes_MM.pdf
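On the question of flattening: a sketch of how the axis argument changes the statistics for an array of this shape. It assumes that axis 0 indexes runs and axis 1 indexes time points, which is not confirmed in this thread, and it uses random data in place of the real errors.

```python
import numpy as np

# Random stand-in with the reported shape (assumed: 256 runs x 53 time points).
rng = np.random.default_rng(0)
avg_errors = rng.random((256, 53))

q = np.arange(0, 101, 10)

# Deciles across runs, separately for each time point: shape (11, 53).
per_time = np.percentile(avg_errors, q, axis=0)

# Without an axis argument, np.percentile works on the flattened array,
# pooling all time points together: shape (11,).
pooled = np.percentile(avg_errors, q)
pooled_flat = np.percentile(avg_errors.ravel(), q)
```

For a decile-versus-time plot, the per-time variant is the one that keeps the time dimension.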

alexlafleur commented 7 years ago

Deciles_Bayes_MM.pdf

taumeta: 4
eta: 16
scale_window: 16
shift: 64
window_size: 1024
num_estimations: 18
len_trajectory: 2177
num_trajectories: 4
num_trajectories * len_trajectory: 8708
NAIVE window_size * num_estimations 19456
BAYES window_size + num_estimations*shift 2176

numruns: 8
num_trajectories = 128 (num_trajectories simulated)
numsims = 32
num_trajs = 4 (number of trajectories per error calculation)

greenTara commented 7 years ago

Let's also coordinate these plots with the (improved) prediction in issue #15

I have an improved formula for the expectation of the error in the MM case:

err_bayes[k]/err_naive = math.sqrt( (1+math.pow(r,2*k+1))/((1+r) ))

and recall that err_naive = err_bayes[0].

So let's plot:

pred[k] = err_bayes[0] * math.sqrt( (1+math.pow(r,2*k+1))/((1+r) ))
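A sketch of how the predicted curve could be computed from that formula. The values of r and err_bayes[0] below are placeholders chosen for illustration (r is not defined in this thread), and the function name is made up.

```python
import math


def predicted_error(err0, r, num_estimations):
    """pred[k] = err0 * sqrt((1 + r**(2k+1)) / (1 + r)) for k = 0..num_estimations-1."""
    return [
        err0 * math.sqrt((1 + math.pow(r, 2 * k + 1)) / (1 + r))
        for k in range(num_estimations)
    ]


# Placeholder inputs (assumptions): err_bayes[0] = 0.1 and r = 0.5.
pred = predicted_error(err0=0.1, r=0.5, num_estimations=18)
```

At k = 0 the square-root factor is 1, so the curve starts at err_bayes[0]. For 0 < r < 1 it decreases toward err_bayes[0] / sqrt(1 + r).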

alexlafleur commented 7 years ago

Deciles_Bayes_MM.pdf

black triangles: predicted error
green circles: mean error

I still have to add them to the legend.

greenTara commented 7 years ago

Plots archived in #43