Closed drbenvincent closed 6 years ago
Updated the position of the red line in the lower-right plot to reflect the new control model in #185. Here the observed frequency of choosing the delayed option was ~40%. In terms of predicting responses, the control model would therefore predict all responses to be immediate, since this is below the 50% decision threshold. It can then account for 100% − 40% = 60% of all responses, and the red line now reflects this.
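To make the arithmetic above concrete, here is a minimal Python sketch (illustrative names, not the toolbox's actual code) of how the control model's percent-predicted baseline follows from the observed choice frequency:

```python
def control_percent_predicted(freq_delayed, threshold=0.5):
    """Proportion of responses a control model predicts correctly.

    The control model predicts every response to be the majority choice:
    'delayed' if freq_delayed >= threshold, otherwise 'immediate'.
    freq_delayed : observed frequency of choosing the delayed option (0-1).
    """
    if freq_delayed >= threshold:
        return freq_delayed
    return 1.0 - freq_delayed

# With ~40% delayed choices, the control model predicts all-immediate,
# accounting for 1 - 0.4 = 0.6 (i.e. 60%) of responses.
```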
We also plot the response data and the posterior predictive checks (discount functions), so we can do a good job of model checking just by looking at this figure alone.
So we now calculate the (distribution of the) log loss goodness-of-fit metric. The point estimates are also exported in the .csv file of parameter estimates.
There was some unfinished business from #185.
The "percent predicted" subplot:
- Fix posterior predictive check warnings
- Update the flagging of problematic experiments in `ResultsExporter.any_percent_predicted_warnings()` and `PosteriorPrediction`
- Add discount function subplot to the posterior prediction plots
Replace "goodness of fit" score with "log loss"
Rather than use my ad hoc goodness-of-fit score, calculate and report the log loss score. This is a cross-entropy measure appropriate for binary outcome variables. See more about log loss here. Note that this is just a goodness-of-fit metric and is not complexity-penalised, so it is not appropriate for model comparison between models with varying numbers of parameters.

Lower values of log loss correspond to better fits, i.e. more correct classifications of participant responses.
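As a sketch of the metric itself (Python, illustrative names, not the toolbox's implementation), log loss for binary choices is the mean negative log-likelihood of the observed responses under the model's predicted choice probabilities:

```python
import numpy as np

def log_loss(responses, p_delayed, eps=1e-15):
    """Mean cross-entropy for binary choice data.

    responses : array of 0/1, where 1 = chose the delayed option
    p_delayed : model-predicted probability of choosing the delayed option
    eps       : clipping bound to avoid log(0)
    """
    y = np.asarray(responses, dtype=float)
    p = np.clip(np.asarray(p_delayed, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

Chance-level predictions (p = 0.5 for every trial) give a log loss of ln(2) ≈ 0.693, while confident, correct predictions drive it toward 0, which is why lower values indicate better fits.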