Following your recs @mcloughlin2, I updated the code so that all pred_vs_actual plots call the _from_df() function to graph each axis. However, the stds are always None for live pipes, according to perf_data.get_pred_values(). I think it's due to how the data is passed into perf_data.accumulate_preds(), but I didn't trace this any further. So when pp.plot_pred_vs_actual(regr_pipe, error_bars=True) is called, it just plots with no error bars because stds is None.
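For reference, here's a rough sketch (not the actual AMPL code) of the guard the shared plotting helper would need; the (ids, pred_vals, stds) return shape and the variable names like actual_vals are assumptions based on the discussion above:

```python
import matplotlib.pyplot as plt

def _plot_with_optional_error_bars(perf_data, actual_vals, error_bars=False):
    """Illustrative sketch only: plot predicted vs. actual values, drawing
    error bars only when uncertainty estimates are actually available."""
    # Assumed return shape, based on the discussion above.
    ids, pred_vals, stds = perf_data.get_pred_values()
    fig, ax = plt.subplots()
    if error_bars and stds is not None:
        # Uncertainty estimates exist, so draw them as error bars.
        ax.errorbar(actual_vals, pred_vals, yerr=stds, fmt='o', alpha=0.6)
    else:
        # For live pipes stds comes back as None, so fall back to a plain scatter.
        ax.scatter(actual_vals, pred_vals, alpha=0.6)
    ax.set_xlabel('Actual')
    ax.set_ylabel('Predicted')
    return ax
```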
I also changed uncertainty to error_bars in _from_file() and from pipe; they are necessarily different in from_df(). threshold is implemented in all 3 versions of the function.

Oh right, I forgot that uncertainties aren't computed during training, so they don't get stored in the PerfData structures. So to get error bars in plot_pred_vs_actual, we'd have to run predictions from the live pipe. That's certainly doable, and would be a little faster than simply calling plot_pred_vs_actual_from_file on the just-saved model, since the model is already loaded.
It's weird, though...we've been thinking all along that 'uncertainty' is a parameter of the model training process, when really it only comes into play at prediction time. Does it change anything about the training process (other than forcing you to include dropouts in every layer)? I guess I'll have to look at the DeepChem code to find out...
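For what it's worth, the standard Monte Carlo dropout scheme (which I believe is what DeepChem's uncertainty support builds on) really is prediction-only: training just needs dropout layers present, and the uncertainty itself comes from repeating stochastic forward passes at predict time. A generic Keras-style sketch, not DeepChem's actual API:

```python
import numpy as np

def mc_dropout_predict(model, X, n_passes=30):
    """Generic Monte Carlo dropout sketch (not DeepChem's implementation):
    uncertainty is computed purely at prediction time."""
    # Keras models accept training=True at call time, which keeps dropout
    # active during inference; other frameworks have analogous switches.
    preds = np.stack([model(X, training=True).numpy() for _ in range(n_passes)])
    # Mean over passes is the prediction; std over passes is the uncertainty
    # (i.e., the error bar) for each sample.
    return preds.mean(axis=0), preds.std(axis=0)
```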
I updated plot_pred_vs_actual to call plot_pred_vs_actual_from_file to get the predictions. I first tried calling pipe.predict_full_dataset directly, but it modifies the pipeline object in place and things got pretty confusing pretty fast.
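Roughly what that delegation looks like (a sketch only; the attribute and parameter names here, such as pipe.params.output_dir, are assumptions rather than the exact AMPL signatures):

```python
def plot_pred_vs_actual(pipe, error_bars=False, threshold=None):
    """Sketch of the delegation described above, not the actual implementation."""
    # Calling pipe.predict_full_dataset() mutates the pipeline object in place,
    # so instead reuse the file-based plotting routine on the model that the
    # training run just saved.
    model_dir = pipe.params.output_dir  # assumed location of the saved model
    plot_pred_vs_actual_from_file(model_dir, error_bars=error_bars,
                                  threshold=threshold)
```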
I think of uncertainty as influencing model selection or HPO but not directly influencing the training process.
Will merge this into 1.6.2. If more changes are needed, please continue the work in the next release.