@WardLT Something to consider going forward. https://github.com/uncertainty-toolbox/uncertainty-toolbox
We could simply pass our predictions, standard deviations, and labels to their API and let the toolbox figure out which calibration method works best.
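A minimal sketch of what that wrapping could look like, using the toolbox's synthetic data generator as a stand-in for our ensemble outputs. The `get_std_recalibrator` call is my assumption about which of their recalibration entry points we would want:

```python
import uncertainty_toolbox as uct

# Stand-in data from the toolbox's own generator; in practice we would substitute
# our ensemble mean, ensemble standard deviation, and the true property values
pred_mean, pred_std, y_true, _ = uct.data.synthetic_sine_heteroscedastic(100)

# Calibration, sharpness, accuracy, and scoring-rule metrics in one call
metrics = uct.metrics.get_all_metrics(pred_mean, pred_std, y_true)

# Their recalibration module can also fit a scaling factor for the predicted
# standard deviations on a held-out set (assumed entry point)
std_recalibrator = uct.recalibration.get_std_recalibrator(pred_mean, pred_std, y_true)
pred_std_recal = std_recalibrator(pred_std)
```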
Part of the puzzle: without bootstrap sampling when updating the models, and with only 4 replicas in the ensemble, our uncertainties are much worse after retraining.
The de-calibration is lessened if we use bootstrap sampling when creating the training set before updating the model.
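For reference, a sketch of the bootstrap resampling step I mean. The toy inputs, labels, and the 4-member ensemble size are placeholders, not our actual data:

```python
import numpy as np

def bootstrap_resample(inputs, labels, rng):
    """Sample the training set with replacement so each replica sees a different draw."""
    n = len(labels)
    idx = rng.integers(0, n, size=n)  # n indices drawn with replacement
    return [inputs[i] for i in idx], np.asarray(labels)[idx]

# Toy training set standing in for our molecules and property labels
train_smiles = ['C', 'CC', 'CCO', 'c1ccccc1']
train_labels = [0.1, 0.4, 0.3, 0.9]

rng = np.random.default_rng(1)

# One independent resample per ensemble member (4 replicas here), drawn from the
# full training set each time the models are updated
resamples = [bootstrap_resample(train_smiles, train_labels, rng) for _ in range(4)]
```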
We see a similar, slight degradation with the 16 bootstrapped models.
Training with more epochs (here, 512) can make the problem worse.
Resetting the optimizer state does seem to help. This is back to using 64 epochs to retrain the model.
Using random initial weights seems to work just as well in terms of the uncertainties.
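To be concrete about what "resetting the optimizer" and "random initial weights" mean here, a sketch assuming the MPNNs are Keras models; the function name, training arrays, and hyperparameters below are placeholders rather than our actual code:

```python
import tensorflow as tf

def retrain(model, x_train, y_train, reset_weights=False, epochs=64):
    """Retrain a model with a fresh optimizer, optionally from random initial weights."""
    if reset_weights:
        # clone_model copies the architecture but re-initializes the layer weights
        model = tf.keras.models.clone_model(model)

    # Re-compiling with a new Adam instance discards the old moment estimates,
    # so the optimizer state is reset even when the weights are kept
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss='mean_squared_error')
    model.fit(x_train, y_train, epochs=epochs, verbose=0)
    return model
```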
It was a bug 😆 See: 73f0579
We find the performance of our active learning agent gets worse as we retrain the models. The chart below shows how we find fewer high-performing molecules with strategies where we update the model (`update` and `train`) than we do with a strategy where we never update the MPNNs (`no-retrain`).

A list of hypotheses:
Potential solutions: