ageron / handson-ml

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Dropout at test time #653

Closed pouryajafarzadeh closed 2 years ago

pouryajafarzadeh commented 2 years ago

As I remember from your book, Hands-On Machine Learning with Scikit-Learn and TensorFlow, we used the dropout technique at training time. Is it reasonable to use dropout at test time as well?

ageron commented 2 years ago

Yes, if you have the 2nd edition of the book, this is discussed in chapter 11 under the section "Monte Carlo (MC) Dropout". Basically the idea is to leave Dropout on at test time, and when you want to make a prediction, you run the model, say, 100 times instead of just once. This gives you a whole distribution of predictions. You can just use the mean if you want; it will often be better than using the model without dropout at test time, but of course it's 100x slower. You can also use the distribution to compute error bars for your predictions (see the sketch after the code below).

Check out notebook 11 from the GitHub repo for the 2nd edition of my book: https://github.com/ageron/handson-ml2. If you want, you can also check out the MC Dropout paper; it's interesting, but pretty advanced. They established a profound connection between dropout networks (i.e., neural networks containing Dropout layers) and approximate Bayesian inference. Specifically, they showed that training a dropout network is mathematically equivalent to approximate Bayesian inference in a specific type of probabilistic model called a Deep Gaussian Process. Dropout was already very popular, in an empirical way, but this MC Dropout paper gave it a solid mathematical justification.

ageron commented 2 years ago

Here's one way to implement MC Dropout:

import tensorflow as tf
import numpy as np

model = ... # build model using a regular Dropout layer
model.compile(...)
model.fit(...)

# make 100 stochastic forward passes; Dropout stays active thanks to `training=True`
y_probas = np.stack([model(X_test, training=True)
                     for _ in range(100)])
y_proba = y_probas.mean(axis=0)  # average the 100 predictions for each instance
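
To get the error bars mentioned earlier, you can look at the spread of those 100 predictions. Here is a quick sketch (using the standard deviation and empirical percentiles, which is just one convenient choice, not the only option):

# per-class uncertainty: spread of the 100 stochastic predictions
y_std = y_probas.std(axis=0)

# rough 95% interval from the empirical 2.5th and 97.5th percentiles
lower = np.percentile(y_probas, 2.5, axis=0)
upper = np.percentile(y_probas, 97.5, axis=0)

# e.g., report the top class of the first test instance with its error bar
top_class = y_proba[0].argmax()
print(f"class {top_class}: {y_proba[0, top_class]:.3f} ± {y_std[0, top_class]:.3f} "
      f"(95% interval: [{lower[0, top_class]:.3f}, {upper[0, top_class]:.3f}])")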

Alternatively, you can create a small MCDropout class, and use it in your models:

class MCDropout(tf.keras.layers.Dropout):
    def call(self, inputs, training=None):
        return super().call(inputs, training=True)

model = ... # build model using the MCDropout layer
model.compile(...)
model.fit(...)

# make 100 predictions; this time there is no need to set `training=True`
y_probas = np.stack([model(X_test) for _ in range(100)])
y_proba = y_probas.mean(axis=0)
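
In case the `model = ...` placeholder is unclear, here is a minimal sketch of a model built with the MCDropout layer; the layer sizes, dropout rate, and 28×28 input shape are just placeholders for illustration:

# hypothetical small classifier using the MCDropout layer defined above
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    MCDropout(rate=0.2),
    tf.keras.layers.Dense(128, activation="relu"),
    MCDropout(rate=0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="nadam",
              metrics=["accuracy"])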

Hope this helps!

pouryajafarzadeh commented 2 years ago

Thanks a lot