ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

[QUESTION] Chapter 17 Question 9 - do my results make sense? #556

Closed KirkDCO closed 2 years ago

KirkDCO commented 2 years ago

I'm working through the exercises in Chapter 17, and I think I've done this correctly, but I'm not clear on what I "should" see. My code can be found on GitHub here (https://github.com/KirkDCO/HandsOnML_Exercises/blob/main/Ch17_Q09.ipynb) and runs well on Colaboratory with a GPU runtime.

For the exercise, I'm using the Fashion-MNIST dataset - it seemed simple enough, but not too simple. I've set up a denoising autoencoder, and I can see that the reconstruction of a noisy image is quite good, losing some detail as expected. (Relevant code and results below.)

I then use the encoder to make encoded versions of the first 500 training images, and use those encoded versions as inputs to a dense neural network. On the test set, I get 74.7% accuracy. Not bad for just 500 training images.

I then create a simple CNN using the same structure as the encoder, with a dense neural network on top. This is just a non-autoencoder version of what I've already done. On the test set here I get 79.6% accuracy - noticeably better.

This surprised me, as I expected the version using the denoising AE's pretrained embedding (or encoding layer) as input to the dense network on top to perform better. Thinking about it, perhaps the plain CNN does better because this is a relatively simple dataset - greyscale images, not a huge amount of complexity, etc. - so the simpler (non-AE) model does well, and the added complexity of pretraining just can't add much given the simplicity of the problem.

Am I interpreting this correctly or have I done something terribly wrong? Any input is greatly appreciated.
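
(The snippets below assume Fashion-MNIST loaded and scaled roughly like this - a simplified sketch, not the exact prep in my notebook:)

import numpy as np
import tensorflow as tf
from tensorflow import keras

# load Fashion-MNIST, scale pixel values to [0, 1], and add a channel axis
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full[..., np.newaxis] / 255.0
X_test = X_test[..., np.newaxis] / 255.0

# hold out the last 5,000 training images for validation
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

# one-hot encode the labels, to match the categorical_crossentropy loss used below
y_train = keras.utils.to_categorical(y_train)
y_valid = keras.utils.to_categorical(y_valid)
y_test = keras.utils.to_categorical(y_test)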

Code for my denoising autoencoder

encoder = keras.models.Sequential([
            # Gaussian noise (stddev 0.1) is only applied during training; targets stay clean
            keras.layers.GaussianNoise(0.1, input_shape = [28, 28, 1]),
            keras.layers.Conv2D(32, kernel_size = 3, padding = 'same', activation = 'relu'),
            keras.layers.MaxPool2D(),   # 28x28x32 -> 14x14x32
            keras.layers.Flatten(),
            keras.layers.Dense(128)     # 128-dimensional codings
])

encoder.summary()

decoder = keras.models.Sequential([
            keras.layers.Dense(14 * 14 * 32, activation = "relu", input_shape = [128]),
            keras.layers.Reshape([14, 14, 32]),
            # transposed convolution upsamples 14x14x32 back to 28x28x1, with outputs in [0, 1]
            keras.layers.Conv2DTranspose(filters = 1, kernel_size = 3, strides = 2,
                                         padding = "same", activation = "sigmoid")
])

decoder.summary()

dae = keras.models.Sequential([encoder, decoder])

# add learning rate scheduling 
def exponential_decay_fn(epoch):
  return 0.001 * 0.1 ** (epoch / 10)

lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)

# add early stopping
early_stopping_cb = keras.callbacks.EarlyStopping(patience = 5, restore_best_weights = True)

# clear the session and set the random seed for a repeatable run
# (note: the models above are already built, so their initial weights are not affected by this seed)
keras.backend.clear_session()
tf.random.set_seed(42)

# compile and run
dae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Nadam(),
            metrics=["mse"])
history = dae.fit(X_train, X_train, epochs = 10,
                  validation_data = (X_valid, X_valid),
                  callbacks = [lr_scheduler, early_stopping_cb])

Reconstruction of noisy images

[image: noisy Fashion-MNIST inputs and their reconstructions by the denoising autoencoder]
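
For reference, a minimal sketch of how a comparison like this can be plotted (the noise is added by hand here, since the GaussianNoise layer is only active during training):

import numpy as np
import matplotlib.pyplot as plt

# corrupt a few validation images explicitly to see what the autoencoder has to clean up
n_images = 5
noisy = np.clip(X_valid[:n_images] + np.random.normal(scale = 0.1, size = X_valid[:n_images].shape), 0., 1.)
reconstructions = dae.predict(noisy)

fig, axes = plt.subplots(2, n_images, figsize = (n_images * 2, 4))
for i in range(n_images):
    axes[0, i].imshow(noisy[i].squeeze(), cmap = "binary")           # top row: noisy inputs
    axes[1, i].imshow(reconstructions[i].squeeze(), cmap = "binary") # bottom row: reconstructions
    axes[0, i].axis("off")
    axes[1, i].axis("off")
plt.show()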

Code to create encodings for the first 500 images in the training set, plus the top model

X_train500_enc = encoder.predict(X_train[:500])
X_valid_enc = encoder.predict(X_valid)
X_test_enc = encoder.predict(X_test)

dae_model = keras.models.Sequential([
              keras.layers.Dense(64, activation = 'relu', input_shape = [128]),
              keras.layers.BatchNormalization(),
              keras.layers.Dropout(0.25),
              keras.layers.Dense(32, activation = 'relu'),
              keras.layers.BatchNormalization(),
              keras.layers.Dropout(0.25),
              keras.layers.Dense(10, activation = 'softmax')
])

Train the top model on the encodings and evaluate it on the test set

dae_model.compile(loss = "categorical_crossentropy", optimizer = keras.optimizers.Nadam(),
                  metrics = ["accuracy"])
history = dae_model.fit(X_train500_enc, y_train[:500], epochs = 100,
                        validation_data = (X_valid_enc, y_valid),
                        callbacks = [lr_scheduler, early_stopping_cb])
dae_model.evaluate(X_test_enc, y_test)
313/313 [==============================] - 1s 3ms/step - loss: 0.8085 - accuracy: 0.7470

[0.8085425496101379, 0.746999979019165]

Same training and test, but using a non-AE approach

basicCNN = keras.models.Sequential([
           keras.layers.Conv2D(32, kernel_size = 3, padding = 'same', activation = 'relu', input_shape = [28, 28, 1]),
           keras.layers.MaxPool2D(),
           keras.layers.Flatten(),
           keras.layers.Dense(128),
           keras.layers.Dense(64, activation = 'relu'),
           keras.layers.BatchNormalization(),
           keras.layers.Dropout(0.25),
           keras.layers.Dense(32, activation = 'relu'),
           keras.layers.BatchNormalization(),
           keras.layers.Dropout(0.25),
           keras.layers.Dense(10, activation = 'softmax')
])

basicCNN.compile(loss = "categorical_crossentropy", optimizer = keras.optimizers.Nadam(),
                 metrics = ["accuracy"])
history = basicCNN.fit(X_train[:500], y_train[:500], epochs = 100,
                       validation_data = (X_valid, y_valid),
                       callbacks = [lr_scheduler, early_stopping_cb])
basicCNN.evaluate(X_test, y_test)
313/313 [==============================] - 1s 4ms/step - loss: 0.6636 - accuracy: 0.7960

[0.6636202931404114, 0.7960000038146973]
ageron commented 2 years ago

Hi @KirkDCO ,

Thanks for your feedback, it's really interesting. And congrats on your code and your interpretation - you're spot on. Many of the techniques I presented in the book work great in some situations, but not at all in others. Some techniques work best with large models and datasets, while others shine in the opposite regime. Some work with noisy data, others don't. Etc.

I believe your interpretation is right: since Fashion MNIST is a relatively simple dataset, a fairly small amount of data is sufficient to reach reasonable performance, so the autoencoder doesn't help. Since it compresses the data, it loses a bit of information, and that actually hurts performance in this case. If you reduce the amount of data even more, you may find a point where it does help. Or if you consider a harder task.
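
For example, a rough sketch of that kind of experiment (reusing your pretrained encoder and data arrays, with simplified classifier heads) might look like this:

# compare a classifier on the frozen pretrained encodings with one
# trained from scratch on raw pixels, at several labeled-set sizes
for n_labeled in [50, 100, 200, 500, 1000]:
    X_small, y_small = X_train[:n_labeled], y_train[:n_labeled]

    # small dense classifier on top of the pretrained encodings
    clf_enc = keras.models.Sequential([
        keras.layers.Dense(64, activation = "relu", input_shape = [128]),
        keras.layers.Dense(10, activation = "softmax")
    ])
    clf_enc.compile(loss = "categorical_crossentropy", optimizer = "nadam", metrics = ["accuracy"])
    clf_enc.fit(encoder.predict(X_small), y_small, epochs = 30, verbose = 0)
    acc_enc = clf_enc.evaluate(encoder.predict(X_test), y_test, verbose = 0)[1]

    # same-capacity classifier trained from scratch on the raw images
    clf_raw = keras.models.Sequential([
        keras.layers.Conv2D(32, kernel_size = 3, padding = "same", activation = "relu",
                            input_shape = [28, 28, 1]),
        keras.layers.MaxPool2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(128),
        keras.layers.Dense(64, activation = "relu"),
        keras.layers.Dense(10, activation = "softmax")
    ])
    clf_raw.compile(loss = "categorical_crossentropy", optimizer = "nadam", metrics = ["accuracy"])
    clf_raw.fit(X_small, y_small, epochs = 30, verbose = 0)
    acc_raw = clf_raw.evaluate(X_test, y_test, verbose = 0)[1]

    print(f"{n_labeled} labeled images: pretrained encodings {acc_enc:.3f}, from scratch {acc_raw:.3f}")

Somewhere at the small end of that range, the pretrained encodings may start to win, since they benefit from everything the autoencoder learned on the full unlabeled training set.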

When writing the book I looked for examples that could highlight the benefits of each technique, and it wasn't always easy because I wanted to stick to small or medium-sized datasets and models, to keep things fast and manageable for everyone. Some techniques really shine only on very large datasets.

In short, Machine Learning is very empirical: sometimes you just have to try things out to see if they'll work or not, nothing is guaranteed.

I realize that's not a very satisfying answer, but that's just reality. 😅 Hope this helps

KirkDCO commented 2 years ago

Thank you for your thoughtful response. And, on the contrary, that is a very satisfying answer! 8^D The beauty of machine learning (at least to me) is that it is an exploratory journey. Your book has been an incredible guide on that journey.