ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

[Chapter 17] Question: Does an autoencoder based on an advanced CNN such as SE-ResNet make any sense? #387

Open lebaste77 opened 3 years ago

lebaste77 commented 3 years ago

Hi,

I was going to work on exercise 9 from Chapter 17 (denoising autoencoder), and I wanted to try using the best classifier I have trained so far on MNIST digits, an SE-ResNet, as the basis for the encoder.

Here are my questions:

Thanks for your answers/insights.

After posting this question, I found two partial answers, but without any explanation of why it is good or bad:

lebaste77 commented 3 years ago

Just so you know, with the Dense-only decoder below,

from tensorflow import keras

# Dense-only decoder: maps the 30-dimensional codings back to 28x28 images
denoising_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])

I get this not-so-bad(?) result: [image]

It seems that using Conv2DTranspose does not produce better results than the fully connected Dense decoder. How can this be explained?

# Convolutional decoder: Dense layer up to a 14x14 feature map, then one
# transposed convolution (strides=2) upsamples it to 28x28
denoising_decoder = keras.models.Sequential([
    keras.layers.Dense(14 * 14, activation="relu", input_shape=[30]),
    keras.layers.Reshape([14, 14, 1]),
    keras.layers.Conv2DTranspose(filters=1, kernel_size=3, strides=2,
                                 padding="same", activation="sigmoid")
])

[image]
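
For reference, here is a minimal sketch of how either decoder above can be wired into the full denoising autoencoder for this exercise. The encoder below is purely illustrative (not the SE-ResNet discussed above): it assumes GaussianNoise input corruption as in the chapter's denoising example, and X_train/X_valid are assumed to be the usual scaled MNIST splits.

from tensorflow import keras

# Illustrative denoising encoder: corrupt the inputs with Gaussian noise,
# then compress down to 30-dimensional codings
denoising_encoder = keras.models.Sequential([
    keras.layers.GaussianNoise(0.2, input_shape=[28, 28]),
    keras.layers.Reshape([28, 28, 1]),
    keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(30, activation="relu")
])

# Plug in either decoder from above; the targets are the *clean* images,
# so the model learns to remove the noise
denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])
denoising_ae.compile(loss="binary_crossentropy", optimizer="nadam")
history = denoising_ae.fit(X_train, X_train, epochs=10,
                           validation_data=(X_valid, X_valid))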

ageron commented 3 years ago

Thanks for your interesting question.

The optimal model always depends on the dataset (that's basically the conclusion of the "no free lunch" theorem). For example, there will be cases where a more powerful model will help (typically when the images are complex and you have a lot of training data), and others where a simpler model will be better. For instance, MNIST is so simple that a Dense network can work just as well as (or sometimes even better than) a more complex model.

So it's hard to make general statements, as the answer is usually empirical: "give it a try, and you will see".

That said, there are some general rules that tend to work quite often. For example, ConvNets generally work much better than Dense networks for images. Indeed, ConvNets make some implicit assumptions about images, such as the fact that neighboring pixels are more correlated than distant pixels, and these assumptions usually hold in real-life images.

So, to answer your question, I believe it should be quite possible to build a good autoencoder using an SE-ResNet as the encoder. Of course, be careful to avoid having skip connections bypass the bottleneck layer (if you are building an autoencoder with a bottleneck). But the final performance will really depend on the dataset.
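
To make that concrete, here is a rough sketch of what an SE-ResNet-style encoder ending in a bottleneck might look like. Since your actual classifier isn't shown in this thread, the block structure, filter counts and the 30-dimensional coding size below are just illustrative assumptions; the point is that every skip connection stays inside its residual unit, and none of them bypasses the bottleneck.

from tensorflow import keras

def se_block(inputs, filters, ratio=4):
    # Squeeze: global average pooling down to one value per channel
    s = keras.layers.GlobalAveragePooling2D()(inputs)
    # Excitation: a two-layer gate producing per-channel weights in (0, 1)
    s = keras.layers.Dense(filters // ratio, activation="relu")(s)
    s = keras.layers.Dense(filters, activation="sigmoid")(s)
    s = keras.layers.Reshape([1, 1, filters])(s)
    return keras.layers.Multiply()([inputs, s])

def se_residual_unit(inputs, filters):
    z = keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(inputs)
    z = keras.layers.Conv2D(filters, 3, padding="same")(z)
    z = se_block(z, filters)
    # The skip connection stays *inside* the unit, so the codings
    # below remain a true information bottleneck
    z = keras.layers.Add()([inputs, z])
    return keras.layers.Activation("relu")(z)

inputs = keras.layers.Input(shape=[28, 28, 1])
x = keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = se_residual_unit(x, 32)
x = keras.layers.MaxPool2D(2)(x)
x = se_residual_unit(x, 32)
x = keras.layers.Flatten()(x)
codings = keras.layers.Dense(30, activation="relu")(x)  # the bottleneck
se_resnet_encoder = keras.Model(inputs=[inputs], outputs=[codings])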

Hope this answers your question.

lebaste77 commented 3 years ago

Yes, that answers my question. Thank you.

I was mainly interested in the impact on the decoder part only (due to an SE-ResNet-like encoder), but your answer applies there as well. More generally, I was interested in the impact of having an asymmetric autoencoder (with a simpler decoder than encoder), and in whether there is any 'proven' work showing a general impact or not. If you know of any published work on that, I would be interested. I did not find many publications or factual results on this topic, only some experiments and opinions: https://www.reddit.com/r/MachineLearning/comments/ef1xe8/d_should_autoencoders_really_be_symmetric/ and, strangely, only 3 arXiv papers with these keywords in the title:

ageron commented 3 years ago

Thanks for your feedback and the interesting ideas and links. Sorry, I'm not aware of additional work on asymmetric autoencoders: it seems like an interesting thing to dig into!