anderzzz / monkey_caput

Custom PyTorch model (VGG-16 Auto-Encoder) and custom criterion (Local Aggregation) for image clustering. The repo also contains an elaborate pipeline for creating fungi image data using the factory method pattern.

how to use autoencoder for unsupervised classification? #6

Open lkpopo opened 1 year ago

lkpopo commented 1 year ago

I have looked at the source code, and it seems that your image clustering is supervised. Where is the autoencoder used, and how can I use an autoencoder for unsupervised classification?

anderzzz commented 1 year ago

Autoencoders (AEs) have been used on their own for image clustering. The idea is:

  1. The AE learns an efficient way to encode each image in its training data as a compact vector.
  2. Two such vector representations that are close, according to some metric, should correspond to somewhat similar images.
  3. Hence, cluster the vector representations produced by the encoder, and the images are clustered.
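
The three steps above can be sketched in a few lines. This is a minimal illustration, not code from the repo: the `codes` list is a hypothetical stand-in for the vectors an already trained encoder would output, and plain k-means with Euclidean distance is only one possible choice of clustering method and metric.

```python
import math

def kmeans(codes, k, n_iter=20):
    """Plain Lloyd's k-means on a list of code vectors (lists of floats)."""
    # Farthest-point initialisation: deterministic, spreads the centroids out.
    centroids = [list(codes[0])]
    while len(centroids) < k:
        far = max(codes, key=lambda c: min(math.dist(c, m) for m in centroids))
        centroids.append(list(far))
    labels = [0] * len(codes)
    for _ in range(n_iter):
        # Assignment step: each code goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(c, centroids[j]))
                  for c in codes]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(codes, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical stand-in for encoder(images): two tight groups of 2-D codes.
codes = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
         [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
labels, centroids = kmeans(codes, k=2)
print(labels)  # images whose codes are close share a cluster label
```

Clustering the images is then just a matter of reading off `labels`, which is why the quality of the result depends entirely on how well distances between codes reflect similarity between images.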

However, it has been found that this leads to dubious clusterings. The reason given is that the mapping from image to compact vector is highly non-linear, so clustered vectors can correspond to rather different images. The Local Aggregation method I have implemented tries to solve this by imposing an inherent cluster quality objective on the compact vectors.
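
To make "cluster quality objective" concrete, here is a rough sketch of a loss of the kind described in the Local Aggregation paper (Zhuang et al.), written from that description and not taken from `la_learner.py`. For an anchor code it compares the similarity mass on "close" neighbours with the mass on all "background" neighbours. The function names, the toy vectors, the index sets and the temperature `tau=0.07` are all assumptions for illustration; in the paper the sets come from k-means clusters and nearest neighbours over a memory bank.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def la_style_loss(i, codes, close, background, tau=0.07):
    """Local-aggregation-style loss for the code at index i.

    close and background are sets of indices into codes (hypothetical inputs).
    Assumes their intersection is non-empty, otherwise the log is undefined.
    """
    v = codes[i]
    # Unnormalised similarity weight of every code to the anchor code.
    w = [math.exp(sum(a * b for a, b in zip(v, u)) / tau) for u in codes]
    p_background = sum(w[j] for j in background)
    p_close = sum(w[j] for j in close & background)
    # The softmax normaliser cancels in the ratio, so raw weights suffice.
    return -math.log(p_close / p_background)

# Toy codes: index 0 is the anchor, index 1 is near it, index 2 is far away.
codes = [normalize(v) for v in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])]
background = {1, 2}

loss_good = la_style_loss(0, codes, close={1}, background=background)
loss_bad = la_style_loss(0, codes, close={2}, background=background)
print(loss_good < loss_bad)
```

The loss is small when the anchor's close neighbours are also its most similar background neighbours, and large otherwise, so minimising it pushes the encoder toward codes that form tight local groups.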

The encoder part of the AE is retrieved in la_learner.py (https://github.com/anderzzz/monkey_caput/blob/master/la_learner.py#L74). The point is that the AE has already been optimized by then, so the LA learning is best understood as a form of fine-tuning. The steps of LA are:

  1. Train a standard AE, such as the VGG variety I use, though other models are possible. When done, the encoder should in theory be able to detect and compactly represent key image features in the image training data (in my case I use images of mushrooms).
  2. Discard the decoder part of the AE, and fine-tune the encoder, such that the compact vectors ("the code") it generates are inherently suitable for clustering.

Whether this actually works is what I explored. The results were OK, but certainly not perfect. The question is how much fine-tuning is appropriate. Maybe you have already looked at what I wrote about this?

In short, general image clustering is hard because there is often no objective criterion for what makes a clustering good. It is made harder still by the many ways in which a pair of images can be considered similar or dissimilar. A method based purely on an AE, however, is not considered adequate in most papers I have read to date.

lkpopo commented 1 year ago

Thank you for your patient and detailed answer, which helped me a lot.