SergioQuijanoRey / TFG

Bachelor's Thesis (Trabajo Fin de Grado)

Add data augmentation to LFW dataset #28

Closed SergioQuijanoRey closed 2 years ago

SergioQuijanoRey commented 2 years ago

As mentioned in #27, the problem is that many classes have only one image. This doesn't work with a contrastive loss function, which needs more than one image per class to build positive pairs.

As only 15% of the classes have at least 3 images, data augmentation seems to be the best solution to the problem.
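A quick way to check a figure like this is to count images per class. This is only a sketch: the function name and the toy labels below are made up for illustration, and it assumes the labels are available as a flat list with one entry per image.

```python
from collections import Counter


def fraction_of_classes_with_at_least(labels, k):
    """Return the fraction of classes that have at least `k` images.

    `labels` is assumed to be an iterable with one class label per image.
    """
    counts = Counter(labels)
    if not counts:
        return 0.0
    enough = sum(1 for n in counts.values() if n >= k)
    return enough / len(counts)


# Toy labels, not LFW data: class "a" has 3 images, "b" has 1, "c" has 2
toy_labels = ["a", "a", "a", "b", "c", "c"]
fraction = fraction_of_classes_with_at_least(toy_labels, k=3)
```

Here only one of the three toy classes reaches 3 images, so the fraction is one third.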

SergioQuijanoRey commented 2 years ago

With the AugmentedDataset class, the original problem can be considered solved. However, new problems arise:

  1. Generating the augmented dataset is very slow. However, we have a cache mechanism that makes this less of a problem.
  2. Holding this dataset object in memory takes too much RAM because of the way we implemented it.
    • This amount of RAM is a problem both on my local computer and on Google Colab
    • This is the main problem, because it causes crashes

Two possible solutions:

  1. Save the images to files and load them the way PyTorch usually does
  2. Write another class that does the same thing but, instead of generating the whole dataset upfront, generates new images lazily, on demand

Option 1 is hard to do the right way, so we're going to try the second option.
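A minimal sketch of what the second option could look like. This is not the class that ended up in the repository: LazyAugmentedDataset, base, augment and min_per_class are all hypothetical names. It assumes base is any sequence of (image, label) pairs and augment is any callable that turns an image into a new image. The class is plain Python and only implements __len__ and __getitem__; in the real project it would probably subclass torch.utils.data.Dataset.

```python
import random
from collections import defaultdict


class LazyAugmentedDataset:
    """Dataset wrapper that creates augmented samples only when indexed.

    Hypothetical sketch: `base` is a sequence of (image, label) pairs and
    `augment` is a callable image -> image.
    """

    def __init__(self, base, augment, min_per_class, seed=0):
        self.base = base
        self.augment = augment

        # Group base indices by label. Labels are read once here, but no
        # augmented image is created or stored.
        by_label = defaultdict(list)
        for idx in range(len(base)):
            _, label = base[idx]
            by_label[label].append(idx)

        # For each class that is short of `min_per_class` images, reserve
        # "virtual" slots. A slot stores only the index of the base image
        # it will be derived from, so memory use stays small.
        rng = random.Random(seed)
        self.virtual_sources = []
        for indices in by_label.values():
            missing = max(0, min_per_class - len(indices))
            for _ in range(missing):
                self.virtual_sources.append(rng.choice(indices))

    def __len__(self):
        return len(self.base) + len(self.virtual_sources)

    def __getitem__(self, index):
        if index < 0 or index >= len(self):
            raise IndexError(index)

        # Original samples are returned untouched.
        if index < len(self.base):
            return self.base[index]

        # Virtual samples are generated on demand from their source image.
        source = self.virtual_sources[index - len(self.base)]
        image, label = self.base[source]
        return self.augment(image), label


# Toy usage: strings stand in for images, str.upper stands in for a
# real augmentation.
toy_base = [("a", 0), ("b", 0), ("c", 1)]
toy_dataset = LazyAugmentedDataset(toy_base, str.upper, min_per_class=3)
```

The trade-off is that augmented images are recomputed on every access instead of being held in RAM, so each epoch does more work. If augment is random, the same index can also return a different image on each access.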