fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"

5.2 Using convnets with small datasets #153

Open DLumi opened 3 years ago

DLumi commented 3 years ago

I quite literally copied the code from notebooks 37-38, but training fails on the 1st epoch at step 63/100 with an error saying something like: your input ran out of data. So, judging by the changes we made to the generator, a batch size of 32 is too large, and I managed to get everything working with the smaller batch size of 20 used in the previous example. Did I do something wrong, or is that just an error in the code?

ghimireadarsh commented 3 years ago

steps_per_epoch should be sample_size // batch_size
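
For the augmented-data example that means something like the sketch below (my own sketch, assuming the notebook's train_generator / validation_generator with 2000 training and 1000 validation images, and the batch_size=32 from this thread):

batch_size = 32
train_steps = 2000 // batch_size   # 62 full batches; 63 (including the final partial batch) also works
val_steps = 1000 // batch_size     # 31

history = model.fit(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=val_steps)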

yeswzc commented 3 years ago

I have the same question. How can the book and this GitHub repo use steps_per_epoch = 100, which is greater than sample_size // batch_size, and still show a successful run? Is this because of a TF version difference? (Of course, the pipeline works if I change steps_per_epoch to 63.) When I was reading the book, it seemed that data augmentation could produce many more images each time, so the sample size could be expanded. But now I am confused. Does anyone know?

Thanks!

runzhi214 commented 3 years ago

I also have the same question. I tried to generate 10,000 pictures from the generator and it worked. But when the generator was passed to the model's fit and fit_generator methods, it failed at step 63. It seems that the generator does not generate data indefinitely there.
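
As a sanity check, pulling batches from the generator by hand does keep working well past one pass over the 2000 files; it is only inside fit / fit_generator that the run stops at step 63. A rough sketch of that experiment, assuming train_generator is the flow_from_directory iterator with batch_size=32:

# Draw more batches than one pass over the data contains (63 batches),
# to confirm the iterator wraps around rather than stopping.
n_images = 0
for step, (images, labels) in enumerate(train_generator):
    n_images += images.shape[0]
    if step >= 312:      # ~10,000 images at 32 per batch
        break
print(n_images)          # roughly 10,000, far more than the 2,000 files on disk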

pschdl1c commented 3 years ago

This change applies to newer versions. There are several ways to work around the problem: use .repeat() (sketched below), or create an augmented data set of 3200 images for train_gen and 1600 for val_gen (provided that batch_size=32 in train_datagen.flow_from_directory). It is a pity that the data augmentation now works differently.
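
One way to read the .repeat() suggestion (my own sketch, not code from the book): wrap the Keras generators in tf.data datasets and repeat them, so steps_per_epoch=100 can never exhaust the input. This assumes TF >= 2.4 (for output_signature), the notebook's 150x150 RGB inputs, and class_mode='binary':

import tensorflow as tf

# Wrap the flow_from_directory generators so they become endless tf.data pipelines.
train_ds = tf.data.Dataset.from_generator(
    lambda: train_generator,
    output_signature=(
        tf.TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32))
).repeat()

val_ds = tf.data.Dataset.from_generator(
    lambda: validation_generator,
    output_signature=(
        tf.TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32))
).repeat()

history = model.fit(
    train_ds,
    steps_per_epoch=100,
    epochs=100,
    validation_data=val_ds,
    validation_steps=50)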

Calling the generator configured for data augmentation gives non-repeating results. I generated 1000 images with no duplicates. In the code below, k is the number of identical image pairs; my naive algorithm compares each image against every image, so k should equal arr_pic.shape[0] (i.e. the number of images, since each image only matches itself).

import numpy as np
from tensorflow.keras.preprocessing import image

# fnames and datagen come from the notebook: fnames is the list of training
# image paths and datagen is the augmenting ImageDataGenerator.
img_path = fnames[1]

# Load one image and add a batch dimension so datagen.flow accepts it.
img = image.load_img(img_path, target_size=(150, 150))
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)

# Draw 1000 augmented variants of the same image.
arr_pic = np.array([]).reshape((0, 150, 150, 3))
i = 0
for batch in datagen.flow(x, batch_size=1):
    arr_pic = np.append(arr_pic, batch, axis=0)
    i += 1
    if i >= 1000:
        break
print(arr_pic.shape)

# Count identical pairs: each image matches itself, so with no duplicates
# k ends up equal to arr_pic.shape[0] (the number of images).
k = 0
new_arr = arr_pic.tolist()
for a in new_arr:
    for b in new_arr:
        if a == b:
            k += 1
print(k)

This may mean that at the beginning of each epoch, when the generator runs through the set of images again, completely new augmented images are generated (there are no duplicates in any epoch). In effect, the data set is expanded from [train_count_pic] to [train_count_pic] x [epochs]; in our case that is 2000 x 100 = 200,000 non-repeating training images and, for the validation data, 1000 x 100 = 100,000 non-repeating images. I apologize for my English.