Open VasilisStavrianoudakis opened 3 years ago
I realized that I did not provide further info as to why this may be a bug.
Let's say that you have 4 training images: [img1, img2, img3, img4], a batch_size=2, and epochs=2. Your steps_per_epoch = len(training_data) / batch_size = 2.
Epoch 1:
Step 1/2:
batch = [img1, img2]. But the image_generator gets called one more time, so it also yields img3, which is dropped.
Step 2/2:
batch = [img4, img1]. Again the image_generator gets called one more time and yields img2, which is dropped.
Because the image_generator yielded img2 as the last image, Epoch 2 now starts with:
Epoch 2:
Step 1/2:
batch = [img3, img4]. One more image is pulled and dropped -> img1.
Step 2/2:
batch = [img2, img3]
The main problem is that during a single epoch the model may not see all of the available data. The other problem is that the batches do not contain the same images across epochs.
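The walkthrough above can be simulated directly. As a sketch (with a hypothetical cycling generator standing in for the real image_generator, and the generator placed first in the zip, as in recognition.py):

```python
import itertools

def image_generator():
    # Endlessly cycles over the four training images.
    yield from itertools.cycle(["img1", "img2", "img3", "img4"])

gen = image_generator()
batch_size, steps_per_epoch, epochs = 2, 2, 2

for epoch in range(1, epochs + 1):
    for step in range(1, steps_per_epoch + 1):
        # Generator first in the zip: zip advances it once more than
        # batch_size, and that extra image is silently dropped.
        batch = [img for img, _ in zip(gen, range(batch_size))]
        print(f"Epoch {epoch}, step {step}/{steps_per_epoch}: {batch}")
```

Running this reproduces the batches above: img3 is dropped in step 1/2, img2 in step 2/2, and epoch 2 never sees img1 at all.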
Did I miss something?
Hello,
Let me start by saying thank you for this great pipeline.
I have noticed something strange in the get_batch_generator function of the recognizer. If your batch size is, for example, 2, the image_generator gets called 3 times for each batch. I believe that this line causes the problem:
https://github.com/faustomorales/keras-ocr/blob/71fbec8c163ae035dfb89a8b936ac48385bb7482/keras_ocr/recognition.py#L362
I have also created a toy example:
The output is:
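(The original snippet and its output were not captured here. A minimal reconstruction of what it likely showed, with a hypothetical counting generator standing in for image_generator, might be:)

```python
def image_generator():
    # Hypothetical stand-in that reports how often it is advanced.
    i = 0
    while True:
        i += 1
        print(f"image_generator call {i}")
        yield f"img{i}"

gen = image_generator()
# The pattern from recognition.py: the generator comes first in the zip,
# so zip advances it once more before noticing the range is exhausted.
batch = [img for img, _ in zip(gen, range(2))]
print(batch)  # ['img1', 'img2'] — but the generator was called 3 times
```

The third call's image (img3) is pulled and discarded, matching the behavior described above.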
I am not sure if this is a bug. In any case, I wanted to ask you if this is the expected behavior. Maybe the second approach (without the zip) is the correct one?
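For reference, a hedged sketch of how the over-pull disappears: either put the finite range first in the zip, or drop the zip in favor of itertools.islice. Both variants are assumptions about a possible fix, not the library's actual code.

```python
import itertools

def image_generator():
    # Hypothetical endless generator for illustration.
    i = 0
    while True:
        i += 1
        yield f"img{i}"

batch_size = 2

# Variant 1: range first — zip stops as soon as the range is exhausted,
# so the generator is advanced exactly batch_size times.
gen = image_generator()
batch = [img for _, img in zip(range(batch_size), gen)]
print(batch)      # ['img1', 'img2']
print(next(gen))  # 'img3' — nothing was skipped

# Variant 2: no zip at all, using itertools.islice.
gen = image_generator()
batch = list(itertools.islice(gen, batch_size))
print(batch)      # ['img1', 'img2']
print(next(gen))  # 'img3'
```

Either way, every image reaches exactly one batch, and the batches stay identical across epochs.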
Thank you again!