recognizer possible bug

faustomorales / keras-ocr

A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.

MIT License


Hello,

Let me start by saying thank you for this great pipeline.

I have noticed something strange in the get_batch_generator function of the recognizer. With a batch size of 2, for example, the image_generator gets called 3 times per batch, so one sample is consumed but never ends up in the batch. I believe that this line causes the problem:

https://github.com/faustomorales/keras-ocr/blob/71fbec8c163ae035dfb89a8b936ac48385bb7482/keras_ocr/recognition.py#L362

I have also created a toy example:

import random

def gen():
    while True:
        print("Generator got called")
        yield random.random()

r_gen = gen()
batch_size = 2

b = [sample for sample, _ in zip(r_gen, range(batch_size))]
print(b)

print("=" * 100)

b = [next(r_gen) for n in range(batch_size)]
print(b)

The output is:

Generator got called
Generator got called
Generator got called
[0.4160141123512153, 0.8948171240884449]
===================================
Generator got called
Generator got called
[0.8689812892217589, 0.13292716281754136]

I am not sure if this is a bug. In any case, I wanted to ask you if this is the expected behavior. Maybe the second approach (without the zip) is the correct one?
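To make the comparison concrete, here is a minimal sketch that counts how often the generator is advanced for three ways of taking a batch. The names counting_gen, take_zip_gen_first, take_zip_range_first, take_islice, and calls_for are hypothetical helpers written for this illustration; they are not part of keras-ocr. As far as I can tell, zip advances its arguments from left to right, so with the generator first it pulls a sample before it discovers that range is exhausted.

```python
import itertools


def counting_gen(counter):
    """Hypothetical stand-in for image_generator that counts how often it is advanced."""
    n = 0
    while True:
        counter["calls"] += 1
        yield n
        n += 1


def take_zip_gen_first(gen, batch_size):
    # Pattern from the issue: the generator is the first argument to zip,
    # so it is advanced once more before range() signals exhaustion.
    return [sample for sample, _ in zip(gen, range(batch_size))]


def take_zip_range_first(gen, batch_size):
    # Same idea with the arguments swapped: range() is exhausted first,
    # so zip stops before touching the generator again.
    return [sample for _, sample in zip(range(batch_size), gen)]


def take_islice(gen, batch_size):
    # itertools.islice stops after batch_size items without reading ahead.
    return list(itertools.islice(gen, batch_size))


def calls_for(take, batch_size=2):
    """Return the batch and the number of times the generator was advanced."""
    counter = {"calls": 0}
    batch = take(counting_gen(counter), batch_size)
    return batch, counter["calls"]


for take in (take_zip_gen_first, take_zip_range_first, take_islice):
    batch, calls = calls_for(take)
    print(f"{take.__name__}: batch={batch}, generator advanced {calls} times")
```

Under these assumptions, only the first variant advances the generator three times; the other two advance it exactly batch_size times, the same as the next(r_gen) approach above.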

Thank you again!

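To illustrate the effect, assume a dataset of four images that the generator cycles through, and a batch size of 2. Epoch 1, step 1/2: batch = [img1, img2]. But now the image_generator gets called one more time, so img3 is consumed as well and never used. Step 2/2: batch = [img4, img1]. Again the image_generator gets called one more time, and img2 is consumed and dropped.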
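The skipped samples can be reproduced with a small simulation. This is only a sketch: it assumes the image generator cycles endlessly over four images (modelled here with itertools.cycle), and batches_with_zip and batches_with_islice are hypothetical helpers, not keras-ocr functions.

```python
import itertools


def batches_with_zip(image_generator, batch_size, steps):
    # Batching pattern from the issue: one extra sample is consumed per step.
    return [
        [sample for sample, _ in zip(image_generator, range(batch_size))]
        for _ in range(steps)
    ]


def batches_with_islice(image_generator, batch_size, steps):
    # Alternative that takes exactly batch_size samples per step.
    return [
        list(itertools.islice(image_generator, batch_size))
        for _ in range(steps)
    ]


images = ["img1", "img2", "img3", "img4"]

# Assumption: the real generator loops over the dataset forever.
zip_epoch = batches_with_zip(itertools.cycle(images), batch_size=2, steps=2)
islice_epoch = batches_with_islice(itertools.cycle(images), batch_size=2, steps=2)

print("zip:   ", zip_epoch)     # [['img1', 'img2'], ['img4', 'img1']]
print("islice:", islice_epoch)  # [['img1', 'img2'], ['img3', 'img4']]
```

With the zip pattern, img3 never appears in the first epoch and img1 appears twice, which matches the steps described above.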

recognizer possible bug #149