faustomorales / keras-ocr

A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
https://keras-ocr.readthedocs.io/
MIT License

Example code issue #114

Closed NeighborhoodCoding closed 4 years ago

NeighborhoodCoding commented 4 years ago

Hi,

I'm following the end-to-end training example at https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html and the Colab notebook you provide at https://colab.research.google.com/drive/19dGKong-LraUG3wYlJuPCquemJ13NN8R

There seem to be a couple of typos in the example code:

  1. In cell [7]
detector = keras_ocr.detection.Detector(weights='clovaai_general')
recognizer = keras_ocr.recognition.Recognizer
    alphabet=recognizer_alphabet,
    weights='kurapan',
    include_top=False
)

should be (the opening parenthesis after Recognizer is missing, and include_top=False is dropped):

detector = keras_ocr.detection.Detector(weights='clovaai_general')
recognizer = keras_ocr.recognition.Recognizer(
    alphabet=recognizer_alphabet,
    weights='kurapan'
)
  2. In cell [13]
pipeline = keras_ocr.pipelines.Pipeline(detector=detector, recognizer=recognizer)
image, lines = next(image_generators[0])
predictions = pipeline.recognize(images=[image])[0]
drawn = keras_ocr.tools.drawBoxes(
    image=image, boxes=predictions, boxes_format='predictions'
)
print(
    'Actual:', '\n'.join([' '.join([character for _, character in line]) for line in lines]),
    'Predicted:', [text for text, box in predictions])
plt.imshow(drawn)

should be (the module is keras_ocr.pipeline, not keras_ocr.pipelines):

pipeline = keras_ocr.pipeline.Pipeline(detector=detector, recognizer=recognizer)
image, lines = next(image_generators[0])
predictions = pipeline.recognize(images=[image])[0]
drawn = keras_ocr.tools.drawBoxes(
    image=image, boxes=predictions, boxes_format='predictions'
)
print(
    'Actual:', '\n'.join([' '.join([character for _, character in line]) for line in lines]),
    'Predicted:', [text for text, box in predictions])
plt.imshow(drawn)

Thanks.

NeighborhoodCoding commented 4 years ago

I managed to train an English character recognizer from scratch, without pre-trained weights! It works as long as you train for many epochs with many steps per epoch.

But when len(alphabet) >= 800 (this only comes up for Korean OCR, e.g. alphabet = '가갸거겨...'), the predictions tend to be '' (empty strings). Here is my image: https://i.imgur.com/jWbxytB.png

Should I try this? https://github.com/faustomorales/keras-ocr/issues/88

(If that's OK with you, I will clean up my Colab code and share it publicly. I think this may be an issue specific to Korean or Chinese alphabets that needs a more accurate loss function?)

Please help. I really want to predict Korean characters, but I can't fully understand your code (I'm a beginner).

NeighborhoodCoding commented 4 years ago

Hi, this was solved by training for more epochs. Thank you!
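
For anyone hitting the same large-alphabet case (len(alphabet) >= 800, as with Korean), here is a minimal sketch of one way to build the alphabet string from the training labels instead of typing it by hand, so it contains only the characters that actually occur. The helper name build_alphabet and the sample labels are hypothetical and not part of keras-ocr; the resulting string is simply what would be passed as the alphabet argument in the cell [7] code above. It uses only the Python standard library.

```python
from typing import Iterable


def build_alphabet(labels: Iterable[str]) -> str:
    """Return every distinct character found in the labels as one sorted string.

    Hypothetical helper for illustration; not a keras-ocr function.
    """
    chars = set()
    for label in labels:
        chars.update(label)
    return ''.join(sorted(chars))


# Hypothetical sample labels; in practice these would be the training texts.
sample_labels = ['가나다', '다라마', 'abc가']
recognizer_alphabet = build_alphabet(sample_labels)
print(len(recognizer_alphabet), recognizer_alphabet)

# For comparison: the full Unicode Hangul Syllables block (U+AC00 to U+D7A3).
# Using all of it gives a far larger alphabet than a corpus-derived one.
all_hangul = ''.join(chr(cp) for cp in range(0xAC00, 0xD7A3 + 1))
print(len(all_hangul))
```

Deriving the alphabet from the labels keeps the number of output classes as small as the data allows, which may reduce how long the recognizer needs to train before it stops predicting empty strings. That is an assumption, not something measured in this thread.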