Breta01 / handwriting-ocr

OCR software for recognition of handwritten text
MIT License
743 stars 240 forks source link

Handwritten Numbers? #88

Closed adrianmercado closed 5 years ago

adrianmercado commented 5 years ago

I've been testing different images using the OCR.ipynb notebook and anytime there are numbers they are not recognized. Does the character model not recognize numbers? Or am I doing something wrong. Thanks! Awesome work on this

Breta01 commented 5 years ago

Hi, the model which is currently in use was trained on dataset which contains mostly letters, so it fails on number recognition. However, there are good dataset containing handwritten numbers. I would recommend downloading one of them and training your own model if you want a good results.

adrianmercado commented 5 years ago

Okay, can you help point me in the right direction on training my own model.

I've downloaded the IAM datasets and ran the following

python data_extractor.py -d iam
python data_normalization -d iam
python data_create_sets --csv -d iam

I then tried to run char_classifier_CNN.ipynb and word_classifier_CTC.ipynb but I am running into problems with both.

In char_classifier_CNN

images, labels = load_chars_data(
    charloc='',
    wordloc='data/processed/breta/words_gaplines/',
    lang=LANG)

I've changed it to 'data/processed/iam/' but I do not have a words_gaplines folder, is that something that can be made from one of your scripts?

In word_classifier_CTC.ipynb

train_iterator = DataIterator(
    data['train'][0],
    data['train'][2],
    num_buckets,
    slider_size,
    augmentation=seq,
    dropout=dropout,
    train=True)
test_iterator = DataIterator(
    data['dev'][0],
    data['dev'][2],
    1,
    slider_size,
    train=False)

I am getting

NameError: name 'DataIterator' is not defined

Am I missing an import or something for the DataIterator?

Thanks

Breta01 commented 5 years ago

Ok, my mistake. I made separate file for the DataIterator in my local version. Quick fix is to rename DataIterator to BucketDataIterator. I will try to fix that asap.

The chra_classifier_CNN.ipynb will not work because it has to be trained on individual characters, which you don't have.

Breta01 commented 5 years ago

Please close the issue in case the fix from new commit works.

adrianmercado commented 5 years ago

I was able to train using the word_classifier_CTC.ipynb after fixed up a few more errors (basically there were some method names that weren't updated) anyway now I'm trying to use the OCR.ipynb which I have updated to use the new model but I am getting another error.

Below is my update OCR.ipynb

import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2

from ocr.normalization import word_normalization, letter_normalization
from ocr import page, words, characters
from ocr.helpers import implt, resize, img_extend
from ocr.tfhelpers import Model
from ocr.datahelpers import idx2char

#%matplotlib inline
plt.rcParams['figure.figsize'] = (15.0, 10.0)

# Settings
IMG = 'test5'    # 1, 2, 3
LANG = 'en'      # cz, en
MODEL_LOC = '../models/word-clas/CTC/Classifier1-2000'

wordClass = Model(MODEL_LOC, 'word_prediction')

image = cv2.cvtColor(cv2.imread("../data/pages/%s.jpg" % IMG, 0), cv2.COLOR_GRAY2RGB)
implt(image)

crop = page.detection(image)
implt(crop)
boxes = words.detection(crop)
lines = words.sort_words(boxes)

def recognise(img):
    """ Recognising word and printing it """
    img = word_normalization(
        img,
        60,
        border=False,
        tilt=True,
        hyst_norm=True)

    # Set those parameter depending on what you trained
    slider = (60, 60)
    step = 10

    img = cv2.copyMakeBorder(
        img,
        0, 0, slider[1]//2, slider[1]//2,
        cv2.BORDER_CONSTANT,
        value=[0, 0, 0])
    img = img_extend(
        img,
        (img.shape[0], max(-(-img.shape[1] // step) * step, slider[1] + step)))
    length = (img.shape[1]-slider[1]) // step
    input_seq = np.zeros((1, length, slider[0] * slider[1]), dtype=np.float32)
    input_seq[0][:] = [img[:, loc*step: loc*step + slider[1]].flatten()
                       for loc in range(length)]
    input_seq = input_seq.swapaxes(0, 1)

    pred = wordClass.eval_feed({
        'inputs:0': input_seq,
        'inputs_length:0': [length],
        'keep_prob:0': 1})[0]

    word = ''
    for i in pred:
        if word == 0 and i != 0:
            break
        else:
            word += idx2char(i)
    print("Word: " + word)
    return word

implt(crop)
for line in lines:
    print(line)
    print(" ".join([recognise(crop[y1:y2, x1:x2]) for (x1, y1, x2, y2) in line]))

I am getting the error

  File "/mnt/c/Users/adrian.mercado/Downloads/handwriting-ocr-master/handwriting-ocr-master/src/ocr/tfhelpers.py", line 32, in eval_feed
    return self.sess.run(self.op, feed_dict=feed)
  File "/home/amerc/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/amerc/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (12, 1, 3600) for Tensor 'inputs:0', which has shape '(?, 64, ?, 1)'

any feedback would be appreciated. I've tried to find similar errors online with no luck.

Breta01 commented 5 years ago

I take the blame. I just found even more mistakes in the CTC code. Anyway, the current CTC code needs data in shape (batch_size, height, width, 1). In new CTC code I also changed the height from 60px to 64px. This should do the job:

def recognise(img, correct):
    """Recognising words."""
    img = word_normalization(
        img,
        64,
        border=False,
        tilt=False,
        hyst_norm=False)
    length = img.shape[1]
    # Input has shape [batch_size, height, width, 1]
    input_imgs = np.zeros(
            (1, 64, length, 1), dtype=np.uint8)
    input_imgs[0][:, :length, 0] = img

    pred = WORD_MODEL.eval_feed({
        'inputs:0': input_imgs,
        'inputs_length:0': [length],
        'keep_prob:0': 1})[0]

    word = ''
    for i in pred:
        word += idx2char(i)
    return word

I am thinking about editing the OCR.ipynb, so that it will be easy to switch between different models. I would really appreciate if you would take the time and create pull request with the fixes and changes you had to make.

adrianmercado commented 5 years ago

Thanks! I was able to use the model I created with the code. Although the results aren't what I was hoping for. I was hoping the IAM database would recognize digits better or any for that matter.

Do you have a recommendation on a dataset to train for numbers/digits that will work with your existing scripts?

I will create a pull request for my changes. I actually copied your scripts into python files because I am not familiar with jupyter. I will move my changes over to the ipynb files when I have a little more time.

Breta01 commented 5 years ago

Great! For the digits, the ORAND dataset was good, but the official site for it seems to be down (I don't know what's the official status. But if you search hard enough you can still find it.)

You can also try to look into NIST SD19 (HSF-7 or something) should contain digit strings. I personally didn't used this dataset.

I also just found this dataset CVL digit strings. I have CVL dataset processing script, this might be only extraction of digits from it.

You can also try to synthesis the data from dataset like MNIST, for example: https://github.com/thomalm/svhn-multi-digit/blob/master/04-mnist-synthetic-model.ipynb

Breta01 commented 5 years ago

I consider this closed. I upload some of the changes. Feel free to still upload the pull request if I missed something.