awslabs / handwritten-text-recognition-for-apache-mxnet

This repository lets you train neural network models for end-to-end full-page handwriting recognition on the IAM dataset using the Apache MXNet deep learning framework.
Apache License 2.0

Questions about training and evaluation #6

Open. Sundrops opened this issue 5 years ago.

Sundrops commented 5 years ago

Thanks for your great work. I am a rookie in handwriting recognition and have some questions about training and evaluation.

  1. This repo uses SCLITE for WER evaluation. I found that SCLITE ignores the spaces between words when it evaluates the words of a line, but other methods, such as https://github.com/githubharald/SimpleHTR/blob/master/src/main.py#L81 and https://github.com/jpuigcerver/xer/blob/master/xer#L116, do not. Which criterion is used in general?
  2. Why 100.0 - float(er)? I think it should be float(er):
    for line in output_file.readlines():
        match = re.match(match_tar, line.decode('utf-8'), re.M|re.I)
        if match:
            # I think there are matching problems here
            number = match.group(1)  # --> match.group().split()[4]
            er = match.group(2)      # --> match.group().split()[-3]
    assert number != None and er != None, "Error in parsing output."
    return float(number), 100.0 - float(er)  # --> return float(number), float(er)
  3. The reported CER is the average CER over all lines, not the global CER:
    # https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/0_handwriting_ocr.ipynb
    def get_qualitative_results_lines(denoise_func):
        sclite.clear()
        test_ds_line = IAMDataset("line", train=False)
        for i in tqdm(range(1, len(test_ds_line))):
            # ...
            sclite.add_text([decoded_text], [actual_text])
        cer, er = sclite.get_cer()
        print("Mean CER = {}".format(cer))
        return cer
  4. The pretrained model handwriting_line8.params works well, but I can't train a model that performs as well, even with the documented best settings:
    # https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/handwriting_line_recognition.py#L30
    # Best results:
    # python handwriting_line_recognition.py --epochs 251 -n handwriting_line.params -g 0 -l 0.0001 -x 0.1 -y 0.1 -j 0.15 -k 0.15 -p 0.75 -o 2 -a 128

    Looking forward to your reply. Thanks a lot.

jonomon commented 5 years ago

Hi @Sundrops,

  1. We used the SCLITE package for CER instead of WER.
  2. If my memory serves me, the SCLITE output provides the "correct percentage"; that is why it is subtracted from 100.0.
  3. That is correct. Calculating the global CER was too computationally expensive.
  4. Please provide more details. Did you use 0_handwriting_ocr.ipynb?

Sundrops commented 5 years ago

@jonomon Thanks for your reply.

  1. Oops, it is CER. But my point is that SCLITE ignores the spaces between words when it evaluates the words of a line, while the other methods do not, and I can't tell which CER is used in most papers.
  2. Maybe my version of SCTK is different from yours. With the latest SCTK, er = match.group(2) is wrong; it should be er = match.group().split()[-3].
  3. I think the CER reported by most papers is the global CER, so it would be better to match that (see the sketch after this list).
  4. I get a good CER using 0_handwriting_ocr.ipynb with your provided model handwriting_line8.params, but when I train a model with handwriting_line_recognition.py I cannot reproduce a result as good as handwriting_line8.params. Maybe the "Best results" comment is wrong. Thanks for your great work again.
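
For concreteness, a minimal sketch of the two points above, using the third-party editdistance package (the toy strings and the helper below are illustrative, not code from the repository). The mean-of-lines CER averages each line's error rate, so short lines weigh as much as long ones, while the global CER divides the total edit distance by the total number of reference characters; whether spaces are counted also shifts the numbers.

    import editdistance  # pip install editdistance

    def cer(pred, ref):
        # character error rate of one pair: edit distance / reference length
        return editdistance.eval(pred, ref) / len(ref)

    refs  = ["a quick brown fox", "hi"]
    preds = ["a quick brown fax", "ho"]

    # mean of per-line CERs (the "Mean CER" discussed above)
    mean_cer = sum(cer(p, r) for p, r in zip(preds, refs)) / len(refs)

    # global CER: total edit distance over total reference characters
    total_dist = sum(editdistance.eval(p, r) for p, r in zip(preds, refs))
    total_chars = sum(len(r) for r in refs)
    global_cer = total_dist / total_chars

    print(mean_cer, global_cer)  # ~0.279 vs ~0.105: the short line dominates the mean

    # question 1: counting or stripping spaces changes the per-line CER as well
    print(cer(preds[0], refs[0]),
          cer(preds[0].replace(" ", ""), refs[0].replace(" ", "")))  # 1/17 vs 1/14
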
jonomon commented 5 years ago

SCLITE was chosen because I believe it accounts for capitalisation, etc.

@ThomasDelteil could you answer question 4?

Sundrops commented 5 years ago

I can't reproduce your results (Mean CER of 8.4 obtained with handwriting_line8.params). Can you provide more details about your training? @ThomasDelteil

man0007 commented 5 years ago

Same request for question 4. I have used the script handwriting_line_recognition.py to train the model, but the resulting weights file is only 18 MB and the prediction accuracy is not good, whereas the pretrained weights are around 90 MB. Could you please provide the model that you trained? It would be of great help!

NidhiSultan commented 5 years ago

@Sundrops Hi, I have been trying to understand the architecture of this implementation, but:

  1. Whenever I run the code (either ocr.ipynb or handwritten_line.py), I get stuck after downloading largeWriterIndependentTextLineRecognitionTask.zip.
  2. I wanted to know whether this happens because of the high memory requirement of the entire pipeline, or whether there are code changes that need to be incorporated.
  3. I am using mxnet-cu82 with CUDA 8; if that isn't what is suggested, I'll try version 9 for both.

Any help would be appreciated.

Thanks in advance.

Sundrops commented 5 years ago

@NidhiSultan

  1. Debug with breakpoints.
  2. You can use a lighter backbone to extract features (no ResNet, less aggressive downsampling). I used the following code and got better results; a quick shape check is sketched right after the code. (Code borrowed from https://github.com/meijieru/crnn.pytorch/blob/master/models/crnn.py#L23.)
    import mxnet as mx
    from mxnet import gluon
    from mxnet.gluon import nn

    ctx = mx.cpu()  # or mx.gpu(0) if a GPU is available

    body = nn.HybridSequential()
    with body.name_scope():
        # conv1
        body.add(gluon.nn.Conv2D(channels=64, kernel_size=(3, 3), padding=(1, 1), strides=(1, 1), use_bias=True))
        body.add(nn.Activation('relu'))
        body.add(nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
        # conv2
        body.add(gluon.nn.Conv2D(channels=128, kernel_size=(3, 3), padding=(1, 1), strides=(1, 1), use_bias=True))
        body.add(nn.Activation('relu'))
        body.add(nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
        # conv3_1
        body.add(gluon.nn.Conv2D(channels=256, kernel_size=(3, 3), padding=(1, 1), strides=(1, 1), use_bias=False))
        body.add(nn.BatchNorm())
        body.add(nn.Activation('relu'))
        # conv3_2
        body.add(gluon.nn.Conv2D(channels=256, kernel_size=(3, 3), padding=(1, 1), strides=(1, 1), use_bias=True))
        body.add(nn.Activation('relu'))
        body.add(nn.MaxPool2D(pool_size=(2, 2), strides=(2, 1), padding=(0, 1)))
        # conv4_1
        body.add(gluon.nn.Conv2D(channels=512, kernel_size=(3, 3), padding=(1, 1), strides=(1, 1), use_bias=False))
        body.add(nn.BatchNorm())
        body.add(nn.Activation('relu'))
        # conv4_2
        body.add(gluon.nn.Conv2D(channels=512, kernel_size=(3, 3), padding=(1, 1), strides=(1, 1), use_bias=False))
        body.add(nn.BatchNorm())
        body.add(nn.Activation('relu'))
        body.add(nn.MaxPool2D(pool_size=(2, 2), strides=(2, 1), padding=(0, 1)))
        # conv5
        body.add(gluon.nn.Conv2D(channels=512, kernel_size=(2, 2), padding=(0, 0), strides=(1, 1), use_bias=False))
        body.add(nn.BatchNorm())
        body.add(nn.Activation('relu'))
    body.initialize(mx.init.MSRAPrelu(), ctx=ctx)
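
For reference, a quick shape check of this backbone (the input size and the reshape below are illustrative assumptions, not taken from the repository): with a 1-channel line image of height 32, the two (2, 1)-stride pools keep most of the width while the final 2x2 convolution collapses the height to 1, giving an (N, 512, 1, W') feature map that can be flattened into a width-long sequence for an RNN/CTC head.

    # assumes `body` and `ctx` from the snippet above have been defined and initialized
    x = mx.nd.random.uniform(shape=(1, 1, 32, 256), ctx=ctx)  # (N, C, H, W) toy line image
    feat = body(x)                                            # expected shape (1, 512, 1, 65) for this input
    seq = mx.nd.transpose(feat.squeeze(axis=2), axes=(2, 0, 1))  # (W', N, 512): one feature vector per image column
    print(feat.shape, seq.shape)
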
NidhiSultan commented 5 years ago

@Sundrops, Thanks for the prompt reply. Will try to run it this way as well.

JayeshGridScape commented 5 years ago

Hi @Sundrops @jonomon @NidhiSultan, I have an issue setting up this codebase. Could you please specify which version of Python and which other dependencies were used to configure it?

NidhiSultan commented 5 years ago

Hi @JayeshGridScape, the package versions depend mainly on the MXNet and CUDA versions on your system. You'll have to research a bit which versions of these two, and of the other packages, are compatible with each other. I used MXNet built against CUDA 8, though that's outdated now.

NidhiSultan commented 5 years ago

(Quoted @Sundrops' original questions above.)

  1. @Sundrops, I got the same error for SCTK. I tried the change you mentioned, but it still throws the same error. Can you provide any documentation link or source where I can read more about SCTK?
  2. What I understood about SCTK is that it is a regex-based function that calculates the difference between the predicted and actual text. Is that right?
  3. Also, how will we get the actual text if I use random data to detect handwritten text?
  4. And can you guide me on how to test this code on a raw dataset instead of the IAMDataset?

Thanks !!

Sundrops commented 5 years ago

@NidhiSultan

  1. You can get more information about SCTK from http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm
  2. Yes, SCTK calculates Levenshtein distance between predicted and actual text.
  3. You must have the actual text, or you can't train the model.
  4. You should prepare images and their corresponding text, then write your own dataloader (a minimal sketch follows below).
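
For what it is worth, a minimal sketch of such a dataloader with Gluon, assuming a folder of line images plus a labels.txt file containing one "filename<TAB>transcription" entry per line (the file layout, names, and class below are illustrative assumptions, not part of this repository):

    import os
    import mxnet as mx
    from mxnet.gluon.data import Dataset, DataLoader

    class LineTextDataset(Dataset):
        """Loads (line image, transcription) pairs from an image folder and a labels file."""
        def __init__(self, image_dir, label_file):
            self.image_dir = image_dir
            self.samples = []
            with open(label_file, encoding="utf-8") as f:
                for line in f:
                    name, text = line.rstrip("\n").split("\t", 1)
                    self.samples.append((name, text))

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            name, text = self.samples[idx]
            # flag=0 loads the image as grayscale (H x W x 1)
            img = mx.image.imread(os.path.join(self.image_dir, name), flag=0)
            return img, text

    # usage (paths are placeholders); variable-width images usually need resizing
    # or a custom batchify_fn before they can be stacked into batches:
    # train_ds = LineTextDataset("lines/", "labels.txt")
    # train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
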
mahin003 commented 4 years ago

If anybody has executed it on Google Colab, please share the edited iam_dataset.py with me: mahinqureship1@gmail.com

Soorya-suresh commented 3 years ago

I couldn't find the Genbits file on the HackerEarth server. How can the Genbits zip file be downloaded?