awslabs / handwritten-text-recognition-for-apache-mxnet

This repository lets you train neural network models for end-to-end full-page handwriting recognition on the IAM dataset using the Apache MXNet deep learning framework.
Apache License 2.0

Train on IAMDataset "Word" Crashes the code #14

Closed srikar2097 closed 4 years ago

srikar2097 commented 5 years ago

3_handwriting_recognition.py works fine with IAMDataset("line", output_data="text", train=True) but crashes when using the "word" IAMDataset. Specifically, the following crashes:

train_ds = IAMDataset("word", output_data="text", train=True)
print("Number of training samples: {}".format(len(train_ds)))

test_ds = IAMDataset("word", output_data="text", train=False)
print("Number of testing samples: {}".format(len(test_ds)))

Gives: mxnet.base.MXNetError: Shape inconsistent, Provided = [13320192], inferred shape=(8863744,)

jonomon commented 5 years ago

Hi Srikar2097,

The error you see is a result of the change in input image size between words and lines. The network was only trained on lines of handwritten text, so you will have to tweak the network parameters to accommodate the smaller images. As a result of that tweaking, you will not be able to use the pre-trained network (skip the following lines):

pretrained = "models/handwriting_line8.params"
if (os.path.isfile(pretrained)):
    net.load_parameters(pretrained, ctx=ctx)
    print("Parameters loaded")
    print(run_epoch(0, net, test_data, None, log_dir, print_name="pretrained", is_train=False))
srikar2097 commented 5 years ago

@jonomon thanks for your reply, but even with this change the code crashes. I think the architecture (the LSTM module) assumes certain dimensions, and these dimensions are not suitable for the word training data. What changes do you propose?

I have changed the word resize dims and reduced the number of BiLSTM layers to 1 to make it work. Since words are a lot smaller than lines, a single BiLSTM layer is probably sufficient. What do you think?
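For context, a rough sketch of the changes described above (assuming CNNBiLSTM is in scope as in 3_handwriting_recognition.py; the resize values are placeholders, not the repository's defaults):

import mxnet as mx

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

# smaller target size for resizing word crops instead of line-sized images
word_image_size = (30, 140)  # placeholder height/width, tune for your data

# single BiLSTM layer instead of the default two
net = CNNBiLSTM(rnn_layers=1, rnn_hidden_states=200, ctx=ctx)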

jonomon commented 5 years ago

Hi srikar2097,

You are correct, the LSTM assumes certain dimensions. If you look at the size of the features in CNNBiLSTM.hybrid_forward, it is 32x256x2x9 for words on my machine. max_seq_len must divide 256x2x9 (= 4608) evenly. I chose max_seq_len = 96 and it trained for me.
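For reference, a minimal sketch in plain NumPy (not the repository's exact code) of why this constraint exists: the CNN feature map is reshaped into max_seq_len time steps, so channels x height x width must be divisible by max_seq_len.

import numpy as np

batch_size, channels, height, width = 32, 256, 2, 9  # feature shape for "word" inputs
features = np.zeros((batch_size, channels, height, width))

total = channels * height * width                     # 4608
max_seq_len = 96                                      # 4608 / 96 = 48, so this works
assert total % max_seq_len == 0, "max_seq_len must divide 256*2*9 evenly"

# reshape the CNN features into a sequence of max_seq_len steps
sequence = features.reshape((batch_size, max_seq_len, total // max_seq_len))
print(sequence.shape)                                 # (32, 96, 48)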

Please let me know if this works for you.

vdinesh18 commented 4 years ago

Hello Jonomon, I need to train the model on words. I tried updating the code with max_seq_len = 96, but it crashes with this error:

DeferredInitializationError: Parameter 'cnnbilstm0_hybridsequential1_hybridsequential0_encoderlayer0_lstm0_l0_i2h_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

FEATURE_EXTRACTOR_FILTER = 64
def __init__(self, num_downsamples=2, resnet_layer_id=4, rnn_hidden_states=200, rnn_layers=1, max_seq_len=96, ctx=mx.gpu(0), **kwargs):
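For anyone hitting this: the error message above suggests passing one batch of data through the network before its parameters are accessed. A minimal sketch of that workaround, assuming CNNBiLSTM is in scope and using assumed input dimensions:

import mxnet as mx

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
net = CNNBiLSTM(max_seq_len=96, ctx=ctx)
net.initialize(mx.init.Xavier(), ctx=ctx)

# one dummy batch shaped like the resized word images:
# (batch, channels, height, width) -- the 30x140 size is an assumption
dummy = mx.nd.random.uniform(shape=(1, 1, 30, 140), ctx=ctx)
_ = net(dummy)  # forward pass triggers deferred shape inference and initialization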
vdinesh18 commented 4 years ago

Can you please let me know what changes you made to make it work?

vdinesh18 commented 4 years ago

Hey Jonomon, I was able to figure it out. The size of the features in CNNBiLSTM.hybrid_forward was the problem, and it worked for me when I set max_seq_len = 64.

jonomon commented 4 years ago

Great :)