Open AniketGurav opened 3 months ago
Hi there! Thanks for your interest in my repo!
Considering the number of train/val/test lines that you report, I'm guessing that you are using a subset of the actual dataset. The whole dataset will provide these numbers:
training lines 6482
validation lines 976
testing lines 2915
One potential reason for this is that the official IAM repo splits the forms across three archives (data/formsA-D.tgz, data/formsE-H.tgz, data/formsI-Z.tgz). You have to put all the images into a common folder without subfolders. Judging by your reported numbers, it seems that one of these three sets is missing.
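The flattening step above can be sketched in a few lines of Python. This is an illustrative helper, not part of the repo; the function name and paths are assumptions, so adjust them to your setup.

```python
# Sketch: extract the three official IAM form archives and flatten every
# .png into a single common folder with no subfolders, which is what the
# prepare script expects. flatten_iam_forms is a hypothetical helper.
import shutil
import tarfile
from pathlib import Path

def flatten_iam_forms(archives, target):
    """Extract every .png from the given .tgz archives into `target`, flat."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    tmp = target / "_tmp"                 # temporary extraction area
    for arc in archives:
        with tarfile.open(arc, "r:gz") as tf:
            tf.extractall(tmp)
    for png in tmp.rglob("*.png"):
        # drop the per-form subfolder structure: everything goes in `target`
        shutil.move(str(png), target / png.name)
    shutil.rmtree(tmp)
    return len(list(target.glob("*.png")))
```

With the three archives in place, calling `flatten_iam_forms(["data/formsA-D.tgz", "data/formsE-H.tgz", "data/formsI-Z.tgz"], "data/forms")` should leave every form image in one flat folder.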
Hope I helped!
Thanks for the reply, I will check and update you.
Hi georgeretsi,
Thanks for the updated code. I have been following this repo for a long time, and the new repo looks simple to run and maintain. However, when I train at line level I only reach CER 0.052 and WER 0.175 after 800 epochs. I created the line-level data as you described (using the script prepare_iam.py). The config values and other important parameters are below. The weights you provide give the expected results on the test data, but when I train myself, training gets stuck at the CER and WER values mentioned above.
Character classes (identical for the train/validation/test splits): [' ', '!', '"', '#', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'] (79 different characters)
training lines 3876
validation lines 613
testing lines 1918
Preparing Net - Architectural elements: {'cnn_cfg': [[2, 64], 'M', [3, 128], 'M', [2, 256]], 'head_type': 'both', 'rnn_type': 'lstm', 'rnn_layers': 3, 'rnn_hidden_size': 256, 'flattening': 'maxpool', 'stn': False}
Am I missing something?
After some observation I found that my training run uses only 3876 training lines, whereas the paper uses the split from reference [21], which contains 6161 training lines. Could this be the root cause?
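The counts reported in this thread are consistent with a missing form folder: each split falls well short of the full-dataset totals quoted earlier. A quick arithmetic check, using only the numbers stated above:

```python
# Compare the reported split sizes against the full-dataset numbers
# quoted earlier in this thread.
full = {"train": 6482, "val": 976, "test": 2915}   # maintainer's full-dataset counts
mine = {"train": 3876, "val": 613, "test": 1918}   # counts reported in the issue

missing = {k: full[k] - mine[k] for k in full}
print("missing lines per split:", missing)
print("fraction of the dataset present:",
      round(sum(mine.values()) / sum(full.values()), 3))
```

Roughly 38% of the lines are absent across all three splits, which fits the hypothesis that one of the three form archives was never merged in.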