georgeretsi / HTR-best-practices

Basic HTR concepts/modules to boost performance

Reaching CER 0.052 and WER 0.175 #2

Open · AniketGurav opened this issue 3 months ago

AniketGurav commented 3 months ago

Hi georgeretsi,

Thanks for the updated code. I have been following this repo for a long time, and the new repo looks simple to run and maintain. However, when I train at line level, I get stuck at CER 0.052 and WER 0.175 after 800 epochs. I created the line-level data as you described (using the script prepare_iam.py). The config values and other important parameters are below. The weights you provided give the expected results on the test data, but my own training plateaus at the CER and WER values mentioned above.

conf: {
    'resume': None,
    'save': './temp_16batchSize.pt',
    'device': 'cuda:1',
    'data': {'path': '/cluster/datastore/aniketag/allData/icpr2/output_path/'},
    'preproc': {'image_height': 128, 'image_width': 1024},
    'arch': {'cnn_cfg': [[2, 64], 'M', [3, 128], 'M', [2, 256]], 'head_type': 'both', 'rnn_type': 'lstm', 'rnn_layers': 3, 'rnn_hidden_size': 256, 'flattening': 'maxpool', 'stn': False},
    'train': {'lr': 0.001, 'num_epochs': 800, 'batch_size': 16, 'scheduler': 'mstep', 'save_every_k_epochs': 10, 'num_workers': 8},
    'eval': {'batch_size': 32, 'num_workers': 8, 'wer_mode': 'tokenizer'}
}

Character classes: [' ', '!', '"', '#', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'] (79 different characters)
training lines 3876
validation lines 613 (same 79 character classes)
testing lines 1918 (same 79 character classes)
Preparing Net - Architectural elements: {'cnn_cfg': [[2, 64], 'M', [3, 128], 'M', [2, 256]], 'head_type': 'both', 'rnn_type': 'lstm', 'rnn_layers': 3, 'rnn_hidden_size': 256, 'flattening': 'maxpool', 'stn': False}
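For reference, here is roughly how I read the cnn_cfg notation above: [n, c] means n convolutional blocks with c output channels, and 'M' is a 2x2 max-pooling step. This is only my own sketch of the convention, not the repo's actual implementation (which may well use residual blocks instead of plain convolutions):

```python
import torch.nn as nn

def build_cnn(cnn_cfg, in_channels=1):
    """Sketch of the cnn_cfg convention: 'M' -> 2x2 max pooling,
    [n, c] -> n conv blocks with c output channels."""
    layers = []
    for entry in cnn_cfg:
        if entry == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            num_blocks, out_channels = entry
            for _ in range(num_blocks):
                layers += [
                    nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                    nn.BatchNorm2d(out_channels),
                    nn.ReLU(inplace=True),
                ]
                in_channels = out_channels
    return nn.Sequential(*layers)

backbone = build_cnn([[2, 64], 'M', [3, 128], 'M', [2, 256]])
```

Under this reading, the two 'M' stages reduce the 128-pixel input height to 32 before the maxpool flattening and the 3-layer LSTM head.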

Am I missing something?

On closer inspection, I found that my training code uses only 3876 training lines. As mentioned in the paper, this work uses the split from reference [21], which contains 6161 training lines. Could this be the root cause?

georgeretsi commented 3 months ago

Hi there! Thanks for your interest in my repo!

Considering the number of train/val/test lines that you report, I'm guessing that you are using a subset of the actual dataset. The whole dataset gives the following counts:

training lines 6482
validation lines 976
testing lines 2915

One likely reason is that the official IAM repo splits the form images across three archives (data/formsA-D.tgz, data/formsE-H.tgz, data/formsI-Z.tgz). You have to put all the images into a single common folder, without subfolders, as in the sketch below. Judging by your reported numbers, one of these three sets is missing.
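For example, something along these lines should flatten everything into one folder (the paths here are just placeholders -- adjust them to your setup):

```python
import shutil
import tarfile
from pathlib import Path

# Placeholder paths -- point these at wherever you downloaded the archives.
archives = ['formsA-D.tgz', 'formsE-H.tgz', 'formsI-Z.tgz']
out_dir = Path('iam/forms')           # one flat folder, no subfolders
tmp_dir = Path('tmp_forms')
out_dir.mkdir(parents=True, exist_ok=True)
tmp_dir.mkdir(exist_ok=True)

# Extract each archive to a scratch folder, then flatten.
for name in archives:
    with tarfile.open(name) as tar:
        tar.extractall(tmp_dir)

# IAM form scans are .png files; move them all into the common folder.
for png in tmp_dir.rglob('*.png'):
    shutil.move(str(png), str(out_dir / png.name))

print(len(list(out_dir.glob('*.png'))), 'form images')
```

If all three archives are there, the final count should be the complete set of 1539 IAM forms.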

Hope I helped!

AniketGurav commented 3 months ago

Thanks for the reply, I will check and update you.