githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License
1.99k stars 893 forks

Model size 800*64 ? (Trained on word or line images) #104

Closed jjsr closed 3 years ago

jjsr commented 3 years ago

Hello sir, thanks for sharing the repo with us. Could you please let me know whether you trained on word images or line images (of IAM) when the model input size is 800*64 (as used in your thesis and the FAQ article)? This would be a great help. I tried to train on word images, but even after 24 hours the word accuracy is 1%, so I thought I should try training on line images. Kindly guide.

githubharald commented 3 years ago

The model from the thesis was trained on line images. But it should also work with word images, so I guess there is something wrong in the implementation.

jjsr commented 3 years ago

Thanks for writing back. I am always amazed by the helpful guidance you provide, sir. I will try my best to figure out the issue with my model. With your permission, I would like to keep this issue open for a couple of days. Thanks again.

jjsr commented 3 years ago

Hi, sorry for getting back to this thread so late. I seriously wanted to solve the issue I was facing by myself, which is why I was a little hesitant to ask. Sorry for troubling you, sir. Main objective: to train the model at line level. Steps followed: 1) An image size of 128*32 is not sufficient because the model can then only predict up to 32 characters at most. As per your suggestion, and following the SimpleHTR table, I converted the code of setupCNN to an iterative one and was able to get as far as the (100,8,512) step in the last row of the table. This output now needs to be fed into the LSTM (I don't know how to apply MDLSTM), so I must get from (100,8,512) to (100,1,512). I am stuck at this step. I have tried the following approaches:

Approach 1) Looking at the looped code, (128,32) has height 32, which is 2 to the power 5, hence 5 CNN layers. For (800,64) the height is 64, which is 2 to the power 6, so I used 6 layers, each consisting of (conv2d, convolutional_normal, relu, pool). I tried to run this model but could not get any word accuracy even after a day of running.

Approach 2) I followed the table up to the (100,8,512) step, then added a pair of conv and pool layers to reduce it to (100,1,512), but my model still does not produce anything. I am stuck at this point, going from (100,8,512) to (100,1,512). May I send the code to your email instead of posting it here? Kindly let me know the direction. Thanks in advance.

Note: In createLMDB I had to increase the size from 2 to 4 GB, and I modified the Dataloader code a little to work on line images.
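The 32-character limit mentioned above follows from how CTC decoding works: the network emits one label (or a blank) per time step, and two identical neighbouring characters need a blank between them. A minimal sketch of that length check is shown here; the helper names `ctc_min_timesteps` and `fits` are hypothetical and not part of SimpleHTR.

```python
def ctc_min_timesteps(text: str) -> int:
    """Smallest number of time steps CTC needs to emit `text`.

    Each character needs one time step, and every pair of identical
    adjacent characters needs one extra blank step between them.
    """
    repeats = sum(1 for a, b in zip(text, text[1:]) if a == b)
    return len(text) + repeats


def fits(text: str, timesteps: int) -> bool:
    """True if `text` can be produced from a sequence of `timesteps` steps."""
    return ctc_min_timesteps(text) <= timesteps


if __name__ == "__main__":
    line = "a full line of handwritten text is often long"
    print(ctc_min_timesteps(line), fits(line, 32), fits(line, 100))
```

Under this check, a typical text line does not fit into 32 time steps but does fit into 100, which is why the line-level model needs a longer output sequence.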

githubharald commented 3 years ago

  1. Look at LineHTR; the author of that repo has already converted this repo to an 800x64 input size.
  2. Look at table 3.2 in my diploma thesis (the link is in the README); there you can find what the architecture looks like for a "normal" LSTM instead of MD-LSTM.
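For readers working on the same shape problem, the arithmetic can be checked without building the network. The sketch below only tracks how (width, height) pooling factors shrink the feature map. The six-layer plan for 800x64 is an illustrative assumption, not a copy of table 3.2 or of LineHTR, and the five-layer plan is assumed to mirror the 128x32 setup discussed in this thread.

```python
from typing import List, Tuple

Pool = Tuple[int, int]  # (width factor, height factor)


def feature_map_size(width: int, height: int, pools: List[Pool]) -> Tuple[int, int]:
    """Apply each pooling factor in turn and return the final (width, height).

    Assumes unpadded pooling, so odd sizes are floored.
    """
    for pool_w, pool_h in pools:
        width //= pool_w
        height //= pool_h
    return width, height


# Assumed plan for the 128x32 word model: 32 time steps, height 1.
WORD_POOLS = [(2, 2), (2, 2), (1, 2), (1, 2), (1, 2)]

# Hypothetical plan for 800x64: halve the width three times (800 -> 100)
# and the height six times (64 -> 1).
LINE_POOLS = [(2, 2), (2, 2), (2, 2), (1, 2), (1, 2), (1, 2)]

if __name__ == "__main__":
    print(feature_map_size(128, 32, WORD_POOLS))    # word model
    print(feature_map_size(800, 64, LINE_POOLS))    # line model
    print(feature_map_size(800, 64, [(2, 2)] * 6))  # halving both axes every layer
```

Halving both axes in all six layers would leave only about 12 time steps, far fewer than the characters in a text line. Pooling only along the height in the later layers keeps 100 time steps while reducing the height to 1, after which the height axis can be squeezed away before the LSTM.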

jjsr commented 3 years ago

Thank you so much, sir.

jjsr commented 3 years ago

Hello sir, I have implemented the line-level code following your guidance. Could you please let me know how you computed the CER and word accuracy at line level in your thesis? A pseudo-algorithm would be a great help.

```python
import editdistance


def validate(model, loader):
    """Compute character error rate and word accuracy on the validation set."""
    # running totals over all batches
    numCharErr = 0
    numCharTotal = 0
    numWordOK = 0
    numWordTotal = 0
    while loader.hasNext():
        iterInfo = loader.getIteratorInfo()
        print(f'Batch: {iterInfo[0]} / {iterInfo[1]}')
        batch = loader.getNext()
        (recognized, _) = model.inferBatch(batch)

        print('Ground truth -> Recognized')
        for i in range(len(recognized)):
            # a sample counts as correct only if the whole text matches exactly
            numWordOK += 1 if batch.gtTexts[i] == recognized[i] else 0
            numWordTotal += 1
            # character errors: edit distance between prediction and ground truth
            dist = editdistance.eval(recognized[i], batch.gtTexts[i])
            numCharErr += dist
            numCharTotal += len(batch.gtTexts[i])
            print('[OK]' if dist == 0 else '[ERR:%d]' % dist, '"' + batch.gtTexts[i] + '"', '->',
                  '"' + recognized[i] + '"')

    # print validation result
    charErrorRate = numCharErr / numCharTotal
    wordAccuracy = numWordOK / numWordTotal
    print(f'Character error rate: {charErrorRate * 100.0}%. Word accuracy: {wordAccuracy * 100.0}%.')
    return charErrorRate, wordAccuracy
```

What needs to be changed in this part? Thanks in advance. I have tried, but I am getting an index-out-of-range error. If it's fine, may I post how I am approaching the CER and word accuracy?
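One common way to score line-level output is to keep the character edit distance for the CER and to compare whitespace-separated words with a second edit distance for the word score. The sketch below follows that convention. It is an assumption, not the method confirmed for the thesis, and the names `edit_distance` and `line_level_metrics` are hypothetical. It uses a plain-Python Levenshtein distance so that it runs without the `editdistance` package.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def line_level_metrics(gt_texts: List[str], recognized: List[str]) -> Tuple[float, float]:
    """Return (character error rate, word accuracy) over a set of text lines.

    Word accuracy is defined here as 1 - word error rate (an assumption);
    it can drop below 0 if the recognizer inserts many extra words.
    """
    if len(gt_texts) != len(recognized):
        raise ValueError("need one recognized line per ground-truth line")
    char_err = char_total = word_err = word_total = 0
    for gt, rec in zip(gt_texts, recognized):
        char_err += edit_distance(rec, gt)
        char_total += len(gt)
        gt_words, rec_words = gt.split(), rec.split()
        word_err += edit_distance(rec_words, gt_words)
        word_total += len(gt_words)
    cer = char_err / char_total if char_total else 0.0
    wer = word_err / word_total if word_total else 0.0
    return cer, 1.0 - wer


if __name__ == "__main__":
    gt = ["the quick fox", "hello world"]
    rec = ["the quick fax", "hello world"]
    print(line_level_metrics(gt, rec))
```

Pairing the two lists with `zip` after an explicit length check avoids indexing past the end of either list, although the cause of the index error reported above is not known from this thread.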

githubharald commented 3 years ago

Closing because this is not an issue with the SimpleHTR repo (sorry, no time to provide personal guidance with code).