githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License

Regarding training with different dataset #71

Closed Tejas111 closed 5 years ago

Tejas111 commented 5 years ago

I have the words.txt in the following format:

able a0
able a1
able a2
able a3
abolition a4
about a5
about a6

So I changed dataloader.py as follows:

            lineSplit = line.strip().split(' ')
            assert len(lineSplit) >= 2

            # filename: a0.png
            fileNameSplit = lineSplit[1]
            fileName = fileNameSplit + '.png'

            # GT text are columns starting at 1
            gtText = self.truncateLabel(' '.join(lineSplit[0]), maxTextLen)
            print(gtText)
            chars = chars.union(set(list(gtText)))

But when I train with these changes, I get the following output:

Validate NN
Batch: 1 / 5
Ground truth -> Recognized
[ERR:6] "w e p t" -> " "
[ERR:6] "w e r e" -> " "
[ERR:6] "w e r e" -> " "
[ERR:6] "w e r e" -> " "
[ERR:6] "w e r e" -> " "
[ERR:6] "w e r e" -> " "
[ERR:6] "w e r e" -> " "

Can you tell me where I went wrong and what changes need to be made?

githubharald commented 5 years ago

Sorry, but I don't have the time to help with general coding questions.

Atralb commented 5 years ago

@githubharald Wow man, this is kind of rude. You can leave the issue open so that other people can answer. I don't see why you need to close every thread...

Aakash12980 commented 3 years ago

> @githubharald Wow man, this is kind of rude. You can leave the issue open so that other people can answer. I don't see why you need to close every thread...

I am also training on my custom dataset and ran into a similar problem. @Atralb If it was resolved, could you tell me how you fixed it?
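
For anyone landing here with the same log: judging only from the snippet and output posted above, the spaced-out ground truth ("w e r e") looks like a side effect of calling `' '.join(...)` on `lineSplit[0]`. That value is a single string, so `join` iterates over its characters and puts a space between each one. Below is a minimal sketch in plain Python, assuming every line of words.txt has the form `<word> <image-id>` as in the question. `parse_line` is a hypothetical helper written for illustration, not a function from SimpleHTR, and the call to `truncateLabel` is left out. This only addresses the ground-truth formatting; it may not by itself explain why the recognized text is blank.

```python
def parse_line(line):
    """Split a '<word> <image-id>' line into (file_name, gt_text).

    Hypothetical helper; assumes the two-column format shown in the question.
    """
    line_split = line.strip().split(' ')
    assert len(line_split) >= 2

    # Second column is the image id, e.g. 'a0' -> 'a0.png'
    file_name = line_split[1] + '.png'

    # First column is already the whole word, so use it directly.
    # ' '.join(line_split[0]) would iterate over its characters instead.
    gt_text = line_split[0]
    return file_name, gt_text


# What the snippet in the question effectively computes for the label:
spaced_label = ' '.join('were')      # joins the characters 'w', 'e', 'r', 'e'

# What the sketch above returns for the same input line:
file_name, gt_text = parse_line('were a0')
```

With this change, the character set built from `gtText` would also no longer contain the space character unless a label actually includes one.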