Closed · nunomrm closed this issue 11 months ago
Don't use --decoder wordbeamsearch for training. Also, don't expect any meaningful results in the first few epochs.
I had also trained without that decoder option before, and the results were the same: the model identifies only a single character, even in the final epochs. I will run it again without that option just to double-check, but I'm fairly certain it will not train well. I'll update here.
Yes, give it a try. I just checked a training log; it should look roughly like this:
"charErrorRates": [ 0.9838042269187987, 0.8809788654060067, 0.5203559510567297, 0.33205784204671857, 0.29054505005561737, 0.2439599555061179, 0.2181979977753059, 0.20262513904338153, 0.18593993325917688, 0.18740823136818688, ...
So you should get some reasonable readouts after ~10 epochs of training.
Check that the data being fed to the model is correct: set a breakpoint here and look at the texts and the images of the first few batch elements: https://github.com/githubharald/SimpleHTR/blob/master/src/dataloader_iam.py#L134
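Without an IDE, you can also dump the first few batch elements right where they are built. Below is a minimal sketch; the Batch namedtuple with imgs and gt_texts fields mirrors what SimpleHTR uses, but check the actual definition in src/dataloader_iam.py:

```python
from collections import namedtuple

import numpy as np

# Minimal stand-in for SimpleHTR's Batch (imgs, gt_texts, batch_size);
# verify the real field names in src/dataloader_iam.py.
Batch = namedtuple('Batch', 'imgs gt_texts batch_size')

def dump_batch(batch, n=3):
    """Print and return (text, image shape) for the first n batch elements."""
    summary = []
    for img, text in zip(batch.imgs[:n], batch.gt_texts[:n]):
        shape = None if img is None else img.shape  # None flags a missing image
        print(f'text={text!r}, img shape={shape}')
        summary.append((text, shape))
    return summary

# Demo with one valid image and one missing one
batch = Batch(imgs=[np.zeros((32, 128)), None], gt_texts=['school', '.'], batch_size=2)
dump_batch(batch)
```

If any image comes out as None (or the texts don't match the images), the problem is in data loading, not in the model.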
After 10 epochs (not using --decoder wordbeamsearch in training) I get this:
(...)
Batch: 24 / 24
Ground truth -> Recognized
[ERR:6] "school" -> "a"
[ERR:1] "." -> "a"
[ERR:3] "Did" -> "a"
[ERR:3] "you" -> "a"
[ERR:6] "notice" -> "a"
[ERR:3] "that" -> "a"
[ERR:4] "girl" -> "a"
[ERR:3] "who" -> "a"
[ERR:3] "said" -> "a"
[ERR:5] "hullo" -> "a"
[ERR:2] "to" -> "a"
[ERR:3] "him" -> "a"
[ERR:2] "in" -> "a"
[ERR:3] "the" -> "a"
[ERR:5] "garden" -> "a"
[ERR:1] "?" -> "a"
Character error rate: 93.17018909899889%. Word accuracy: 1.9250780437044746%.
Character error rate not improved, best so far: 92.77864293659623%
No more improvement for 10 epochs. Training stopped.
…which is not good.
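As an aside, the [ERR:n] numbers in the readout match the character edit distance between ground truth and recognition (e.g. "school" -> "a" needs one substitution and five deletions, giving ERR:6), and the character error rate is the summed edit distance over the summed ground-truth length. A self-contained sketch, using my own Levenshtein implementation rather than SimpleHTR's code, on three pairs from the log above:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Three (ground truth, recognized) pairs from the validation log
pairs = [('school', 'a'), ('.', 'a'), ('Did', 'a')]
num_err = sum(edit_distance(gt, rec) for gt, rec in pairs)
num_chars = sum(len(gt) for gt, rec in pairs)
print(f'CER: {100 * num_err / num_chars:.2f}%')
```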
Moreover, when using the --fast option to load images with LMDB, I also get this error, where images are not found:
Train NN
Traceback (most recent call last):
File "main.py", line 209, in <module>
main()
File "main.py", line 194, in main
train(model, loader, line_mode=args.line_mode, early_stopping=args.early_stopping)
File "main.py", line 69, in train
batch = loader.get_next()
File "/home/nmonteir/personal/SimpleHTR/src/dataloader_iam.py", line 130, in get_next
imgs = [self._get_img(i) for i in batch_range]
File "/home/nmonteir/personal/SimpleHTR/src/dataloader_iam.py", line 130, in <listcomp>
imgs = [self._get_img(i) for i in batch_range]
File "/home/nmonteir/personal/SimpleHTR/src/dataloader_iam.py", line 120, in _get_img
img = pickle.loads(data)
TypeError: a bytes-like object is required, not 'NoneType'
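For context: pickle.loads raises exactly that TypeError when it is handed None, which is what an LMDB get() returns for a missing key. So the lookup for that image key found nothing in the database. A defensive sketch below; load_img and the dict-backed store are hypothetical stand-ins for illustration, not SimpleHTR's actual code:

```python
import pickle

def load_img(get, key):
    """Guard the LMDB lookup before unpickling.
    'get' stands in for txn.get() on an open LMDB read transaction."""
    data = get(key)
    if data is None:  # LMDB returns None for a missing key
        raise KeyError(f'no LMDB entry for {key!r}; '
                       'was the image database built before running with --fast?')
    return pickle.loads(data)

# Demo with a plain dict standing in for the LMDB transaction
store = {b'img-0': pickle.dumps('fake image')}
print(load_img(store.get, b'img-0'))
```

Turning the silent None into an explicit error makes it obvious that the database contents (or the keys used for lookup) are the problem.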
I set a breakpoint with breakpoint() in the get_next() function. I noticed during the first epoch that execution never stops in that function for debugging. I also tried leaving print statements for the image variable (and the other one too) in there, and nothing was printed.
How could I solve this overall?
Get a proper IDE like PyCharm (it's free); then you can just set a breakpoint without putting breakpoint() calls into the code. If print statements produce no output, something really weird is going on: maybe the code you changed is not being executed at all? Also, please make sure you work with the original code from the repo; I've seen it a couple of times that people changed the code and then reported bugs.
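One quick way to verify that the file you edited is actually the one being executed is to ask Python where it loaded the code from. A generic sketch using the standard inspect module, demonstrated here on a stdlib function (you would point it at the SimpleHTR function you modified instead):

```python
import inspect
import json

def which_source(obj):
    """Return the file a function or class was loaded from, to confirm
    that the code you edited is the code being executed."""
    return inspect.getsourcefile(obj)

print(which_source(json.dumps))  # path of the stdlib json module's source file
```

If the printed path is not the file you edited, you are running a different copy of the code, which would explain breakpoints and prints having no effect.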
I am training the SimpleHTR model on the IAM dataset with python main.py --mode train --data_dir ../data/iam_handwriting_database/ --batch_size 250 --early_stopping 10 --decoder wordbeamsearch. I don't know why the model only identifies a single character instead of a word during validation, as shown above. Should I adjust the Python flags in the command? I also ran the training without the decoder option, and the model still detects only a single character instead of the whole word.