githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License

Training the model from scratch and error "model not found" #168

Closed by nunomrm 9 months ago

nunomrm commented 10 months ago

I am trying to train this HTR model from scratch with IAM data. I run into an error when I run this:

python main.py --data_dir ../data/iam_handwriting_database/ --batch_size 250 --early_stopping 10

The error is:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    main()
  File "main.py", line 204, in main
    model = Model(char_list_from_file(), decoder_type, must_restore=True, dump=args.dump)
  File "/home/nmonteir/personal/digi-lists/SimpleHTR/src/model.py", line 54, in __init__
    self.sess, self.saver = self.setup_tf()
  File "/home/nmonteir/personal/digi-lists/SimpleHTR/src/model.py", line 161, in setup_tf
    raise Exception('No saved model found in: ' + model_dir)
Exception: No saved model found in: ../model/

This does not make sense to me, as the instructions under "Train model on IAM dataset" in the main README.md say that I should "Delete files from model directory if you want to train from scratch".

I am using the Python modules in requirements.txt and running TensorFlow on CPUs. I can confirm the IAM data is set up properly.

Python and GCC info:

Python: 3.8.18 (default, Sep 11 2023, 13:40:15) 
[GCC 11.2.0]

githubharald commented 10 months ago

Hi, the command shown in the README is different from yours: you are missing the --mode train option.
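
Why the missing option leads to "No saved model found" can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes `--mode` is an argparse option that falls back to an inference mode when omitted, the numeric defaults are placeholders, and `needs_saved_model` is a hypothetical helper standing in for the `must_restore=True` path visible in the traceback.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Assumed option set: only the flags mentioned in this thread.
    parser = argparse.ArgumentParser()
    # Assumption: omitting --mode selects inference rather than training.
    parser.add_argument('--mode', choices=['train', 'validate', 'infer'], default='infer')
    parser.add_argument('--data_dir', type=str)
    # Placeholder defaults, not taken from the repository.
    parser.add_argument('--batch_size', type=int, default=100)
    parser.add_argument('--early_stopping', type=int, default=25)
    return parser


def needs_saved_model(mode: str) -> bool:
    # Hypothetical helper: every mode except training must restore a saved
    # model, so an empty model directory is an error in those modes.
    return mode != 'train'


# The command from the original post, without --mode.
without_mode = build_parser().parse_args(
    ['--data_dir', '../data/iam_handwriting_database/',
     '--batch_size', '250', '--early_stopping', '10'])

# The same command with the training mode selected.
with_mode = build_parser().parse_args(
    ['--mode', 'train', '--data_dir', '../data/iam_handwriting_database/',
     '--batch_size', '250', '--early_stopping', '10'])

print(without_mode.mode, needs_saved_model(without_mode.mode))
print(with_mode.mode, needs_saved_model(with_mode.mode))
```

Under these assumptions, the original command becomes: python main.py --mode train --data_dir ../data/iam_handwriting_database/ --batch_size 250 --early_stopping 10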

nunomrm commented 10 months ago

Thank you, you're right, the model is now training. When I removed the "--fast" flag, I accidentally removed the train mode as well.

MohammedZuhairAhmed commented 9 months ago

Hello Harald,

I want to train the model from scratch and tried to download the IAM dataset from the website linked in your README: http://www.fki.inf.unibe.ch/databases/iam-handwriting-database

But that website is not opening, so I checked Kaggle. It has the dataset, but only words.tgz, which contains the PNG images.

The other file, ASCII/words.txt, is not on Kaggle.

Can you please tell me where I can download the dataset?

githubharald commented 9 months ago

I think you can only get it from their website. Maybe you are blocked; if so, try with a VPN.
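
Once the download works, a quick check that both parts mentioned above (the ground-truth text file and the PNG images) are present can save a failed training run. This is only a minimal sketch: `missing_iam_parts` is a hypothetical helper, and the file and directory names are parameters with illustrative defaults, not the layout the repository requires.

```python
from pathlib import Path
from typing import List


def missing_iam_parts(data_dir: Path,
                      gt_file: str = 'words.txt',
                      img_dir: str = 'words') -> List[str]:
    """Return a list of problems; an empty list means both parts were found."""
    problems = []
    # The ground-truth file (words.txt in this thread) must exist as a file.
    if not (data_dir / gt_file).is_file():
        problems.append(f'ground-truth file not found: {data_dir / gt_file}')
    # The image directory must exist and contain at least one PNG at any depth.
    images = data_dir / img_dir
    if not images.is_dir() or next(images.rglob('*.png'), None) is None:
        problems.append(f'no PNG images found under: {images}')
    return problems
```

Calling `missing_iam_parts(Path('../data/iam_handwriting_database/'))` and printing the result before starting a run would show which of the two parts still needs to be downloaded.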