da03 / Attention-OCR

Visual Attention based OCR
MIT License

Update to support TensorFlow 1.2.0 #47

Open emedvedev opened 7 years ago

emedvedev commented 7 years ago

The TensorFlow API has changed again; updating the code to reflect the recent changes.

arpitkh96 commented 7 years ago

The old weights no longer work due to the version change. Can you upload the weights of the new model if you have trained it? I don't have such a powerful machine.

emedvedev commented 7 years ago

I'm using another dataset to train the model, so I can't help here, unfortunately, but my suggestion would be to use Google's ML Engine: https://cloud.google.com/ml-engine/. They give $300 in credits for a trial run, which is more than enough for training (mine cost around $20 with the BASIC_GPU instance, and the dataset was way bigger than the one in the example here).

emedvedev commented 7 years ago

Another suggestion, if you decide to go the Cloud ML route, would be converting your dataset into one large TFRecords file instead of thousands of individual small images. Otherwise I/O will become a critical bottleneck for you.

Here's a gist on how to generate the TFRecords file: https://gist.github.com/emedvedev/dd056666337b54c13176da93d5b987b7. You'll also have to modify src/data_util/data_gen.py to read from this file, so it might be too much work (I did have to make quite a lot of changes to the tooling around this model in my fork), but it does make training significantly faster.
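To see why one big file helps, here's a minimal, format-agnostic sketch of the idea: many small files concatenated into a single length-prefixed archive that can be read sequentially. This is NOT the actual TFRecords wire format (use the gist above, which relies on TensorFlow's record writer, for that); the function names here are illustrative only.

```python
import struct

def pack_records(paths, out_path):
    """Concatenate many small files into one length-prefixed archive.

    One big sequential file replaces thousands of small reads, which is
    what makes training I/O-friendly on Cloud ML. Not the TFRecords
    format -- just the same underlying pattern.
    """
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            out.write(struct.pack("<Q", len(data)))  # 8-byte little-endian length prefix
            out.write(data)

def unpack_records(in_path):
    """Yield each record back out of the packed file, in order."""
    with open(in_path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            (length,) = struct.unpack("<Q", header)
            yield f.read(length)
```

Reading one file front to back avoids the per-file open/seek overhead that dominates when the trainer pulls thousands of tiny images over a network filesystem.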

arpitkh96 commented 7 years ago

That can be done. Do you have any idea approximately how much time it will require (with BASIC_GPU)?

emedvedev commented 7 years ago

A couple of hours to a couple of days, depending on how comfortable you are with TensorFlow. :) I'll update my fork today and document all the changes, so maybe you'll be able to just use it without changing the code too much.

arpitkh96 commented 7 years ago

I was talking about training time. Well, if your work is going to save me the time of writing the data conversion code, I'm waiting desperately ;-)

emedvedev commented 7 years ago

@arpitkh96 I've moved my fork to https://github.com/emedvedev/attention-ocr and changed the interface quite a bit. It's also bundled into a package now.

Didn't have the time to update the README and clean up, but here are brief instructions:

1. If you want the CLI, clone the repo and run `pip install .` inside it. You'll then be able to use the `aocr ...` command. Otherwise you can run `python -m aocr` inside the repo dir instead.

2. For dataset generation, you'll need a `.txt` file with annotations in the format `image/path.png yourimagetext`, as described in the README. To merge the images and the annotations into a `.tfrecords` file, use `aocr dataset`:

   ```
   aocr dataset datasets/annotations-training.txt datasets/training.tfrecords
   aocr dataset datasets/annotations-testing.txt datasets/testing.tfrecords
   ```

3. For training, use `aocr train`:

   ```
   aocr train datasets/training.tfrecords
   ```

4. For testing, use `aocr test`:

   ```
   aocr test datasets/testing.tfrecords
   ```
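The annotation file from step 2 is plain text, one `image_path<space>label` line per sample, so it's easy to generate with a small script. A minimal sketch, assuming you already have (path, text) pairs from your own labeling process; `write_annotations` and its arguments are hypothetical names, not part of the aocr package:

```python
def write_annotations(samples, out_path):
    """Write an annotation file: one `image_path<space>label` line per sample.

    `samples` is an iterable of (image_path, text) pairs. How you obtain
    the ground-truth text for each image depends on your dataset.
    """
    with open(out_path, "w") as out:
        for image_path, text in samples:
            # The path and the label are separated by the first space,
            # so the image path itself must not contain spaces.
            assert " " not in image_path, image_path
            out.write("%s %s\n" % (image_path, text))

# Example: two labeled images written to a training annotation file.
write_annotations(
    [("datasets/images/0001.png", "hello"),
     ("datasets/images/0002.png", "world")],
    "annotations-training.txt",
)
```

Feed the resulting `.txt` file to `aocr dataset` as shown above to produce the `.tfrecords` file.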

The fork is still a little screwed up (the logs don't make much sense during the training stage, for instance), but the basics are working, and converting to tfrecords makes it much faster.