emedvedev / attention-ocr

A TensorFlow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
MIT License

Korean training (tensorflow serving included) #126

Closed kspook closed 5 years ago

kspook commented 5 years ago

I got the messages below when I ran 'test'.

I changed several things:

  1. Changed 'iso-8859-1' to 'utf-8'.
  2. Added two Korean characters in data_gen.py:
     CHARMAP = ['', '', ''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ신한')
  3. In bucketdata.py, commented out 'raise NotImplementedError' and added 3 lines (otherwise the program exits with 'NotImplementedError'):

        else:
            # raise NotImplementedError
            self.label_list[l_idx] = \
                self.label_list[l_idx][:decoder_input_len]
            target_weights.append([1]*decoder_input_len)

(py36) D:\attention-ocr_b2>python ./aocr/main.py test ./dataset/testing.tfrecords
2019-03-18 12:33:22.445747: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-03-18 12:33:22,458 root INFO phase: test
2019-03-18 12:33:22,458 root INFO model_dir: checkpoints
2019-03-18 12:33:22,458 root INFO load_model: True
2019-03-18 12:33:22,459 root INFO output_dir: results
2019-03-18 12:33:22,459 root INFO steps_per_checkpoint: 0
2019-03-18 12:33:22,459 root INFO batch_size: 1
2019-03-18 12:33:22,459 root INFO learning_rate: 1.000000
2019-03-18 12:33:22,459 root INFO reg_val: 0
2019-03-18 12:33:22,460 root INFO max_gradient_norm: 5.000000
2019-03-18 12:33:22,460 root INFO clip_gradients: True
2019-03-18 12:33:22,460 root INFO max_image_width 160.000000
2019-03-18 12:33:22,460 root INFO max_prediction_length 8.000000
2019-03-18 12:33:22,460 root INFO channels: 1
2019-03-18 12:33:22,460 root INFO target_embedding_size: 10.000000
2019-03-18 12:33:22,461 root INFO attn_num_hidden: 128
2019-03-18 12:33:22,461 root INFO attn_num_layers: 2
2019-03-18 12:33:22,461 root INFO visualize: False
2019-03-18 12:33:24,005 root INFO data_gen.gen()
2019-03-18 12:33:24,225 root INFO Step 1 (0.136s). Accuracy: 0.00%, loss: 4.895189, perplexity: 133.645, probability: 1.03% 0% (85 vs 4)
2019-03-18 12:33:24,243 root INFO Step 2 (0.017s). Accuracy: 0.00%, loss: 12.590834, perplexity: 2.93853e+05, probability: 39.16% 0% (53 vs 2)
2019-03-18 12:33:24,260 root INFO Step 3 (0.016s). Accuracy: 0.00%, loss: 15.508214, perplexity: 5.43415e+06, probability: 98.23% 0% (51 vs 3)
2019-03-18 12:33:24,278 root INFO Step 4 (0.017s). Accuracy: 0.00%, loss: 16.600834, perplexity: 1.62051e+07, probability: 71.18% 0% (49 vs 1)

kspook commented 5 years ago

It's related to #11, but in my case the issue is not the prediction length.

I also checked #52, but I had the same problem.

emedvedev commented 5 years ago

Do you have the output from training the model as well? Does it converge on training at all? The error is a bit strange, since testing should (almost) always work if training worked, unless you've modified something between training and testing.

kspook commented 5 years ago

Regarding 'raise NotImplementedError' (#11), I fixed it. The script was treating the labels as corrupted data because they are long digit strings, so I raised the prediction length above 8.

Regarding utf-8 (#52), I followed your instructions at every step. It looks fine except for one thing: for digits, it produces values like 51 and 52, which are the Unicode numbers ('ord()'). In the output above, the digits were produced correctly alongside the Unicode number ('ord()'); now I only get the Unicode number. For a Korean character, it produces a 9-digit number. I am in the middle of re-checking.

Thanks a lot.

kspook commented 5 years ago

I think I have a Unicode problem, like #52.

In the dataset, I changed the code as below:

    label = ''.join(map(str, label.encode('utf-8')))
    feature = {}
    feature['image'] = _bytes_feature(img)
    feature['label'] = _bytes_feature(b(label))

Otherwise, I get the error below. T.T Can you help me fix it?

(1) error message

Traceback (most recent call last):
  File "./aocr2/main.py", line 285, in <module>
    main()
  File "./aocr2/main.py", line 225, in main
    parameters.save_filename
  File "D:\attention-ocr_b2\aocr2\util\dataset.py", line 54, in generate
    feature['label'] = _bytes_feature(b(label))
  File "C:\Users\60067527\Anaconda3\envs\py36\lib\site-packages\six.py", line 626, in b
    return s.encode("latin-1")
UnicodeEncodeError: 'latin-1' codec can't encode character '\uc2e0' in position 0: ordinal not in range(256)

(2) test message

2019-03-20 14:39:19,999 root INFO Step 1 (0.206s). Accuracy: 100.00%, loss: 0.000109, perplexity: 1.00011, probability: 54.81% 100% (51)

2019-03-20 14:39:20,024 root INFO Step 2 (0.019s). Accuracy: 100.00%, loss: 0.000018, perplexity: 1.00002, probability: 91.36% 100% (49)

2019-03-20 14:39:21,647 root INFO Step 3 (0.020s). Accuracy: 70.37%, loss: 4.728099, perplexity: 113.080, probability: 41.93% 11% (51 vs 237149156)

2019-03-20 14:39:21,675 root INFO Step 4 (0.020s). Accuracy: 77.78%, loss: 0.067931, perplexity: 1.07029, probability: 17.18% 100% (50)

2019-03-20 14:39:21,702 root INFO Step 5 (0.020s). Accuracy: 64.44%, loss: 2.101013, perplexity: 8.17445, probability: 1.35% 11% (49 vs 236139160)
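The traceback in (1) comes from six.b(), which on Python 3 calls s.encode("latin-1") and therefore cannot represent Hangul. A minimal sketch of the difference, assuming the label is a Python 3 str; to_label_bytes is a hypothetical helper, not part of aocr, and whether the rest of the pipeline accepts raw UTF-8 label bytes is not shown here:

```python
label = '신'


def to_label_bytes(text):
    """Encode a label as UTF-8 bytes (hypothetical helper, not part of aocr)."""
    return text.encode('utf-8')


# latin-1 only covers code points 0-255, so Hangul cannot be encoded with it.
try:
    label.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 encodes the same character as three bytes.
utf8_bytes = to_label_bytes(label)
print(latin1_ok, list(utf8_bytes))
```

Decoding the bytes with .decode('utf-8') gives back the original character, which is what the digit-string workaround above loses.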

emedvedev commented 5 years ago

Ah, that actually makes sense. Please submit a PR once you get the model working—I'm sure quite a lot of people will appreciate it! I'd be happy to merge and help with any modifications, if needed.

emedvedev commented 5 years ago

> ord('신') = 49888, but I have 236139160
> label.decode('utf-8') = b'236139160'
> So I can't get final data = '신'
>
> I am stuck here T.T

236/139/160 are the UTF-8 bytes of 신 in decimal. :)

>>> '신'
'\xec\x8b\xa0'
>>> int('ec', 16)
236
>>> int('8b', 16)
139
>>> int('a0', 16)
160

The fact that they're all glued together might be an issue, though, and I'm not sure how to address that with minimal changes off the top of my head.
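To illustrate why the glued digits are a problem, here is a rough sketch, not a change to aocr itself: it reproduces the ''.join(map(str, ...)) encoding from the comment above and compares it with a variant that keeps a separator between byte values. The function names and the ',' separator are assumptions made for this example.

```python
def glue_bytes(label):
    """Reproduce the encoding from the thread: byte values with no separator."""
    return ''.join(map(str, label.encode('utf-8')))


def encode_with_separator(label, sep=','):
    """Keep a separator so the byte boundaries survive (hypothetical scheme)."""
    return sep.join(str(byte) for byte in label.encode('utf-8'))


def decode_with_separator(encoded, sep=','):
    """Invert encode_with_separator back to the original text."""
    if not encoded:
        return ''
    return bytes(int(part) for part in encoded.split(sep)).decode('utf-8')


glued = glue_bytes('신')
separated = encode_with_separator('신')
print(glued, separated, decode_with_separator(separated))
```

With the glued form there is no reliable way to tell where one byte value ends and the next begins, since values may have one, two, or three digits. The separator would also have to be added to the model's character set, which is a design cost of this approach.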

kspook commented 5 years ago

In addition,

curl -X POST \
  http://localhost:9001/v1/models/yourmodelname:predict \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{ "signature_name": "serving_default", "inputs": { "input": { "b64": "/9j/4AAQ==" } }}'

What should I type for "yourmodelname"?

kspook commented 5 years ago

no response?

Regarding Korean recognition, https://github.com/emedvedev/attention-ocr/issues/126#issuecomment-474676226 , I put the result below.

As in my comment, https://github.com/da03/Attention-OCR/issues/48#issuecomment-473734050, the output goes beyond the trained characters.

To solve this, I made an index and trained with the original one, da03/Attention-OCR#48. So, with your source, I succeeded in character-based recognition. But it doesn't support words. (Maybe it is possible if a few words get their own index.)

**** detail of result

(py36) D:\attention-ocr_b34ui>python aocr34/main.py test dataset/testingk.tfrecords
2019-03-27 16:16:11.812149: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-03-27 16:16:11,826 root INFO phase: test
2019-03-27 16:16:11,827 root INFO model_dir: checkpoints
2019-03-27 16:16:11,827 root INFO load_model: True
2019-03-27 16:16:11,828 root INFO output_dir: results
2019-03-27 16:16:11,829 root INFO steps_per_checkpoint: 0
2019-03-27 16:16:11,830 root INFO batch_size: 1
2019-03-27 16:16:11,833 root INFO learning_rate: 1.000000
2019-03-27 16:16:11,834 root INFO reg_val: 0
2019-03-27 16:16:11,834 root INFO max_gradient_norm: 5.000000
2019-03-27 16:16:11,835 root INFO clip_gradients: True
2019-03-27 16:16:11,836 root INFO max_image_width 160.000000
2019-03-27 16:16:11,836 root INFO max_prediction_length 18.000000
2019-03-27 16:16:11,837 root INFO channels: 1
2019-03-27 16:16:11,838 root INFO target_embedding_size: 10.000000
2019-03-27 16:16:11,839 root INFO attn_num_hidden: 128
2019-03-27 16:16:11,841 root INFO attn_num_layers: 2
2019-03-27 16:16:11,842 root INFO visualize: False
2019-03-27 16:16:13,842 root INFO data_gen.gen()
word [1 8 5 2] b'52' (
2019-03-27 16:16:14,132 root INFO Step 1 (0.187s). Accuracy: 0.00%, loss: 3.016383, perplexity: 20.4173, probability: 31.50% 0% (( vs 4)
word [ 1 7 12 2] b'49' 1
2019-03-27 16:16:14,156 root INFO Step 2 (0.021s). Accuracy: 50.00%, loss: 0.000025, perplexity: 1.00003, probability: 99.97% 100% (1)
word [1 8 3 2] b'50' 2
2019-03-27 16:16:14,180 root INFO Step 3 (0.021s). Accuracy: 66.67%, loss: 0.000059, perplexity: 1.00006, probability: 99.86% 100% (2)
word [1 8 4 2] b'51' 3
2019-03-27 16:16:14,204 root INFO Step 4 (0.021s). Accuracy: 75.00%, loss: 0.021416, perplexity: 1.02165, probability: 93.58% 100% (3)
word [1 8 7 9 5 3 2] b'54620' 2
2019-03-27 16:16:14,230 root INFO Step 5 (0.021s). Accuracy: 68.00%, loss: 6.107238, perplexity: 449.097, probability: 53.86% 40% (2 vs 한)
word [ 1 7 12 11 11 11 2] b'49888' (
2019-03-27 16:16:14,253 root INFO Step 6 (0.020s). Accuracy: 60.00%, loss: 6.769095, perplexity: 870.524, probability: 72.55% 20% (( vs 신)

emedvedev commented 5 years ago

> But it doesn't support word. (maybe it is possible if a few word get their own index)

It's true that this model is mostly optimized for character-based recognition, not word-based. You can, as you said, modify it to give words their own indices instead of characters, although I'm not sure if the performance will be acceptable in that case, and you'll need a massive dataset, too. Doesn't hurt to experiment though. :)

kspook commented 5 years ago

Providing an index for every word is almost impossible. Is there any chance of changing the code along the lines of the original, da03/Attention-OCR#48 (comment)? In the original, I succeeded in word recognition.

Anyway, what do I put for "yourmodel" when running your code, https://github.com/emedvedev/attention-ocr/issues/126#issuecomment-476030850

kspook commented 5 years ago

Can you answer me, please???

In addition,

curl -X POST \
  http://localhost:9001/v1/models/yourmodelname:predict \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{ "signature_name": "serving_default", "inputs": { "input": { "b64": "/9j/4AAQ==" } }}'

What should I type for "yourmodelname"?

emedvedev commented 5 years ago

You just put aocr.

Please don't post repeated requests: all support in this (and other) open source projects is done on a volunteer basis, whenever people have the time and capacity to respond. Kindly be prepared to do your own research: in this case, there are other issues in this repository that concern POST requests to the API, and they have correct URLs which you could just look at.

kspook commented 5 years ago

Thank you for your comments and your other efforts. It took me a long time to get TensorFlow Serving working, which I finally did with the link below. Why don't you update the README.md file?

https://github.com/emedvedev/attention-ocr/issues/94#issuecomment-435668522

curl -X POST --output - \ http://localhost:9001/v1/models/aocr:predict \ -H 'cache-control: no-cache' \ -H 'content-type: application/json' \ -d '{ "signature_name": "serving_default", "inputs": { "input": { "b64": "/9j/4AAQSkZJRgABAQAASABIAAD/4QBYRXhpZgAATU0AKgAAAAgAAgESAAMAAAABAAEAAIdpAAQAAAABAAAAJgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAOaADAAQAAAABAAAAHAAAAAD/7QA4UGhvdG9zaG9wIDMuMAA4QklNBAQAAAAAAAA4QklNBCUAAAAAABDUHYzZjwCyBOmACZjs+EJ+/8AAEQgAHAA5AwEiAAIRAQMRAf/EAB8AAAEFAQEBAQEBAAAAAAAAAAABAgMEBQYHCAkKC//EALUQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29/j5+v/EAB8BAAMBAQEBAQEBAQEAAAAAAAABAgMEBQYHCAkKC//EALURAAIBAgQEAwQHBQQEAAECdwABAgMRBAUhMQYSQVEHYXETIjKBCBRCkaGxwQkjM1LwFWJy0QoWJDThJfEXGBkaJicoKSo1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uLj5OXm5+jp6vLz9PX29/j5+v/bAEMACAYGBwYFCAcHBwkJCAoMFA0MCwsMGRITDxQdGh8eHRocHCAkLicgIiwjHBwoNyksMDE0NDQfJzk9ODI8LjM0Mv/bAEMBCQkJDAsMGA0NGDIhHCEyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMv/dAAQABP/aAAwDAQACEQMRAD8AyYYwgG6rIIAHBqCLHFQeVLHP57SHPmhdmeApOK5bHKaKnjjjtzT1ljjLhnXKj5gD0rMaVxdtCsch2zA7ieADVqK1X7VMxX5y2Dk9QQD/AFq0hWLkc0fmhFbJKhvwNTxMHGQehway9LVUnl9D9xic/KKuW88e+ZVJyGLdO2KZJahZZYhIDgH9Kk2j+8KpWcu7chRhhmIJHByc1fwP8igR/9DKhIwNtI1oHuPM3tsyG8vtkd6bExAFW0Y4rmRx3AwD5gR94hjz3FXLqMJcOP4gFB+oAH9KhjG48+lWLg7rxye8hqhNkSRhQqqoAAwKsKVHtxjIHvUP3nGalQYFMBwOCD/+qpd/1pgUcU/NAH//2Q==" } } }'
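For readers who prefer Python over curl, the same request can be sketched with the standard library. This is an illustrative sketch only: build_predict_request is a hypothetical helper, the host, port, model name and payload shape are copied from the curl command above, and the four image bytes are a placeholder rather than a real JPEG.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes, host='localhost', port=9001,
                          model_name='aocr'):
    """Build (but do not send) a predict request shaped like the curl call."""
    url = f'http://{host}:{port}/v1/models/{model_name}:predict'
    payload = {
        'signature_name': 'serving_default',
        'inputs': {
            # The image is sent as a base64 string under the "b64" key.
            'input': {'b64': base64.b64encode(image_bytes).decode('ascii')},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Placeholder bytes; in practice read the image file in binary mode.
request = build_predict_request(b'\xff\xd8\xff\xe0')
print(request.full_url)

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```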

emedvedev commented 5 years ago

Please submit a pull request to README.md with the changes that have worked for you! That would be really great, if you have the time, of course.

On April 3, 2019 at 13:59:53, kspook (notifications@github.com) wrote:

I finally succeeded in tensorflow serving with below link. why don't you change README.md file?

94 (comment)

https://github.com/emedvedev/attention-ocr/issues/94#issuecomment-435668522


kspook commented 5 years ago

I tried training on Korean words again. At first, for example, '신', '한' was converted to '853863', that is, 85 ('신') + 3 + 86 ('한') + 3. FYI, I omitted the digits 1, 2, 3 from the indices of all inputs. Later, I used a serial index (0-40) for the characters in a word. With 36+3 as the index of '신' and 37+3 as the index of '한', giving [1, 39, 40, 2], training went well.
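The serial-index scheme described above can be sketched as follows. This is a reconstruction from the numbers in the thread, not the actual patch: the character set is taken from the CHARMAP line in the first comment, the offset of 3 and the start/end ids 1 and 2 are inferred from [1, 39, 40, 2], and the function names are hypothetical.

```python
GO_ID = 1    # start-of-sequence id, inferred from the [1, ..., 2] pattern
EOS_ID = 2   # end-of-sequence id, inferred from the same pattern
OFFSET = 3   # the first three ids are reserved, so characters start at 3

CHARSET = list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ신한')
CHAR_TO_ID = {char: position + OFFSET for position, char in enumerate(CHARSET)}
ID_TO_CHAR = {index: char for char, index in CHAR_TO_ID.items()}


def encode_word(word):
    """Map each character to its serial index and wrap with GO/EOS."""
    return [GO_ID] + [CHAR_TO_ID[char] for char in word] + [EOS_ID]


def decode_ids(ids):
    """Drop the reserved ids and map the rest back to characters."""
    return ''.join(ID_TO_CHAR[index] for index in ids if index in ID_TO_CHAR)


encoded = encode_word('신한')
print(encoded, decode_ids(encoded))
```

Because every character, Korean or not, gets exactly one id, there are no multi-digit strings to split apart afterwards.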

Finally I succeeded in word recognition. Thank you for sharing your knowledge. ^^

result

1. One small example: trained on 172 images

(py36) D:\attention-ocr_b36uwi>python aocr36 test dataset/testingwk.tfrecords
2019-04-16 09:06:56.586067: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-04-16 09:06:56,598 root INFO phase: test
2019-04-16 09:06:56,598 root INFO model_dir: checkpoints
2019-04-16 09:06:56,598 root INFO load_model: True
2019-04-16 09:06:56,598 root INFO output_dir: results
2019-04-16 09:06:56,599 root INFO steps_per_checkpoint: 0
2019-04-16 09:06:56,599 root INFO batch_size: 1
2019-04-16 09:06:56,599 root INFO learning_rate: 1.000000
2019-04-16 09:06:56,599 root INFO reg_val: 0
2019-04-16 09:06:56,599 root INFO max_gradient_norm: 5.000000
2019-04-16 09:06:56,600 root INFO clip_gradients: True
2019-04-16 09:06:56,600 root INFO max_image_width 160.000000
2019-04-16 09:06:56,600 root INFO max_prediction_length 18.000000
2019-04-16 09:06:56,600 root INFO channels: 1
2019-04-16 09:06:56,600 root INFO target_embedding_size: 10.000000
2019-04-16 09:06:56,600 root INFO attn_num_hidden: 128
2019-04-16 09:06:56,600 root INFO attn_num_layers: 2
2019-04-16 09:06:56,601 root INFO visualize: False
2019-04-16 09:06:58,598 root INFO data_gen.gen()
2019-04-16 09:06:58,914 root INFO Step 1 (0.186s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:58,934 root INFO Step 2 (0.019s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:58,955 root INFO Step 3 (0.020s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:58,975 root INFO Step 4 (0.019s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:58,996 root INFO Step 5 (0.019s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:59,017 root INFO Step 6 (0.020s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:59,040 root INFO Step 7 (0.022s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:59,063 root INFO Step 8 (0.021s). Accuracy: 100.00%, loss: 0.000003, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:59,085 root INFO Step 9 (0.021s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)
2019-04-16 09:06:59,108 root INFO Step 10 (0.020s). Accuracy: 100.00%, loss: 0.000002, perplexity: 1.00000, probability: 100.00% 100% (신한)

2. FYI, one bad result: trained on 10 images

2019-04-15 16:16:31.221433: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-04-15 16:16:31,234 root INFO phase: test
2019-04-15 16:16:31,234 root INFO model_dir: checkpoints
2019-04-15 16:16:31,235 root INFO load_model: True
2019-04-15 16:16:31,235 root INFO output_dir: results
2019-04-15 16:16:31,235 root INFO steps_per_checkpoint: 0
2019-04-15 16:16:31,235 root INFO batch_size: 1
2019-04-15 16:16:31,236 root INFO learning_rate: 1.000000
2019-04-15 16:16:31,236 root INFO reg_val: 0
2019-04-15 16:16:31,236 root INFO max_gradient_norm: 5.000000
2019-04-15 16:16:31,236 root INFO clip_gradients: True
2019-04-15 16:16:31,236 root INFO max_image_width 160.000000
2019-04-15 16:16:31,236 root INFO max_prediction_length 18.000000
2019-04-15 16:16:31,237 root INFO channels: 1
2019-04-15 16:16:31,237 root INFO target_embedding_size: 10.000000
2019-04-15 16:16:31,237 root INFO attn_num_hidden: 128
2019-04-15 16:16:31,237 root INFO attn_num_layers: 2
2019-04-15 16:16:31,237 root INFO visualize: False
2019-04-15 16:16:33,224 root INFO data_gen.gen()
step , [1.223667, b'\xed\x95\x9c', 0.568962602180392]
test output ground 한 853863

label_list [(4, '0'), (5, '1'), (6, '2'), (7, '3'), (8, '4'), (9, '5'), (40, '6'), (44, '7'), (45, '8'), (46, '9'), (47, 'A'), (48, 'B'), (49, 'C'), (50, 'D'), (54, 'E'), (55, 'F'), (56, 'G'), (57, 'H'), (58, 'I'), (59, 'J'), (60, 'K'), (64, 'L'), (65, 'M'), (66, 'N'), (67, 'O'), (68, 'P'), (69, 'Q'), (70, 'R'), (74, 'S'), (75, 'T'), (76, 'U'), (77, 'V'), (78, 'W'), (79, 'X'), (80, 'Y'), (84, 'Z'), (85, '신'), (86, '한')]

c lex, 8 853863
revert n=n+c 8
c lex, 5 853863
revert n=n+c 85
c lex, 3 853863
c lex, 8 853863
revert n=n+c 8
c lex, 6 853863
revert n=n+c 86
c lex, 3 853863

revert() for ground n l_new label[0], label, 신한 86 (86, '한')

output : 한

2019-04-15 16:16:33,517 root INFO Step 1 (0.188s). Accuracy: 50.00%, loss: 1.223667, perplexity: 3.39963, probability: 56.90% 50% (한 vs 신한) ./dataset/word-img/hangul-images/hangul_521.jpeg
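The debug trace above appears to walk through '853863' digit by digit, accumulating digits until a '3' is reached. A sketch of that delimiter scheme, reconstructed from the label_list in this comment: the assumption is that indices start at 4 and skip any number containing the digits 1, 2 or 3, which reproduces the listed values (4 to 9, then 40, 44, ..., 85, 86). The function names are hypothetical and this is not the code that produced the trace.

```python
CHARSET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ신한'
DELIMITER = '3'


def safe_indices(count, start=4, banned='123'):
    """Return the first `count` integers >= start with no banned digit."""
    indices = []
    number = start
    while len(indices) < count:
        if not any(digit in banned for digit in str(number)):
            indices.append(number)
        number += 1
    return indices


INDEX_TO_CHAR = dict(zip(safe_indices(len(CHARSET)), CHARSET))
CHAR_TO_INDEX = {char: index for index, char in INDEX_TO_CHAR.items()}


def encode_label(word):
    """Write each character's index followed by the delimiter digit."""
    return ''.join(f'{CHAR_TO_INDEX[char]}{DELIMITER}' for char in word)


def decode_label(encoded):
    """Accumulate digits until the delimiter, then look up the character."""
    chars = []
    number = ''
    for digit in encoded:
        if digit == DELIMITER:
            if number:
                chars.append(INDEX_TO_CHAR[int(number)])
            number = ''
        else:
            number += digit
    return ''.join(chars)


print(encode_label('신한'), decode_label('853863'))
```

Since no index contains the digit 3, the delimiter is unambiguous when decoding the ground-truth string. The model still has to predict every digit and delimiter correctly, which may be part of why the serial-index approach trained better.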

emedvedev commented 5 years ago

Awesome, glad it's working for you! I'm going to close the issue since you've succeeded, but please feel free to open another one (or write right here) if you encounter other problems. Happy to help!