emedvedev / attention-ocr

A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
MIT License
1.07k stars · 258 forks

different output at exported model #135

Open kspook opened 5 years ago

kspook commented 5 years ago

@emedvedev, in my case, it doesn't work when I run exported model including Korean recognition. T.T In the exported model, Korean return good results, but number and English Alphabet showed different result.

Test & predict show the good results . Do you have solution?

emedvedev commented 5 years ago

There's not much to go on, so could you maybe provide more information? I may not be able to help, since it's your own fork with significant changes, but maybe someone else would take a look.

Does your Korean fork include the latest master changes? What's the correct fork/branch to look at? How exactly are you exporting the model? Do you have different results for the exact same image with predict and with the same model on the same stage of training when exported? What are the exact results, ideally with full logs?

Unless your issue includes all that, it would be very hard to see where the issue comes from.

kspook commented 5 years ago

thank you for your response.

My information is below:

master version: https://github.com/kspook/aocrKR.git
checkpoint: https://drive.google.com/open?id=1tPHoE1gK-AUjYBqbSCFbSc6lRnBh7WI5
exported model: https://drive.google.com/open?id=1j--Jb2XlyZXHFwTtWR3NBx2EsDKgyIZ- (https://drive.google.com/open?id=1QPoZQZuLbLmdLq1LGK9saQ4eKMvkOVFA for better result)
test: https://github.com/kspook/aocrKR/tree/master/dataset
predict: https://github.com/kspook/aocrKR/tree/master/dataset/test-img3/
test curl: https://github.com/kspook/aocrKR/tree/master/aocr36

  1. Logic for Korean, as in https://github.com/emedvedev/attention-ocr/issues/126#issuecomment-483470565. I tried to train Korean words again. For example, '신', '한' was converted to '853863', that is, 85 ('신') + 3 + 86 ('한') + 3. FYI, I omitted 1, 2, 3 as indices of all input. Later, I put serial indices (0-40) for training the characters in the word. If I put 36+3 as the index of '신' and 37+3 as '한', the label becomes [1, 39, 40, 2].
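As I read it, the indexing scheme above can be sketched like this (a toy example with a two-character charset; the reserved indices, helper names, and charset are illustrative assumptions, not the actual aocr code):

```python
# Toy sketch of the label scheme: indices 0-2 are assumed reserved for
# special tokens (padding, GO, EOS), so each character's serial index
# is shifted by 3, and every label is wrapped in GO/EOS markers.
RESERVED = 3          # assumed number of reserved special-token indices
GO_ID, EOS_ID = 1, 2  # assumed start/end token ids

charset = ['신', '한']  # toy charset; serial indices 0 and 1
char_to_idx = {c: i + RESERVED for i, c in enumerate(charset)}

def encode(word):
    """Wrap a word's shifted character indices with GO and EOS."""
    return [GO_ID] + [char_to_idx[c] for c in word] + [EOS_ID]

print(encode('신한'))  # -> [1, 3, 4, 2]
```

With a real 41-character charset the same scheme would give '신' and '한' the shifted indices 39 and 40, matching the [1, 39, 40, 2] label above.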

  2. Export file: I changed 'stdin' to an argument.
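The stdin-to-argument change might look roughly like the following sketch (function name and argument layout are illustrative, not the actual main.py code):

```python
import io

def get_image_paths(argv, stdin):
    """Read image paths from CLI arguments if given, else one per stdin line."""
    if len(argv) > 2:          # e.g. `python main.py predict img.jpeg`
        return argv[2:]        # argument style
    return [line.strip() for line in stdin if line.strip()]  # stdin style

# In main.py this would be called as get_image_paths(sys.argv, sys.stdin).
print(get_image_paths(['main.py', 'predict', '1.jpeg'], io.StringIO('')))
# -> ['1.jpeg']
```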

Test for 1.jpeg: with curl, 1.jpeg showed a different result.

(py37) kspook@ml004:~/aocrKR$ python aocr36/main.py predict data/test-img3/1.jpeg
2019-06-08 12:42:18.288443: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-08 12:42:18.308877: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2397220000 Hz
2019-06-08 12:42:18.309086: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5637712d7920 executing computations on platform Host. Devices:
2019-06-08 12:42:18.309162: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-06-08 12:42:18,309 root INFO phase: predict
2019-06-08 12:42:18,310 root INFO model_dir: checkpoints
2019-06-08 12:42:18,314 root INFO load_model: True
2019-06-08 12:42:18,314 root INFO output_dir: results
2019-06-08 12:42:18,314 root INFO steps_per_checkpoint: 0
2019-06-08 12:42:18,314 root INFO batch_size: 1
2019-06-08 12:42:18,315 root INFO learning_rate: 1.000000
2019-06-08 12:42:18,315 root INFO reg_val: 0
2019-06-08 12:42:18,315 root INFO max_gradient_norm: 5.000000
2019-06-08 12:42:18,315 root INFO clip_gradients: True
2019-06-08 12:42:18,315 root INFO max_image_width 260.000000
2019-06-08 12:42:18,315 root INFO max_prediction_length 18.000000
2019-06-08 12:42:18,316 root INFO channels: 1
2019-06-08 12:42:18,316 root INFO target_embedding_size: 10.000000
2019-06-08 12:42:18,316 root INFO attn_num_hidden: 128
2019-06-08 12:42:18,316 root INFO attn_num_layers: 2
2019-06-08 12:42:18,316 root INFO visualize: False
filename, data/test-img3/1.jpeg
2019-06-08 12:42:24,658 root INFO Result: OK. 0.83 L

(py37) kspook@ml004:~/aocrKR$ python aocr36/main.py test data/testingk.tfrecords
2019-06-08 12:45:57.344504: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-08 12:45:57.351414: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2397220000 Hz
2019-06-08 12:45:57.351623: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55e8a7637170 executing computations on platform Host. Devices:
2019-06-08 12:45:57.351705: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-06-08 12:45:57,352 root INFO phase: test
2019-06-08 12:45:57,352 root INFO model_dir: checkpoints
2019-06-08 12:45:57,353 root INFO load_model: True
2019-06-08 12:45:57,353 root INFO output_dir: results
2019-06-08 12:45:57,353 root INFO steps_per_checkpoint: 0
2019-06-08 12:45:57,353 root INFO batch_size: 1
2019-06-08 12:45:57,353 root INFO learning_rate: 1.000000
2019-06-08 12:45:57,354 root INFO reg_val: 0
2019-06-08 12:45:57,354 root INFO max_gradient_norm: 5.000000
2019-06-08 12:45:57,354 root INFO clip_gradients: True
2019-06-08 12:45:57,354 root INFO max_image_width 260.000000
2019-06-08 12:45:57,358 root INFO max_prediction_length 18.000000
2019-06-08 12:45:57,358 root INFO channels: 1
2019-06-08 12:45:57,358 root INFO target_embedding_size: 10.000000
2019-06-08 12:45:57,358 root INFO attn_num_hidden: 128
2019-06-08 12:45:57,359 root INFO attn_num_layers: 2
2019-06-08 12:45:57,359 root INFO visualize: False

n,c, 5 5   n,c, 55 5

Debug output (each line printed "c, c_idx, n, label, label[0], label[1]"; the first three values were 3 55 55 on every line, so only the label pairs are listed here):

(4, '하'), (5, '나'), (6, '우'), (7, '리'), (8, '국'), (9, '민'), (44, '신'), (45, '한'), (46, '기'), (47, '업'), (48, '농'), (49, '협'), (54, '0'), (55, '1'), (56, '2'), (57, '3'), (58, '4'), (59, '5'), (64, '6'), (65, '7'), (66, '8'), (67, '9'), (68, 'A'), (69, 'B'), (74, 'C'), (75, 'D'), (76, 'E'), (77, 'F'), (78, 'G'), (79, 'H'), (84, 'I'), (85, 'J'), (86, 'K'), (87, 'L'), (88, 'M'), (89, 'N'), (94, 'O'), (95, 'P'), (96, 'Q'), (97, 'R'), (98, 'S'), (99, 'T'), (444, 'U'), (445, 'V'), (446, 'W'), (447, 'X'), (448, 'Y'), (449, 'Z'), (454, '-')

2019-06-08 12:46:02,786 root INFO Step 2 (0.095s). Accuracy: 0.00%, loss: 0.895335, perplexity: 2.44816, probability: 83.24% 0% (L vs 1)
./dataset/test-img3/1.jpeg

(base) kspook@ml004:~$ curl -X POST http://localhost:9002/v1/models/aocr:predict \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{ "signature_name": "serving_default", "inputs": { "input": { "b64": "/9j/4AAQSkZJRgABAQAASABIAAD/4QBYRXhpZgAATU0AKgAAAAgAAgESAAMAAAABAAEAAIdpAAQAAAABAAAAJgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAOaADAAQAAAABAAAAHAAAAAD/7QA4UGhvdG9zaG9wIDMuMAA4QklNBAQAAAAAAAA4QklNBCUAAAAAABDUHYzZjwCyBOmACZjs+EJ+/8AAEQgAHAA5AwEiAAIRAQMRAf/EAB8AAAEFAQEBAQEBAAAAAAAAAAABAgMEBQYHCAkKC//EALUQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29/j5+v/EAB8BAAMBAQEBAQEBAQEAAAAAAAABAgMEBQYHCAkKC//EALURAAIBAgQEAwQHBQQEAAECdwABAgMRBAUhMQYSQVEHYXETIjKBCBRCkaGxwQkjM1LwFWJy0QoWJDThJfEXGBkaJicoKSo1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uLj5OXm5+jp6vLz9PX29/j5+v/bAEMACAYGBwYFCAcHBwkJCAoMFA0MCwsMGRITDxQdGh8eHRocHCAkLicgIiwjHBwoNyksMDE0NDQfJzk9ODI8LjM0Mv/bAEMBCQkJDAsMGA0NGDIhHCEyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMv/dAAQABP/aAAwDAQACEQMRAD8AyYYwgG6rIIAHBqCLHFQeVLHP57SHPmhdmeApOK5bHKaKnjjjtzT1ljjLhnXKj5gD0rMaVxdtCsch2zA7ieADVqK1X7VMxX5y2Dk9QQD/AFq0hWLkc0fmhFbJKhvwNTxMHGQehway9LVUnl9D9xic/KKuW88e+ZVJyGLdO2KZJahZZYhIDgH9Kk2j+8KpWcu7chRhhmIJHByc1fwP8igR/9DKhIwNtI1oHuPM3tsyG8vtkd6bExAFW0Y4rmRx3AwD5gR94hjz3FXLqMJcOP4gFB+oAH9KhjG48+lWLg7rxye8hqhNkSRhQqqoAAwKsKVHtxjIHvUP3nGalQYFMBwOCD/+qpd/1pgUcU/NAH//2Q==" } } }'

Response:
{ "outputs": { "probability": 0.356208, "output": "Q" } }
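For reference, the same request body can be built in Python; this is a minimal sketch assuming a TensorFlow Serving REST endpoint like the one above (the file name is illustrative):

```python
import base64
import json

def build_request(image_path):
    """Build the JSON body TF Serving expects for a b64-encoded image input."""
    with open(image_path, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode('ascii')
    return json.dumps({
        "signature_name": "serving_default",
        "inputs": {"input": {"b64": encoded}},
    })

# The resulting string can then be POSTed to
# http://localhost:9002/v1/models/aocr:predict, as in the curl call above.
```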

kspook commented 5 years ago

In addition to the test of 1.jpeg in the posting above, I continued with 'shin'; it shows the same kind of result.

  1. test:
     2019-06-08 13:17:19,422 root INFO Step 6 (0.095s). Accuracy: 16.67%, loss: 3.153957, perplexity: 23.4286, probability: 91.60% 0% (T vs 신)

  2. predict:

(py37) kspook@ml004:~/aocrKR$ python aocr36/main.py predict dataset/test-img3/shin.jpeg
2019-06-08 13:18:12.320031: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-08 13:18:12.326873: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2397220000 Hz
2019-06-08 13:18:12.327057: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x556c9f033c20 executing computations on platform Host. Devices:
2019-06-08 13:18:12.327136: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-06-08 13:18:12,327 root INFO phase: predict
2019-06-08 13:18:12,328 root INFO model_dir: checkpoints
2019-06-08 13:18:12,328 root INFO load_model: True
2019-06-08 13:18:12,328 root INFO output_dir: results
2019-06-08 13:18:12,328 root INFO steps_per_checkpoint: 0
2019-06-08 13:18:12,328 root INFO batch_size: 1
2019-06-08 13:18:12,328 root INFO learning_rate: 1.000000
2019-06-08 13:18:12,329 root INFO reg_val: 0
2019-06-08 13:18:12,329 root INFO max_gradient_norm: 5.000000
2019-06-08 13:18:12,329 root INFO clip_gradients: True
2019-06-08 13:18:12,329 root INFO max_image_width 260.000000
2019-06-08 13:18:12,329 root INFO max_prediction_length 18.000000
2019-06-08 13:18:12,329 root INFO channels: 1
2019-06-08 13:18:12,329 root INFO target_embedding_size: 10.000000
2019-06-08 13:18:12,330 root INFO attn_num_hidden: 128
2019-06-08 13:18:12,332 root INFO attn_num_layers: 2
2019-06-08 13:18:12,333 root INFO visualize: False
filename, dataset/test-img3/shin.jpeg
2019-06-08 13:18:18,207 root INFO Result: OK. 0.92 T

  3. curl:

(base) kspook@ml004:~$ curl -X POST \

    http://localhost:9002/v1/models/aocr:predict \ -H 'cache-control: no-cache' \ -H 'content-type: application/json' \ -d '{ "signature_name": "serving_default", "inputs": { "input": { "b64": "/9j/4AAQSkZJRgABAQEAeAB4AAD/4QAiRXhpZgAATU0AKgAAAAgAAQESAAMAAAABAAEAAAAAAAD/2wBDAAIBAQIBAQICAgICAgICAwUDAwMDAwYEBAMFBwYHBwcGBwcICQsJCAgKCAcHCg0KCgsMDAwMBwkODw0MDgsMDAz/2wBDAQICAgMDAwYDAwYMCAcIDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAz/wAARCAA+AD4DASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD+f+iiigAoooxQAUUFSpwRgjgijHFABRRRQAUDrRQDg0Adb8FPg14i/aA+Lnh3wX4V0+TUvEXii+i0/T7ZASZZXYKOBk4HUnHABPQV+gf/AAV1/wCDdjxN/wAEsP2WPCvxKm8XWniq31DU00bW7ZYvK/s+4lR3gCf31IRgT2YCvVf+DPv9jCP4y/txeIfirqtu0mn/AAt04rYF0BU6hcgopOejLHvxz/FX6d/8F4f2APFX/BYrwjD8N/hd8aPAtjqPwxmfVNc8Ezyia6ur11xbvPIkha3AjdggdNhZm+bg4APzL/4NXP8Agmj8C/8AgodpXxmPxY8Lr4s1PwzLpgsYZrmSEQwz+cTIuwglt0RB9iPWvgv/AILBfAbwn+zH/wAFIfiv4H8DWbab4T0DVvs+m2rOX8iMRplcnk/Pu61+mX/Bqn8PfG37EP8AwV9+MXwR8faRLoXiq38MvbalZzSZ8t7eaCaKSMj5ZElimDowyGjcMDg5r87v+C7GoR6l/wAFZPjdLGwkVfEUsZKnIBAAI+vtQB8jUUUUAFKpww+vcUlFAH6If8Edf+C/viL/AIJF/CXxt4X0jwDo/jL/AISi7jvrN7y+NotlKqFWL7Iy0i9CFyPQEV84Rf8ABSr4waN+2f4g+PvhnxhqHhH4j+IdRm1Ca+02TCHzCB5DxvuSSEJgbZFZTjkGvn6t74Z+CG+JXxI0Dw7HcQ2ja5qNtpyzyHCQmaVI97ew3ZPsKAPvj/gnp+2b8XfhJ8SPiL+2J4n1zUPEmvxxJpc+rarKZbjUmLwiRVwynZGiwxbR+7HnRKUIIFfCvxm+KGtfHb4n+JPHHiCZrzWvFOpT6nfTY4M0zl2+gycAegr+qD9p/wD4Ia/s46Z+xX8Mfhv478YHwD8P/hmVv9Xnhu4LFvEs4Xa0k8j5bBeR24
ycsuclQa/In/gvP+1r+yPP8CvCnwJ/Zb8K6TDY+F9VbUdT17T4CsdywQp5ZmfMlwxPzFi2B2FAH5U0UUUAFFFFAADUlrO9rcxyRyNFJGwZXVipQg5BBHIx61HRQB6b8cf2w/id+0rFZp498eeKPFUVjEIoYdQvnkjiUYACrnHYGvNSVx17cd6ZRQAUUUUAf//Z" } } }' { "outputs": { "probability": 0.915982, "output": "T" }

emedvedev commented 5 years ago

I'm going to leave this issue open for a while in case someone wants to jump in, but as I've mentioned, I don't have the capacity to debug forks, unfortunately. Hopefully someone will be able to help or you'll be able to figure it out :)

kspook commented 5 years ago

@emedvedev, for your information, the exported model can't return Korean recognition when I use 'stdin' in predict() in main.py to read the image file.

Now I can't get good number recognition after changing to the argument style of reading the image.

kspook commented 5 years ago

@emedvedev, I finally found out that the difference between the saved_model and the script is a bug in your code.

If I train with the basic settings, the saved model and the script return the same result. But with full-ASCII training, they return different results.
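One hypothetical way such a mismatch can arise (illustrative only, not the actual aocr code): if the script and the exported model end up with differently ordered character tables, the same output index decodes to different characters, which would be consistent with results like "L vs 1" earlier in this thread:

```python
# Hypothetical: two charset orderings that disagree about which
# character sits at a given index.
script_charset = ['1', 'L']    # assumed order in the prediction script
exported_charset = ['L', '1']  # assumed order baked into the saved model

idx = 0  # same output index emitted by the model for the same image
print(script_charset[idx], exported_charset[idx])  # -> 1 L
```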

code: https://github.com/kspook/aocr_fullascii.git
saved model / checkpoints: https://drive.google.com/open?id=1FsRonnoMG9cS9npIbob5F2Pgu29JyAzi

(image attachment)

(image attachment)