emedvedev / attention-ocr

A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
MIT License

Input Shape #174

Closed mokiya closed 4 years ago

mokiya commented 4 years ago

Hi

I would like to know the input shape of the exported model.

I trained on the sample dataset below without any specific configuration:

wget http://www.cs.cmu.edu/~yuntiand/sample.tgz

In other words, I simply ran aocr train ./datasets/training.tfrecords for training.

Testing (aocr test ./datasets/testing.tfrecords) works as intended, and exporting (aocr export --format=frozengraph ./exported-model) also works without issue.

I then ran the summarize_graph tool to find the input shape, but as you can see below, it reports no input shape information.

$ sudo bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
>      --in_graph=../pretrained-data/exported-model/frozen_graph.pb
Found 1 possible inputs: (name=input_image_as_bytes, type=string(7), shape=<unknown>) 
No variables spotted.
Found 2 possible outputs: (name=prediction, op=Identity) (name=probability, op=Identity) 
Found 7876747 (7.88M) const parameters, 0 (0) variable parameters, and 160 control_edges
21 nodes assigned to device '/device:CPU:0'
2435 nodes assigned to device '/device:GPU:0'
Op types used: 703 Const, 238 Mul, 192 Sigmoid, 184 ConcatV2, 138 Tanh, 130 Add, 129 Split, 94 MatMul, 94 BiasAdd, 76 Identity, 54 Reshape, 41 Switch, 39 Enter, 29 Pack, 26 Squeeze, 23 Tile, 21 Merge, 20 Sum, 20 Softmax, 20 Slice, 19 StridedSlice, 19 ArgMax, 18 AddV2, 15 NextIteration, 15 Shape, 11 GatherV2, 10 Less, 10 Max, 9 Cast, 9 Range, 8 TensorArrayV3, 8 Conv2D, 7 Relu, 6 ExpandDims, 5 LogicalAnd, 5 TensorArrayScatterV3, 5 TensorArrayReadV3, 5 Sub, 5 Exit, 5 LoopCond, 5 MaxPool, 4 Assert, 4 GreaterEqual, 3 Transpose, 3 FusedBatchNormV3, 3 TensorArrayWriteV3, 3 TensorArraySizeV3, 3 Fill, 3 TensorArrayGatherV3, 3 Equal, 2 RealDiv, 2 All, 2 Ceil, 2 Greater, 2 ResizeBicubic, 1 DecodePng, 1 MutableHashTableV2, 1 Size, 1 LookupTableInsertV2, 1 LookupTableFindV2, 1 Pad, 1 LessEqual, 1 Placeholder, 1 Rank, 1 Unpack
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=../pretrained-data/exported-model/frozen_graph.pb --show_flops --input_layer=input_image_as_bytes --input_layer_type=string --input_layer_shape= --output_layer=prediction,probability

What I want to know is the input size used for training: width, height, and number of channels. My understanding is width=160, height=60, and channels=1. Please correct me if I have misunderstood something.

Thanks

maxpaynestory commented 4 years ago

The input is input_image_as_bytes, which has no image shape: it takes the encoded image as a byte string.

For example, read the image with OpenCV:

import cv2

image = cv2.imread("myimage.png", 1)  # flag 1 = cv2.IMREAD_COLOR (3-channel BGR)

Then encode it as PNG bytes:

is_success, im_buf_arr = cv2.imencode(".png", image)

byte_im = im_buf_arr.tobytes()

Finally, feed the bytes to the input tensor input_image_as_bytes:

sess3.run([prediction_result], feed_dict={input_image_as_bytes: byte_im})
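
To tie the steps above together, here is a minimal end-to-end sketch. It is not taken from the project's documentation. The tensor names come from the summarize_graph output in the question (input_image_as_bytes, prediction, probability), while the helper names png_dimensions and run_frozen_graph, and the file paths in the usage note, are hypothetical. The graph lists a DecodePng op and two ResizeBicubic ops, which suggests that decoding and resizing happen inside the graph, so the raw contents of a PNG file can probably be fed directly without the OpenCV round trip. That is an assumption, not something confirmed in this thread. The first helper only shows that the width and height travel inside the PNG bytes themselves, which would explain why the placeholder reports shape=<unknown>.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(png_bytes):
    """Return (width, height) read from the IHDR chunk of PNG-encoded bytes."""
    # A PNG starts with an 8-byte signature, then a 4-byte chunk length,
    # the 4-byte chunk type "IHDR", and big-endian 32-bit width and height.
    if png_bytes[:8] != PNG_SIGNATURE or png_bytes[12:16] != b"IHDR":
        raise ValueError("not a PNG byte string")
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height


def run_frozen_graph(pb_path, png_bytes):
    """Feed PNG bytes to a frozen graph and return [prediction, probability].

    Assumes the graph exposes the tensor names reported by summarize_graph.
    """
    # Imported lazily so the PNG helper above works without TensorFlow.
    import tensorflow.compat.v1 as tf

    graph_def = tf.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        # name="" keeps the original node names instead of an "import/" prefix.
        tf.import_graph_def(graph_def, name="")

    with tf.Session(graph=graph) as sess:
        return sess.run(
            ["prediction:0", "probability:0"],
            feed_dict={"input_image_as_bytes:0": png_bytes},
        )
```

Usage would look roughly like reading the file with open("myimage.png", "rb").read(), optionally checking its size with png_dimensions, and passing the bytes to run_frozen_graph("./exported-model/frozen_graph.pb", png_bytes). Both paths are placeholders.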