dailystudio / ml

ML related stuff
Apache License 2.0
140 stars 48 forks

DeepLab 257x257 is great but 2049x2049 is even better #8

Open samhodge opened 5 years ago

samhodge commented 5 years ago

Do you know how to produce a TFLite file of any arbitrary dimension from the DeepLab models here:

https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md

I got pretty close.

I have some test code:

import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="deeplabv3_257_mv_gpu.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
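The random-input setup in the script above can be factored into a small helper that builds an input of the right shape and dtype from one entry of get_input_details(). This is a sketch in plain numpy; the 'shape' and 'dtype' keys match what the TFLite Python interpreter returns, and the example detail dict below is hypothetical:

```python
import numpy as np

def make_random_input(detail):
    """Build a random tensor matching one entry from
    interpreter.get_input_details(), using its 'shape' and 'dtype' keys."""
    shape = tuple(detail['shape'])
    dtype = np.dtype(detail['dtype'])
    if np.issubdtype(dtype, np.floating):
        # Float inputs: values in [0, 1), as in the test scripts here.
        return np.random.random_sample(shape).astype(dtype)
    # Integer inputs (e.g. uint8 for quantized models): full value range.
    info = np.iinfo(dtype)
    return np.random.randint(info.min, info.max + 1, size=shape, dtype=dtype)

# Example with the 257x257 DeepLab input signature:
detail = {'shape': np.array([1, 257, 257, 3]), 'dtype': np.float32}
x = make_random_input(detail)
```

Using one helper for both scripts avoids the float32-vs-uint8 mismatch creeping in when switching between models.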

This executes flawlessly, but it fails for my own converted model:

(tensorflow-v1.13.1) [samh@apollo-centos6 tmp]$ more speedy.py 
import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="speedy.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.uint8)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

(tensorflow-v1.13.1) [samh@apollo-centos6 tmp]$ python speedy.py 
Traceback (most recent call last):
  File "speedy.py", line 17, in <module>
    interpreter.invoke()
  File "/home/samh/anaconda3/envs/tensorflow-v1.13.1/lib/python3.6/site-packages/tensorflow/lite/python/interpreter.py", line 277, in invoke
    self._interpreter.Invoke()
  File "/home/samh/anaconda3/envs/tensorflow-v1.13.1/lib/python3.6/site-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 109, in Invoke
    return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_Invoke(self)
RuntimeError: tensorflow/lite/kernels/depthwise_conv.cc:99 params->depth_multiplier * SizeOfDimension(input, 3) != SizeOfDimension(filter, 3) (0 != 64)Node number 33 (DEPTHWISE_CONV_2D) failed to prepare.

The error occurs on the .invoke() line.
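For what it's worth, the check that fails at depthwise_conv.cc:99 is essentially a shape constraint: the depthwise filter's last dimension must equal depth_multiplier times the input's channel dimension. Reconstructed here in plain Python for illustration (the function name is mine, not TFLite's), the "(0 != 64)" in the message suggests the input's channel dimension arrived at that node as 0, i.e. an unknown/dynamic shape survived the conversion:

```python
def depthwise_conv_shapes_ok(depth_multiplier, input_channels, filter_channels):
    # Mirrors the TFLite prepare-time check: the depthwise filter must carry
    # depth_multiplier output channels for each input channel.
    return depth_multiplier * input_channels == filter_channels

# The failing node: the filter expects 64 channels, but the input channel
# dimension is 0 (unknown), so 1 * 0 != 64 and prepare() fails.
bad = depthwise_conv_shapes_ok(1, 0, 64)
ok = depthwise_conv_shapes_ok(1, 64, 64)
```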

The .pb file was created using the export_model.py script here:

https://github.com/tensorflow/models/blob/master/research/deeplab/export_model.py

following the docs here: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/export_model.md

It is an xception_65 model.

I quantised as follows:

tflite_convert --output_file=speedy.tflite --graph_def_file=frozen_graph.pb --inference_type=FLOAT --inference_input_type=QUANTIZED_UINT8 --input_arrays=ImageTensor --input_shapes=1,2049,2049,3 --output_arrays='SemanticPredictions' --std_dev_values=128 --mean_values=127

This completes cleanly.
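As I understand the converter flags, --mean_values and --std_dev_values define the quantized-to-real mapping real = (quantized - mean) / std, so mean=127 and std=128 map uint8 inputs roughly onto [-1, 1]. A quick sanity check, assuming that formula:

```python
def dequantize(q, mean=127.0, std=128.0):
    # TFLite input quantization convention: real = (quantized - mean) / std.
    return (q - mean) / std

lo = dequantize(0)    # darkest uint8 pixel -> about -0.99
hi = dequantize(255)  # brightest uint8 pixel -> 1.0
```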

Now I know that this will take a while to run on a mobile phone, but the end game is to run it on a GPU via OpenGL ES on Linux and Metal on Apple desktops.

Do you have any hints?

To repeat, here is the error message:

  File "/home/samh/anaconda3/envs/tensorflow-v1.13.1/lib/python3.6/site-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 109, in Invoke
    return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_Invoke(self)
RuntimeError: tensorflow/lite/kernels/depthwise_conv.cc:99 params->depth_multiplier * SizeOfDimension(input, 3) != SizeOfDimension(filter, 3) (0 != 64)Node number 33 (DEPTHWISE_CONV_2D) failed to prepare.
samhodge commented 5 years ago

Found an answer on a plate: https://github.com/intel/webml-polyfill/tree/master/examples/semantic_segmentation/model

normandra commented 4 years ago

@samhodge mind sharing what your inference speed is when using xception at that kind of resolution?

samhodge commented 4 years ago

About 0.2 fps with an NVIDIA GTX 1060; some of that time is post-processing of the semantic mask from INT8 to an antialiased float.
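For scale (my arithmetic, not a figure from the thread): 0.2 fps at 2049x2049 works out to roughly 0.84 megapixels per second.

```python
pixels = 2049 * 2049           # pixels per frame at the 2049x2049 resolution
fps = 0.2                      # reported frames per second
mp_per_s = pixels * fps / 1e6  # throughput in megapixels per second
```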