apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Transpose op seems useless and leads to low performance #985

Open yizhaoyanbo opened 3 years ago

yizhaoyanbo commented 3 years ago

❓Question

I have designed a simple network in TensorFlow 1.15 and converted it to mlmodels with both tf-coreml and coremltools 4.0. I tested them on an iPhone XS running iOS 13.5.1 and found that the mlmodel converted with coremltools 4.0 is much slower than the one from tfcoreml, about 2x slower.

The original network in TensorFlow is as follows:

import tensorflow as tf  # TensorFlow 1.15

graph = tf.Graph()
with graph.as_default():
    # NHWC input: [batch, height, width, channels]
    x = tf.placeholder(tf.float32, shape=[1, 1000, 1000, 4], name="input")
    y = tf.layers.conv2d(x, 4, 1, padding='same', activation=tf.nn.relu)
    output_names = [y.op.name]
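The conversion snippets below reference frozen_graph_file; for completeness, here is a minimal sketch of how that frozen graph could be produced in TF 1.15. The file name and model_dir are assumptions, not taken from the original report; only graph and output_names come from the snippet above.

    # Hedged sketch: freeze the graph above so it can be fed to the converters.
    frozen_graph_file = model_dir + "/debug_frozen.pb"
    with tf.Session(graph=graph) as sess:
        sess.run(tf.global_variables_initializer())
        frozen_def = tf.graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), output_names)
        with tf.io.gfile.GFile(frozen_graph_file, "wb") as f:
            f.write(frozen_def.SerializeToString())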
  1. Converting to mlmodel using tfcoreml

    # using tfcoreml
    import tfcoreml

    coreml_save_tfcoreml_file = model_dir + "/debug_tfcoreml.mlmodel"
    tfcoreml.convert(tf_model_path=frozen_graph_file,
                     mlmodel_path=coreml_save_tfcoreml_file,
                     output_feature_names=["conv2d/Relu:0"],  # name of the output tensor (appended with ":0")
                     input_name_shape_dict={"input": [1, 1000, 1000, 4]},  # input tensor [1, height, width, channel]
                     minimum_ios_deployment_target='12')

    [image: graph of the tfcoreml-converted model]

  2. Converting to mlmodel using coremltools 4.0

    import coremltools as ct

    coreml_save_coremltools_file = model_dir + "/debug_coremltools.mlmodel"
    mlmodel = ct.convert(frozen_graph_file, source='tensorflow')
    mlmodel.save(coreml_save_coremltools_file)

    [image: graph of the coremltools 4.0-converted model]

    I have tried to use coremltools to remove the two transpose layers and change the input/output nodes' shape info from HWC to CHW. After that, the performance of the coremltools and tfcoreml models is the same (a rough sketch of that edit follows).
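    For reference, a hedged sketch of that kind of spec surgery with coremltools. This is not the exact script the reporter used; the file names, the rewiring details, and the CHW shapes are assumptions.

    # Hedged sketch: strip the converter-inserted transpose layers from the
    # NeuralNetwork spec and re-declare the input/output shapes as CHW.
    import coremltools as ct

    mlmodel = ct.models.MLModel("debug_coremltools.mlmodel")
    spec = mlmodel.get_spec()
    nn = spec.neuralNetwork

    transpose_idx = [i for i, layer in enumerate(nn.layers)
                     if layer.WhichOneof("layer") == "transpose"]

    for i in reversed(transpose_idx):
        t = nn.layers[i]
        src, dst = t.input[0], t.output[0]
        # Rewire consumers of the transpose output to read its input instead.
        for layer in nn.layers:
            for k, name in enumerate(layer.input):
                if name == dst:
                    layer.input[k] = src
        # If the transpose fed a model output, point that output at src.
        for out in spec.description.output:
            if out.name == dst:
                out.name = src
        del nn.layers[i]

    # Re-declare the interface shapes as CHW (values assumed from this example).
    for feature, chw_shape in [(spec.description.input[0], [4, 1000, 1000]),
                               (spec.description.output[0], [4, 1000, 1000])]:
        del feature.type.multiArrayType.shape[:]
        feature.type.multiArrayType.shape.extend(chw_shape)

    ct.models.MLModel(spec).save("debug_coremltools_chw.mlmodel")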

It seems that the two transpose layers are really time-consuming and cannot run on the ANE or GPU. So why does coremltools not adopt the NCHW format directly?

Furthermore, I also found that the resize_bilinear op in TensorFlow is mapped to the upsample op by tf-coreml, but to the resizeBilinear op by coremltools 4.0. However, the resizeBilinear op cannot run on Apple's ANE, while the upsample op can. Why does coremltools not use the upsample op directly?
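A quick way to check which layer type the converter actually emitted is to print the layer types from the NeuralNetwork spec (a small sketch; the file name is assumed):

    # Hedged sketch: list each layer's protobuf type; 'upsample' vs
    # 'resizeBilinear' correspond to the two mappings described above.
    import coremltools as ct

    spec = ct.models.MLModel("debug_coremltools.mlmodel").get_spec()
    for layer in spec.neuralNetwork.layers:
        print(layer.name, layer.WhichOneof("layer"))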

Thanks.

System Information

macOS Catalina 10.15, iPhone XS, iOS 13.5.1, coremltools 4.0

zoucheng1991 commented 2 years ago

I'm also troubled by this problem. Have you solved it?

aseemw commented 2 years ago

Since coremltools 4, the converter preserves the input/output shapes of the source model, hence the transposes. One simple way to avoid this is to edit the TF graph definition to use an input/output shape of (4, 1000, 1000); this will automatically result in a Core ML model with no transposes (a sketch follows below).
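A minimal sketch of that suggestion, assuming TF 1.15 and a channels-first convolution (whether a channels_first conv2d runs on the TF CPU is a separate TensorFlow concern and does not affect the converted model):

    # Hedged sketch of the suggested workaround: declare the placeholder in
    # CHW so the converted Core ML model needs no input/output transposes.
    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=[1, 4, 1000, 1000], name="input")
        y = tf.layers.conv2d(x, 4, 1, padding='same', activation=tf.nn.relu,
                             data_format='channels_first')
        output_names = [y.op.name]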

I can see that an option in the conversion API would be useful to automatically drop any transposes at the input/output and not strictly enforce the input/output shapes of the source model. I'll mark this issue as a feature request to track it.

lauriebyrum commented 2 years ago

I have a model that is also 2x slower with the new coremltools than with tfcoreml. PLEASE ADDRESS THIS! I don't understand the suggestion to change the input shape of the TF graph. We can't just change the placeholder definition, right? We'd need to change the whole TF graph to use CHW...

I think the ideal would be a flag that sends us down the same code path that switches to CHW for image-based models, rather than that path only being triggered by the input being an image.
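For context, the image-input path the commenter refers to is exposed in the unified converter API: declaring the input as an image gives the CHW-style handling, but only for 1- or 3-channel (grayscale/RGB/BGR) inputs, so it would not cover the 4-channel example above. A minimal sketch (input name and shape are assumptions):

    # Hedged sketch: convert with an image input to get the image-based
    # code path. This illustrates the existing option, not the requested flag.
    import coremltools as ct

    mlmodel = ct.convert(
        frozen_graph_file,          # path to the frozen TF graph (assumed)
        source='tensorflow',
        inputs=[ct.ImageType(name="input", shape=(1, 1000, 1000, 3))],
    )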

devalexqt commented 2 years ago

I also noticed that with the HWC-to-CHW transpose layer in the generated mlmodel, my inference time is 2x slower! Please add a flag to the converter to drop the HWC->CHW transpose.

Tengxu-Sun commented 2 years ago

I found that the transpose op makes the whole custom layer always run on the CPU.