XiaoMi / mace

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Apache License 2.0

Segmentation fault for DepthwiseConv2d INT8 (CAFFE) #590

Closed gasgallo closed 4 years ago

gasgallo commented 4 years ago

Before you open an issue, please make sure you have tried the following steps:

  1. Make sure your environment matches the requirements (https://mace.readthedocs.io/en/latest/installation/env_requirement.html).
  2. Have you ever read the document for your usage?
  3. Check if your issue appears in HOW-TO-DEBUG or FAQ.
  4. The form below must be filled.

System information

Model deploy file (*.yml)

# The name of library
library_name: model
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
  sp: # model tag, which will be used in model loading and must be specific.
    platform: caffe
    # path to your tensorflow model's pb file. Support local path, http:// and https://
    model_file_path: /models/sp/model-nofc.prototxt
    weight_file_path: /models/sp/model-nofc.caffemodel
    # sha256_checksum of your model's pb file.
    # use this command to get the sha256_checksum --> sha256sum path/to/your/pb/file
    model_sha256_checksum: 54479f5ec821884f5bfcc03cb1f4558275541c6e80d9f33f65cc58562fffe91b 
    weight_sha256_checksum: e9599be0e9d5a5f08b85f9b98d2a76b55463ecb6820efc3bcdbc3ea0050f62a0 
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,112,112
        input_data_formats:
          - NCHW
        output_tensors:
          - fc1bn
        output_shapes:
          - 1,1,1,512
    obfuscate: 0
    quantize: 1
    quantize_range_file: /mace/overall_range
    runtime: cpu # cpu, gpu or cpu+gpu or dsp
    winograd: 0

Describe the problem

Segmentation fault happens when running quantized depthwise conv2d.

To Reproduce

Steps to reproduce the problem:

1. cd /path/to/mace
2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
3. python tools/converter.py run --config_file=/path/to/your/model_deployment_file

Error information / logs

Please include the full log and/or traceback here. https://gist.github.com/gasgallo/619eb23800d7caf46e6e97ed23bfc38a

Additional context

The model runs fine without quantization.

lu229 commented 4 years ago

@gasgallo We could not find the cause of the error in the log. Please refer to debug-with-crash and collect the crash stack.

gasgallo commented 4 years ago

@lu229

lu229 commented 4 years ago

@gasgallo The stack has no detailed info. Please refer to debug-with-crash and set the correct symbol path for ndk-stack; it will then print the detailed stack info.

gasgallo commented 4 years ago

@lu229 What's the correct path? I'm currently pointing it at the folder where MACE copies all its files:

adb logcat | $ANDROID_NDK_HOME/ndk-stack -sym /data/local/tmp/mace_run

but there's no detailed info anyway:

********** Crash dump: **********
Build fingerprint: 'Xiaomi/cepheus/cepheus:10/QKQ1.190825.002/V11.0.3.0.QFAMIXM:user/release-keys'
pid: 8085, tid: 8085, name: mace_run_static  >>> /data/local/tmp/mace_run/mace_run_static <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7a6fe00000
Stack frame #00 pc 000000000006f880  /apex/com.android.runtime/lib64/bionic/libc.so (memset+256) (BuildId: 084c8a81b8c78e19cd9a1ff6208e77cf)
Stack frame #01 pc 000000000006c168  /data/local/tmp/mace_run/mace_run_static
Stack frame #02 pc 00000000001362e0  /data/local/tmp/mace_run/mace_run_static
Stack frame #03 pc 000000000003e854  /data/local/tmp/mace_run/mace_run_static
Stack frame #04 pc 000000000001c848  /data/local/tmp/mace_run/mace_run_static
Stack frame #05 pc 000000000001ee0c  /data/local/tmp/mace_run/mace_run_static
Stack frame #06 pc 000000000001f970  /data/local/tmp/mace_run/mace_run_static
Stack frame #07 pc 000000000006ebc4  /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 084c8a81b8c78e19cd9a1ff6208e77cf)
Crash dump is completed

gasgallo commented 4 years ago

If I build the model and run with target armeabi-v7a the log is different:

Generate input file:  builds/model/_tmp/sp/8e5e6edf84635516eaabb3f63c6e7dbe/MI9_msmnile/armeabi-v7a/model_input_data
Generate input file done.
* Run 'sp' with round=1, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,)
Push builds/model/_tmp/sp/8e5e6edf84635516eaabb3f63c6e7dbe/MI9_msmnile/armeabi-v7a/model_input_data to /data/local/tmp/mace_run
Push builds/model/model/sp.data to /data/local/tmp/mace_run
Push builds/model/model/sp.pb to /data/local/tmp/mace_run/sp.pb
Push builds/model/_tmp/armeabi-v7a/mace_run_static to /data/local/tmp/mace_run
Push /tmp/cmd_file-sp-1579158073.53 to /data/local/tmp/mace_run/cmd_file-sp-1579158073.53
I mace/tools/validation/mace_run.cc:451] model name: sp
I mace/tools/validation/mace_run.cc:452] mace version: v0.11.0-rc0-0-g2d650b6
I mace/tools/validation/mace_run.cc:453] input node: data
I mace/tools/validation/mace_run.cc:454] input shape: 1,3,112,112
I mace/tools/validation/mace_run.cc:455] output node: fc1bn
I mace/tools/validation/mace_run.cc:456] output shape: 1,512,1,1
I mace/tools/validation/mace_run.cc:457] input_file: /data/local/tmp/mace_run/model_input
I mace/tools/validation/mace_run.cc:458] output_file: /data/local/tmp/mace_run/model_out
I mace/tools/validation/mace_run.cc:459] model_data_file: /data/local/tmp/mace_run/sp.data
I mace/tools/validation/mace_run.cc:460] model_file: /data/local/tmp/mace_run/sp.pb
I mace/tools/validation/mace_run.cc:461] device: CPU
I mace/tools/validation/mace_run.cc:462] round: 1
I mace/tools/validation/mace_run.cc:463] restart_round: 1
I mace/tools/validation/mace_run.cc:464] gpu_perf_hint: 3
I mace/tools/validation/mace_run.cc:465] gpu_priority_hint: 3
I mace/tools/validation/mace_run.cc:466] omp_num_threads: -1
I mace/tools/validation/mace_run.cc:467] cpu_affinity_policy: 1
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/libmace/mace.cc:603] Destroying MaceEngine
I mace/tools/validation/mace_run.cc:508] restart round 0
I mace/libmace/mace.cc:876] Create MaceEngine from model graph proto and weights data
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/tools/validation/mace_run.cc:265] Create Mace Engine latency: 15.579 ms
I mace/tools/validation/mace_run.cc:272] Total init latency: 15.693 ms
I mace/tools/validation/mace_run.cc:313] Warm up run
F mace/ops/depthwise_conv2d.cc:218] Check failed: filter->dim(2) == input->dim(3) 7 != 1024
F mace/ops/depthwise_conv2d.cc:218] backtrace:
F mace/ops/depthwise_conv2d.cc:218]  pc 0xba6a08
F mace/ops/depthwise_conv2d.cc:218]  pc 0xba6690
F mace/ops/depthwise_conv2d.cc:218]  pc 0xbab428
F mace/ops/depthwise_conv2d.cc:218]  pc 0xbab384
F mace/ops/depthwise_conv2d.cc:218]  pc 0xbab6d4
F mace/ops/depthwise_conv2d.cc:218]  pc 0xbab780
F mace/ops/depthwise_conv2d.cc:218]  pc 0x9ab580
F mace/ops/depthwise_conv2d.cc:218]  pc 0xb00708
F mace/ops/depthwise_conv2d.cc:218]  pc 0x9388a4
F mace/ops/depthwise_conv2d.cc:218]  pc 0x9394bc
F mace/ops/depthwise_conv2d.cc:218]  pc 0x8f7368
F mace/ops/depthwise_conv2d.cc:218]  pc 0x8fa570
F mace/ops/depthwise_conv2d.cc:218]  pc 0x8faf30
F mace/ops/depthwise_conv2d.cc:218]  pc 0xf26143a8 __libc_init
Aborted
ERROR: [Mace Run] /mace/tools/device.py:358: Mace run failed.

Could it be that this error isn't caught when the target is arm64-v8a?

Moreover, when I print the shapes of the filter (1,1024,7,7) and the input (1,7,7,1024), there is indeed a mismatch. Is this a bug in depthwise conv? The filter shape is supposed to be (7,7,1024,1), am I right?
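
The failed check can be reproduced on shapes alone. The sketch below only mirrors the condition printed in the log (`filter->dim(2) == input->dim(3)`); the helper name `depthwise_channels_match` is hypothetical and is not part of MACE, and the "expected" shape is just the one proposed in this comment, not a confirmed MACE layout.

```python
def depthwise_channels_match(filter_shape, input_shape):
    """Mirror the logged condition: filter dim 2 must equal input dim 3."""
    return filter_shape[2] == input_shape[3]


# Shapes printed at runtime, as reported in this comment.
reported_filter = (1, 1024, 7, 7)
reported_input = (1, 7, 7, 1024)

# Shape the comment expects the filter to have (an assumption, not verified).
expected_filter = (7, 7, 1024, 1)

mismatch = depthwise_channels_match(reported_filter, reported_input)
match = depthwise_channels_match(expected_filter, reported_input)

print(mismatch)  # False, because 7 != 1024
print(match)     # True, because 1024 == 1024
```

With the reported shapes the condition compares 7 against 1024, which matches the values in the log message.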

lu229 commented 4 years ago

@gasgallo the command should be: adb logcat | $ANDROID_NDK_HOME/ndk-stack -sym builds/model/_tmp/arm64-v8a/

lu229 commented 4 years ago

@gasgallo I don't think the error on arm64-v8a is the same as the one on armeabi-v7a. The filter's shape format should be OIHW; please refer to: CPU runtime memory layout

lee-bin commented 4 years ago

@gasgallo It seems like a data layout issue. We have not tried to quantize a caffe model before. Maybe you can set input_data_formats and output_data_formats to NHWC.

gasgallo commented 4 years ago

> @gasgallo the command should be: adb logcat | $ANDROID_NDK_HOME/ndk-stack -sym builds/model/_tmp/arm64-v8a/

@lu229 Thanks, the following is the output of the command you suggested:

********** Crash dump: **********
Build fingerprint: 'Xiaomi/cepheus/cepheus:10/QKQ1.190825.002/V11.0.3.0.QFAMIXM:user/release-keys'
pid: 26491, tid: 26491, name: mace_run_static  >>> /data/local/tmp/mace_run/mace_run_static <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x71f3c00000
Stack frame #00 pc 000000000006f880  /apex/com.android.runtime/lib64/bionic/libc.so (memset+256) (BuildId: 084c8a81b8c78e19cd9a1ff6208e77cf)
Stack frame #01 pc 000000000006a824  /data/local/tmp/mace_run/mace_run_static: Routine mace::Buffer::Clear(long) at ??:?
Stack frame #02 pc 00000000000c7cb8  /data/local/tmp/mace_run/mace_run_static: Routine mace::Tensor::Clear() at ??:?
Stack frame #03 pc 00000000000cbd48  /data/local/tmp/mace_run/mace_run_static: Routine mace::ops::DepthwiseConv2dOp<(mace::DeviceType)0, unsigned char>::Run(mace::OpContext*) at ??:?
Stack frame #04 pc 00000000001ebd08  /data/local/tmp/mace_run/mace_run_static: Routine mace::SerialNet::Run(mace::RunMetadata*) at ??:?
Stack frame #05 pc 00000000000677c0  /data/local/tmp/mace_run/mace_run_static: Routine mace::MaceEngine::Impl::Run(std::map<std::string, mace::MaceTensor, std::less<std::string>, std::allocator<std::pair<std::string const, mace::MaceTensor> > > const&, std::map<std::string, mace::MaceTensor, std::less<std::string>, std::allocator<std::pair<std::string const, mace::MaceTensor> > >*, mace::RunMetadata*) at ??:?
Stack frame #06 pc 00000000000682a4  /data/local/tmp/mace_run/mace_run_static: Routine mace::MaceEngine::Run(std::map<std::string, mace::MaceTensor, std::less<std::string>, std::allocator<std::pair<std::string const, mace::MaceTensor> > > const&, std::map<std::string, mace::MaceTensor, std::less<std::string>, std::allocator<std::pair<std::string const, mace::MaceTensor> > >*) at ??:?
Stack frame #07 pc 000000000002a5fc  /data/local/tmp/mace_run/mace_run_static: Routine mace::tools::validation::RunModel(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::vector<long, std::allocator<long> >, std::allocator<std::vector<long, std::allocator<long> > > > const&, std::vector<mace::DataFormat, std::allocator<mace::DataFormat> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::vector<long, std::allocator<long> >, std::allocator<std::vector<long, std::allocator<long> > > > const&, std::vector<mace::DataFormat, std::allocator<mace::DataFormat> > const&, float) at ??:?
Stack frame #08 pc 000000000002d264  /data/local/tmp/mace_run/mace_run_static: Routine mace::tools::validation::Main(int, char**) at ??:?
Stack frame #09 pc 000000000002da60  /data/local/tmp/mace_run/mace_run_static: Routine main at ??:?
Stack frame #10 pc 000000000006ebc4  /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 084c8a81b8c78e19cd9a1ff6208e77cf)

Line numbers are still missing, but at least there's some more information.

lu229 commented 4 years ago

@gasgallo As @lee-bin said, we haven't quantized a Caffe model before. Perhaps you can set input_data_formats and output_data_formats to NHWC and try again. If it fails again, please attach your model and yml file, and I will debug it to find the cause of the error.

gasgallo commented 4 years ago

@lu229 @lee-bin I've also tried, as you suggested, to play around with data layouts, but it doesn't seem to help. I get the same error with target armeabi-v7a and the same segmentation fault with target arm64-v8a. You can get my model and quantization stats from here. The yaml file in the first post will work fine.

Thank you!!

yejw5 commented 4 years ago

@gasgallo No permission to access the model file link.

gasgallo commented 4 years ago

@yejw5 try now

yejw5 commented 4 years ago

@gasgallo The downloaded overall_range.txt doesn't seem to contain range information for op pre_fc1. It crashed in the convert stage:

Add quantize tensor range
File "tools/python/convert.py", line 279, in <module>
    convert(conf, flags.output)
File "tools/python/convert.py", line 75, in convert
    mace_model = convert_model(model_conf)
File "tools/python/convert.py", line 184, in convert_model
    output_graph_def, quantize_activation_info = mace_transformer.run()
File "/data/deeplearning/framework/mace/tools/python/transform/transformer.py", line 139, in run
    changed = transformer()
File "/data/deeplearning/framework/mace/tools/python/transform/transformer.py", line 1858, in add_quantize_tensor_range
    % op)
File "/data/deeplearning/framework/mace/tools/python/utils/util.py", line 76, in mace_check
    for line in traceback.format_stack():
ERROR: /data/deeplearning/framework/mace/tools/python/transform/transformer.py:1858: input: "conv_6dw7_7_conv2d"
input: "pre_fc1_filter"
input: "fc1bn_offset"
output: "pre_fc1"
name: "pre_fc1"
type: "Conv2D"
arg {
  name: "T"
  i: 1
}
arg {
  name: "framework_type"
  i: 1
}
arg {
  name: "data_format"
  i: 1000
}
arg {
  name: "strides"
  ints: 1
  ints: 1
}
arg {
  name: "padding_values"
  ints: 0
  ints: 0
}
output_shape {
  dims: 1
  dims: 1
  dims: 1
  dims: 512
}
 does not have quantize activation info

gasgallo commented 4 years ago

@yejw5 sorry, use the following yaml file.

# The name of library
library_name: model
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
  sp: # model tag, which will be used in model loading and must be specific.
    platform: caffe
    # path to your tensorflow model's pb file. Support local path, http:// and https://
    model_file_path: /models/sp/model-nofc.prototxt
    weight_file_path: /models/sp/model-nofc.caffemodel
    # sha256_checksum of your model's pb file.
    # use this command to get the sha256_checksum --> sha256sum path/to/your/pb/file
    model_sha256_checksum: 54479f5ec821884f5bfcc03cb1f4558275541c6e80d9f33f65cc58562fffe91b 
    weight_sha256_checksum: e9599be0e9d5a5f08b85f9b98d2a76b55463ecb6820efc3bcdbc3ea0050f62a0 
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,112,112
        input_data_formats:
          - NCHW
        output_tensors:
          - fc1bn
        output_shapes:
          - 1,1,1,512
    obfuscate: 0
    quantize: 1
    quantize_range_file: /mace/overall_range
    runtime: cpu # cpu, gpu or cpu+gpu or dsp
    winograd: 0

The output node is different, and pre_fc1 is fused with fc1bn.

yejw5 commented 4 years ago

@gasgallo It's caused by the filter format of depthwise conv in Caffe. You can use this patch to fix it (apply it on the newest master code):

diff --git a/tools/python/transform/transformer.py b/tools/python/transform/transformer.py
index 69411e4..b3df498 100644
--- a/tools/python/transform/transformer.py
+++ b/tools/python/transform/transformer.py
@@ -1116,6 +1116,17 @@ class Transformer(base_converter.ConverterInterface):
                     filter.float_data[:] = filter_data.flat
                     filter.dims[:] = filter_data.shape
                     transposed_filter.add(op.input[1])
+                elif ConverterUtil.get_arg(
+                        op, MaceKeyword.mace_framework_type_str).i == \
+                                FrameworkType.CAFFE.value and \
+                                op.type == MaceOp.DepthwiseConv2d.name:
+                    filter = self._consts[op.input[1]]
+                    filter_data = np.array(filter.float_data).reshape(
+                        filter.dims)
+                    filter_data = filter_data.transpose(2, 3, 1, 0)
+                    filter.float_data[:] = filter_data.flat
+                    filter.dims[:] = filter_data.shape
+                    transposed_filter.add(op.input[1])
             # deconv's filter's output channel and input channel is reversed
             for op in net.op:
                 if op.type == MaceOp.Deconv2D.name and \
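
The effect of the `transpose(2, 3, 1, 0)` call in the patch can be checked in isolation with NumPy. This is a minimal sketch, assuming the depthwise filter reaches that branch with the shape (1, 1024, 7, 7) that was printed earlier in this thread; the converter's actual internal shape at that point is not shown here, so treat it as an assumption.

```python
import numpy as np

# Assumed filter shape, taken from the runtime print earlier in the thread.
reported_dims = (1, 1024, 7, 7)
filter_data = np.arange(
    np.prod(reported_dims), dtype=np.float32).reshape(reported_dims)

# Same axis permutation as in the patch: new axes = old axes (2, 3, 1, 0).
transposed = filter_data.transpose(2, 3, 1, 0)
print(transposed.shape)  # (7, 7, 1024, 1)

# Element [h, w, c, m] of the result is element [m, c, h, w] of the original.
assert transposed[3, 5, 100, 0] == filter_data[0, 100, 3, 5]

# With the reported input shape, the condition from the armeabi-v7a log
# (filter dim 2 == input dim 3) now holds.
input_dims = (1, 7, 7, 1024)
assert transposed.shape[2] == input_dims[3]
```

Under that assumption, the permuted shape is the (7,7,1024,1) layout asked about earlier in the thread.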

gasgallo commented 4 years ago

@yejw5 It works, thank you!

However, the performance of the quantized model is a bit disappointing. Seeing the benchmark of the Inception v3 model on the Pocophone here:

I was hoping for a similar result for my model, but the performance is as follows:

@lee-bin @lu229 Any comment on above results?

yejw5 commented 4 years ago

@gasgallo Running your model on Mix2s (SoC: sdm845) for 10 rounds gave an average of 65.522 ms. It is already smaller and faster than Inception v3. The speed gap may be caused primarily by hardware. Alternatively, you can reduce the complexity of the model graph.

yejw5 commented 4 years ago

Besides, lee-bin and lu229 are both on vacation. Maybe you can consult them after the Spring Festival.

gasgallo commented 4 years ago

> @gasgallo Running your model on Mix2s (SoC: sdm845) for 10 rounds gave an average of 65.522 ms. It is already smaller and faster than Inception v3. The speed gap may be caused primarily by hardware. Alternatively, you can reduce the complexity of the model graph.

@yejw5 I'm not referring to the absolute inference time, but to the fact that there's no improvement from using the quantized model on rk3399 and qcs605. On that hardware, the quantized model runs at the same speed as the non-quantized model.

yejw5 commented 4 years ago

Sorry for misreading.

There may be some unknown reasons. We need to analyze it op by op...
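
One way to start an op-by-op comparison is to sum the per-op `Avg(ms)` values from the benchmark's "Sort by Run Order" table for the quantized and the float run, then compare the totals per op type. The sketch below assumes the 13-column row layout of the benchmark output posted in this thread; `parse_op_rows` and `total_by_op_type` are hypothetical helper names, not MACE tools.

```python
from collections import defaultdict


def parse_op_rows(log_text):
    """Return (op_type, avg_ms, name) tuples from per-op statistics rows."""
    rows = []
    for line in log_text.splitlines():
        # Only table rows emitted by statistics.cc contain '|' separators.
        if "statistics.cc" not in line or "|" not in line:
            continue
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if len(cells) != 13:
            continue  # not a per-op row in the assumed layout
        try:
            avg_ms = float(cells[3])
        except ValueError:
            continue  # header row, where the cell reads "Avg(ms)"
        rows.append((cells[0], avg_ms, cells[12]))
    return rows


def total_by_op_type(rows):
    """Sum the average latency per op type."""
    totals = defaultdict(float)
    for op_type, avg_ms, _name in rows:
        totals[op_type] += avg_ms
    return dict(totals)


# Two rows copied from the benchmark output in this thread.
sample = (
    "I mace/benchmark/statistics.cc:347] |          Conv2D |   0.000 |  1.158 |"
    "   1.127 | 0.926 |   0.926 | 19.232 |  [1,1] | [2,2] |     [64,3,3,3] |"
    " [1,64,112,112] |          |               relu0 |\n"
    "I mace/benchmark/statistics.cc:347] |         Eltwise |  24.773 |  0.069 |"
    "   0.070 | 0.058 |  20.322 |  0.000 |        |       |                |"
    "   [1,64,56,56] |          |              _plus0 |\n"
)

rows = parse_op_rows(sample)
totals = total_by_op_type(rows)
print(rows)
print(totals)
```

Running the same aggregation on the logs of both model variants and diffing the per-type totals should show which op types fail to speed up after quantization.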

gasgallo commented 4 years ago

@yejw5 here's the benchmark with op stats:

***************************************************
          Benchmark model sp on msmnile          
***************************************************
I mace/benchmark/benchmark_model.cc:202] Model name: [sp]
I mace/benchmark/benchmark_model.cc:203] Model_file: 
I mace/benchmark/benchmark_model.cc:204] Device: [CPU]
I mace/benchmark/benchmark_model.cc:205] gpu_perf_hint: [3]
I mace/benchmark/benchmark_model.cc:206] gpu_priority_hint: [3]
I mace/benchmark/benchmark_model.cc:207] omp_num_threads: [-1]
I mace/benchmark/benchmark_model.cc:208] cpu_affinity_policy: [1]
I mace/benchmark/benchmark_model.cc:209] Input node: [data]
I mace/benchmark/benchmark_model.cc:210] Input shapes: [1,3,112,112]
I mace/benchmark/benchmark_model.cc:211] Output node: [fc1bn]
I mace/benchmark/benchmark_model.cc:212] output shapes: [1,1,1,512]
I mace/benchmark/benchmark_model.cc:213] Warmup runs: [1]
I mace/benchmark/benchmark_model.cc:214] Num runs: [100]
I mace/benchmark/benchmark_model.cc:215] Max run seconds: [10]
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ---------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                                Warm Up
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |   std |
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |     1 |   412.012 |  412.012 | 412.012 | 412.012 | 412.012 | 0.000 |
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                          Run without statistics
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |      std |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    83 |   134.143 |  121.375 | 120.978 | 134.143 | 121.844 | 1396.135 |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] -----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                           Run with statistics
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |     std |
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    82 |   122.892 |  122.783 | 121.927 | 124.547 | 122.641 | 368.202 |
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                                       Sort by Run Order
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type |   Start |  First | Avg(ms) |     % |    cdf% | GMACPS | Stride |   Pad |   Filter Shape |   Output Shape | Dilation |                name |
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |          Conv2D |   0.000 |  1.158 |   1.127 | 0.926 |   0.926 | 19.232 |  [1,1] | [2,2] |     [64,3,3,3] | [1,64,112,112] |          |               relu0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |   1.149 | 11.004 |  10.822 | 8.890 |   9.816 | 10.682 |  [2,2] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  12.001 |  3.026 |   2.936 | 2.412 |  12.228 | 39.380 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  14.964 |  9.724 |   9.784 | 8.037 |  20.265 |  1.313 |  [2,2] | [0,0] |    [64,64,1,1] |   [1,64,56,56] |          | stage1_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  24.773 |  0.069 |   0.070 | 0.058 |  20.322 |  0.000 |        |       |                |   [1,64,56,56] |          |              _plus0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  24.846 |  3.005 |   3.071 | 2.523 |  22.845 | 37.642 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  27.943 |  2.982 |   3.038 | 2.496 |  25.341 | 38.049 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  31.005 |  0.070 |   0.071 | 0.059 |  25.400 |  0.000 |        |       |                |   [1,64,56,56] |          |              _plus1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  31.079 |  3.112 |   3.037 | 2.495 |  27.895 | 38.066 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  34.154 |  2.988 |   3.015 | 2.477 |  30.372 | 38.340 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  37.194 |  0.070 |   0.072 | 0.059 |  30.431 |  0.000 |        |       |                |   [1,64,56,56] |          |              _plus2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  37.268 |  4.447 |   4.481 | 3.681 |  34.112 | 12.901 |  [2,2] | [2,2] |   [128,64,3,3] |  [1,128,28,28] |          |  stage2_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  41.767 |  2.309 |   2.254 | 1.852 |  35.963 | 51.291 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  44.043 |  4.901 |   4.939 | 4.057 |  40.021 |  1.300 |  [2,2] | [0,0] |   [128,64,1,1] |  [1,128,28,28] |          | stage2_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  48.997 |  0.022 |   0.023 | 0.019 |  40.040 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus3 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  49.022 |  2.241 |   2.257 | 1.854 |  41.893 | 51.229 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  51.302 |  2.319 |   2.222 | 1.825 |  43.718 | 52.036 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  53.547 |  0.038 |   0.038 | 0.031 |  43.750 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus4 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  53.587 |  2.214 |   2.223 | 1.826 |  45.576 | 52.015 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  55.833 |  2.208 |   2.209 | 1.815 |  47.390 | 52.334 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  58.064 |  0.037 |   0.038 | 0.032 |  47.422 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus5 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  58.104 |  2.223 |   2.215 | 1.820 |  49.241 | 52.193 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  60.341 |  2.296 |   2.205 | 1.811 |  51.053 | 52.432 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  62.568 |  0.040 |   0.037 | 0.031 |  51.083 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus6 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  62.607 |  5.095 |   5.112 | 4.200 |  55.283 | 11.306 |  [2,2] | [2,2] |  [256,128,3,3] |  [1,256,14,14] |          |  stage3_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  67.737 |  1.953 |   1.968 | 1.617 |  56.900 | 58.734 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  69.726 |  5.903 |   5.872 | 4.824 |  61.724 |  1.094 |  [2,2] | [0,0] |  [256,128,1,1] |  [1,256,14,14] |          | stage3_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  75.615 |  0.016 |   0.017 | 0.014 |  61.738 |  0.000 |        |       |                |  [1,256,14,14] |          |              _plus7 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  75.633 |  1.949 |   1.958 | 1.608 |  63.346 | 59.045 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  77.612 |  1.934 |   1.946 | 1.599 |  64.945 | 59.396 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  79.580 |  0.020 |   0.022 | 0.018 |  64.963 |  0.000 |        |       |                |  [1,256,14,14] |          |              _plus8 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  79.603 |  1.936 |   1.951 | 1.603 |  66.566 | 59.244 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  81.575 |  1.982 |   1.942 | 1.595 |  68.161 | 59.536 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  83.541 |  0.024 |   0.022 | 0.018 |  68.179 |  0.000 |        |       |                |  [1,256,14,14] |          |              _plus9 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  83.565 |  1.946 |   1.956 | 1.607 |  69.786 | 59.100 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  85.542 |  1.938 |   1.947 | 1.600 |  71.385 | 59.364 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  87.510 |  0.020 |   0.021 | 0.018 |  71.403 |  0.000 |        |       |                |  [1,256,14,14] |          |             _plus10 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  87.534 |  1.930 |   1.949 | 1.601 |  73.004 | 59.323 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit5_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  89.504 |  1.919 |   1.945 | 1.598 |  74.602 | 59.427 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit5_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  91.469 |  0.022 |   0.021 | 0.018 |  74.619 |  0.000 |        |       |                |  [1,256,14,14] |          |             _plus11 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  91.492 |  2.017 |   1.960 | 1.610 |  76.230 | 58.979 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit6_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  93.472 |  1.938 |   1.954 | 1.605 |  77.835 | 59.161 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit6_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  95.448 |  0.022 |   0.021 | 0.018 |  77.852 |  0.000 |        |       |                |  [1,256,14,14] |          |             _plus12 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  95.471 |  5.204 |   5.259 | 4.320 |  82.172 | 10.992 |  [2,2] | [2,2] |  [512,256,3,3] |    [1,512,7,7] |          |  stage4_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 100.750 |  3.136 |   3.120 | 2.563 |  84.736 | 37.049 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 103.893 |  5.230 |   5.260 | 4.321 |  89.057 |  1.221 |  [2,2] | [0,0] |  [512,256,1,1] |    [1,512,7,7] |          | stage4_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 109.169 |  0.010 |   0.011 | 0.009 |  89.066 |  0.000 |        |       |                |    [1,512,7,7] |          |             _plus13 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 109.182 |  3.136 |   3.107 | 2.553 |  91.618 | 37.203 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 112.312 |  3.068 |   3.075 | 2.526 |  94.144 | 37.597 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 115.410 |  0.015 |   0.015 | 0.012 |  94.156 |  0.000 |        |       |                |    [1,512,7,7] |          |             _plus14 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 115.426 |  3.065 |   3.090 | 2.538 |  96.695 | 37.414 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 118.539 |  3.041 |   3.066 | 2.519 |  99.213 | 37.702 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 121.627 |  0.013 |   0.014 | 0.012 |  99.225 |  0.000 |        |       |                |    [1,512,7,7] |          |             _plus15 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 121.643 |  0.775 |   0.752 | 0.618 |  99.843 | 34.141 |  [1,1] | [0,0] | [1024,512,1,1] |   [1,1024,7,7] |          |              convx1 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d | 122.410 |  0.084 |   0.084 | 0.069 |  99.912 |  0.600 |  [1,1] | [0,0] |   [1,1024,7,7] |   [1,1024,1,1] |          |  conv_6dw7_7_conv2d |
I mace/benchmark/statistics.cc:347] |  FullyConnected | 122.503 |  0.107 |   0.107 | 0.088 | 100.000 |  4.895 |        |       | [512,1024,1,1] |    [1,512,1,1] |          |             pre_fc1 |
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                               Sort by Computation Time
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | Op Type |   Start |  First | Avg(ms) |     % |   cdf% | GMACPS | Stride |   Pad |  Filter Shape |  Output Shape | Dilation |                name |
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |  Conv2D |   1.149 | 11.004 |  10.822 | 8.890 |  8.890 | 10.682 |  [2,2] | [2,2] |   [64,64,3,3] |  [1,64,56,56] |          |  stage1_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D |  14.964 |  9.724 |   9.784 | 8.037 | 16.927 |  1.313 |  [2,2] | [0,0] |   [64,64,1,1] |  [1,64,56,56] |          | stage1_unit1_screlu |
I mace/benchmark/statistics.cc:347] |  Conv2D |  69.726 |  5.903 |   5.872 | 4.824 | 21.751 |  1.094 |  [2,2] | [0,0] | [256,128,1,1] | [1,256,14,14] |          | stage3_unit1_screlu |
I mace/benchmark/statistics.cc:347] |  Conv2D | 103.893 |  5.230 |   5.260 | 4.321 | 26.073 |  1.221 |  [2,2] | [0,0] | [512,256,1,1] |   [1,512,7,7] |          | stage4_unit1_screlu |
I mace/benchmark/statistics.cc:347] |  Conv2D |  95.471 |  5.204 |   5.259 | 4.320 | 30.393 | 10.992 |  [2,2] | [2,2] | [512,256,3,3] |   [1,512,7,7] |          |  stage4_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D |  62.607 |  5.095 |   5.112 | 4.200 | 34.592 | 11.306 |  [2,2] | [2,2] | [256,128,3,3] | [1,256,14,14] |          |  stage3_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D |  44.043 |  4.901 |   4.939 | 4.057 | 38.649 |  1.300 |  [2,2] | [0,0] |  [128,64,1,1] | [1,128,28,28] |          | stage2_unit1_screlu |
I mace/benchmark/statistics.cc:347] |  Conv2D |  37.268 |  4.447 |   4.481 | 3.681 | 42.330 | 12.901 |  [2,2] | [2,2] |  [128,64,3,3] | [1,128,28,28] |          |  stage2_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 100.750 |  3.136 |   3.120 | 2.563 | 44.893 | 37.049 |  [1,1] | [2,2] | [512,512,3,3] |   [1,512,7,7] |          |  stage4_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 109.182 |  3.136 |   3.107 | 2.553 | 47.446 | 37.203 |  [1,1] | [2,2] | [512,512,3,3] |   [1,512,7,7] |          |  stage4_unit2_relu1 |
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                         Stat by Op Type
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type | Count | Avg(ms) |      % |    cdf% |          MACs | GMACPS | Called times |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |          Conv2D |    38 | 121.007 | 99.426 |  99.426 | 3,605,446,656 | 29.795 |           38 |
I mace/benchmark/statistics.cc:347] |         Eltwise |    16 |   0.508 |  0.417 |  99.844 |             0 |  0.000 |           16 |
I mace/benchmark/statistics.cc:347] |  FullyConnected |     1 |   0.107 |  0.088 |  99.932 |       524,288 |  4.900 |            1 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d |     1 |   0.083 |  0.068 | 100.000 |        50,176 |  0.605 |            1 |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] -----------------------------------------------------------
I mace/benchmark/statistics.cc:347]            Stat by MACs(Multiply-Accumulation)
I mace/benchmark/statistics.cc:347] -----------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         total | round | first(G/s) | avg(G/s) |     std |
I mace/benchmark/statistics.cc:347] -----------------------------------------------------------
I mace/benchmark/statistics.cc:347] | 3,606,021,120 |    82 |     29.569 |   29.623 | 365.000 |
I mace/benchmark/statistics.cc:347] -----------------------------------------------------------
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                           Summary of Ops' Stat
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |     std |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |    82 |   121.951 |  121.880 | 121.017 | 123.636 | 121.731 | 365.000 |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] 56 ops total.
I mace/libmace/mace.cc:603] Destroying MaceEngine
*************************************************************
          Benchmark model sp on msmnile         
*************************************************************
I mace/benchmark/benchmark_model.cc:202] Model name: [sp]
I mace/benchmark/benchmark_model.cc:203] Model_file: 
I mace/benchmark/benchmark_model.cc:204] Device: [CPU]
I mace/benchmark/benchmark_model.cc:205] gpu_perf_hint: [3]
I mace/benchmark/benchmark_model.cc:206] gpu_priority_hint: [3]
I mace/benchmark/benchmark_model.cc:207] omp_num_threads: [-1]
I mace/benchmark/benchmark_model.cc:208] cpu_affinity_policy: [1]
I mace/benchmark/benchmark_model.cc:209] Input node: [data]
I mace/benchmark/benchmark_model.cc:210] Input shapes: [1,3,112,112]
I mace/benchmark/benchmark_model.cc:211] Output node: [fc1bn]
I mace/benchmark/benchmark_model.cc:212] output shapes: [1,1,1,512]
I mace/benchmark/benchmark_model.cc:213] Warmup runs: [1]
I mace/benchmark/benchmark_model.cc:214] Num runs: [100]
I mace/benchmark/benchmark_model.cc:215] Max run seconds: [10]
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ---------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                                Warm Up
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |   std |
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |     1 |   278.372 |  278.372 | 278.372 | 278.372 | 278.372 | 0.000 |
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                          Run without statistics
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |      std |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    44 |   226.746 |  226.512 | 225.684 | 258.229 | 229.562 | 6732.804 |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                           Run with statistics
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |      std |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    44 |   229.383 |  227.496 | 226.548 | 259.413 | 230.770 | 6311.498 |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                                          Sort by Run Order
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type |   Start | First | Avg(ms) |     % |    cdf% |  GMACPS | Stride |   Pad |   Filter Shape |   Output Shape | Dilation |                     name |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |        Quantize |   0.000 | 0.076 |   0.076 | 0.033 |   0.033 |   0.000 |        |       |                |  [1,112,112,3] |          |     mace_input_node_data |
I mace/benchmark/statistics.cc:347] |          Conv2D |   0.079 | 3.898 |   3.954 | 1.725 |   1.758 |   5.482 |  [1,1] | [2,2] |     [64,3,3,3] | [1,112,112,64] |          |                    relu0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |   4.062 | 6.055 |   6.192 | 2.701 |   4.459 |  18.671 |  [2,2] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  10.290 | 6.188 |   6.035 | 2.632 |   7.091 |  19.155 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  16.358 | 1.430 |   1.478 | 0.645 |   7.736 |   8.689 |  [2,2] | [0,0] |    [64,1,1,64] |   [1,56,56,64] |          |      stage1_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  17.858 | 0.219 |   0.222 | 0.097 |   7.833 |   0.000 |        |       |                |   [1,56,56,64] |          |                   _plus0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  18.081 | 5.938 |   6.044 | 2.636 |  10.469 |  19.128 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  24.156 | 5.896 |   6.017 | 2.624 |  13.093 |  19.214 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  30.202 | 0.296 |   0.296 | 0.129 |  13.222 |   0.000 |        |       |                |   [1,56,56,64] |          |                   _plus1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  30.504 | 5.596 |   6.054 | 2.640 |  15.863 |  19.097 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  36.587 | 5.934 |   6.018 | 2.625 |  18.488 |  19.209 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  42.634 | 0.294 |   0.342 | 0.149 |  18.637 |   0.000 |        |       |                |   [1,56,56,64] |          |                   _plus2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  42.979 | 3.519 |   3.937 | 1.717 |  20.354 |  14.683 |  [2,2] | [2,2] |   [128,3,3,64] |  [1,28,28,128] |          |       stage2_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  46.943 | 6.408 |   5.864 | 2.558 |  22.912 |  19.713 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  52.837 | 0.729 |   0.881 | 0.384 |  23.296 |   7.290 |  [2,2] | [0,0] |   [128,1,1,64] |  [1,28,28,128] |          |      stage2_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  53.736 | 0.111 |   0.159 | 0.069 |  23.365 |   0.000 |        |       |                |  [1,28,28,128] |          |                   _plus3 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  53.897 | 5.720 |   5.819 | 2.538 |  25.903 |  19.868 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  59.749 | 5.912 |   5.955 | 2.597 |  28.501 |  19.413 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  65.735 | 0.192 |   0.291 | 0.127 |  28.628 |   0.000 |        |       |                |  [1,28,28,128] |          |                   _plus4 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  66.030 | 5.384 |   5.834 | 2.545 |  31.172 |  19.816 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  71.892 | 5.693 |   5.748 | 2.507 |  33.679 |  20.113 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  77.669 | 0.188 |   0.317 | 0.138 |  33.817 |   0.000 |        |       |                |  [1,28,28,128] |          |                   _plus5 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  77.989 | 5.379 |   5.921 | 2.583 |  36.400 |  19.524 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  83.940 | 6.380 |   5.782 | 2.522 |  38.922 |  19.993 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  89.752 | 0.189 |   0.235 | 0.103 |  39.025 |   0.000 |        |       |                |  [1,28,28,128] |          |                   _plus6 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  89.991 | 4.016 |   4.114 | 1.794 |  40.819 |  14.050 |  [2,2] | [2,2] |  [256,3,3,128] |  [1,14,14,256] |          |       stage3_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  94.133 | 8.242 |   8.423 | 3.674 |  44.493 |  13.725 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 102.587 | 0.609 |   0.640 | 0.279 |  44.772 |  10.033 |  [2,2] | [0,0] |  [256,1,1,128] |  [1,14,14,256] |          |      stage3_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 103.246 | 0.059 |   0.059 | 0.026 |  44.798 |   0.000 |        |       |                |  [1,14,14,256] |          |                   _plus7 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 103.308 | 7.983 |   8.309 | 3.624 |  48.423 |  13.913 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 111.652 | 8.990 |   8.313 | 3.626 |  52.049 |  13.907 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 119.997 | 0.132 |   0.140 | 0.061 |  52.110 |   0.000 |        |       |                |  [1,14,14,256] |          |                   _plus8 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 120.139 | 8.252 |   8.339 | 3.637 |  55.747 |  13.863 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 128.517 | 8.015 |   8.225 | 3.588 |  59.335 |  14.055 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 136.774 | 0.129 |   0.130 | 0.057 |  59.392 |   0.000 |        |       |                |  [1,14,14,256] |          |                   _plus9 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 136.907 | 8.524 |   8.291 | 3.616 |  63.008 |  13.943 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 145.232 | 8.274 |   8.322 | 3.630 |  66.638 |  13.892 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 153.585 | 0.130 |   0.132 | 0.058 |  66.695 |   0.000 |        |       |                |  [1,14,14,256] |          |                  _plus10 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 153.719 | 7.997 |   8.253 | 3.600 |  70.295 |  14.007 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit5_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 162.005 | 9.002 |   8.302 | 3.621 |  73.916 |  13.925 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit5_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 170.340 | 0.133 |   0.138 | 0.060 |  73.976 |   0.000 |        |       |                |  [1,14,14,256] |          |                  _plus11 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 170.481 | 8.729 |   8.363 | 3.648 |  77.624 |  13.823 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit6_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 178.878 | 8.047 |   8.326 | 3.632 |  81.256 |  13.885 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit6_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 187.236 | 0.127 |   0.129 | 0.056 |  81.312 |   0.000 |        |       |                |  [1,14,14,256] |          |                  _plus12 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 187.369 | 4.416 |   4.222 | 1.841 |  83.154 |  13.692 |  [2,2] | [2,2] |  [512,3,3,256] |    [1,7,7,512] |          |       stage4_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 191.623 | 6.957 |   7.064 | 3.081 |  86.235 |  16.365 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 198.725 | 0.609 |   1.006 | 0.439 |  86.674 |   6.382 |  [2,2] | [0,0] |  [512,1,1,256] |    [1,7,7,512] |          |      stage4_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 199.750 | 0.034 |   0.035 | 0.015 |  86.689 |   0.000 |        |       |                |    [1,7,7,512] |          |                  _plus13 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 199.788 | 7.607 |   6.986 | 3.047 |  89.736 |  16.549 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 206.806 | 6.738 |   7.049 | 3.075 |  92.811 |  16.400 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 213.890 | 0.061 |   0.091 | 0.040 |  92.850 |   0.000 |        |       |                |    [1,7,7,512] |          |                  _plus14 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 213.984 | 6.928 |   6.941 | 3.028 |  95.878 |  16.655 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 220.962 | 6.627 |   6.975 | 3.042 |  98.920 |  16.575 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 227.975 | 0.287 |   0.068 | 0.030 |  98.950 |   0.000 |        |       |                |    [1,7,7,512] |          |                  _plus15 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 228.047 | 2.047 |   2.022 | 0.882 |  99.832 |  12.704 |  [1,1] | [0,0] | [1024,1,1,512] |   [1,7,7,1024] |          |                   convx1 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d | 230.086 | 0.099 |   0.038 | 0.017 |  99.849 | 192.125 |  [1,1] | [0,0] |   [7,7,1024,1] |   [1,1,1,1024] |          |       conv_6dw7_7_conv2d |
I mace/benchmark/statistics.cc:347] |          Conv2D | 230.137 | 0.348 |   0.342 | 0.149 |  99.998 |   1.535 |  [1,1] | [0,0] | [512,1,1,1024] |    [1,1,1,512] |          | mace_output_node_pre_fc1 |
I mace/benchmark/statistics.cc:347] |      Dequantize | 230.492 | 0.006 |   0.005 | 0.002 | 100.000 |   0.000 |        |       |                |    [1,1,1,512] |          |                    fc1bn |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                              Sort by Computation Time
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | Op Type |   Start | First | Avg(ms) |     % |   cdf% | GMACPS | Stride |   Pad |  Filter Shape |  Output Shape | Dilation |               name |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |  Conv2D |  94.133 | 8.242 |   8.423 | 3.674 |  3.674 | 13.725 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 170.481 | 8.729 |   8.363 | 3.648 |  7.322 | 13.823 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit6_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 120.139 | 8.252 |   8.339 | 3.637 | 10.959 | 13.863 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 178.878 | 8.047 |   8.326 | 3.632 | 14.591 | 13.885 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit6_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 145.232 | 8.274 |   8.322 | 3.630 | 18.220 | 13.892 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 111.652 | 8.990 |   8.313 | 3.626 | 21.846 | 13.907 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 103.308 | 7.983 |   8.309 | 3.624 | 25.471 | 13.913 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 162.005 | 9.002 |   8.302 | 3.621 | 29.092 | 13.925 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit5_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 136.907 | 8.524 |   8.291 | 3.616 | 32.708 | 13.943 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 153.719 | 7.997 |   8.253 | 3.600 | 36.308 | 14.007 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit5_relu1 |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                         Stat by Op Type
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type | Count | Avg(ms) |      % |    cdf% |          MACs |  GMACPS | Called times |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |          Conv2D |    39 | 226.340 | 98.737 |  98.737 | 3,605,970,944 |  15.932 |           39 |
I mace/benchmark/statistics.cc:347] |         Eltwise |    16 |   2.776 |  1.211 |  99.948 |             0 |   0.000 |           16 |
I mace/benchmark/statistics.cc:347] |        Quantize |     1 |   0.076 |  0.033 |  99.981 |             0 |   0.000 |            1 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d |     1 |   0.038 |  0.017 |  99.998 |     7,340,032 | 193.159 |            1 |
I mace/benchmark/statistics.cc:347] |      Dequantize |     1 |   0.005 |  0.002 | 100.000 |             0 |   0.000 |            1 |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------
I mace/benchmark/statistics.cc:347]             Stat by MACs(Multiply-Accumulation)
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         total | round | first(G/s) | avg(G/s) |      std |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | 3,613,310,976 |    44 |     15.863 |   15.760 | 6269.283 |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                           Summary of Ops' Stat
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |      std |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |    44 |   227.778 |  226.001 | 225.092 | 257.680 | 229.264 | 6269.283 |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] 58 ops total.
I mace/libmace/mace.cc:603] Destroying MaceEngine
*************************************************************
          Benchmark model sp on qcs605          
*************************************************************
I mace/benchmark/benchmark_model.cc:202] Model name: [sp]
I mace/benchmark/benchmark_model.cc:203] Model_file: 
I mace/benchmark/benchmark_model.cc:204] Device: [CPU]
I mace/benchmark/benchmark_model.cc:205] gpu_perf_hint: [3]
I mace/benchmark/benchmark_model.cc:206] gpu_priority_hint: [3]
I mace/benchmark/benchmark_model.cc:207] omp_num_threads: [-1]
I mace/benchmark/benchmark_model.cc:208] cpu_affinity_policy: [1]
I mace/benchmark/benchmark_model.cc:209] Input node: [data]
I mace/benchmark/benchmark_model.cc:210] Input shapes: [1,3,112,112]
I mace/benchmark/benchmark_model.cc:211] Output node: [fc1bn]
I mace/benchmark/benchmark_model.cc:212] output shapes: [1,1,1,512]
I mace/benchmark/benchmark_model.cc:213] Warmup runs: [1]
I mace/benchmark/benchmark_model.cc:214] Num runs: [100]
I mace/benchmark/benchmark_model.cc:215] Max run seconds: [10]
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                                 Warm Up
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) |  min(ms) |  max(ms) |  avg(ms) |   std |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |     1 |  1703.949 | 1703.949 | 1703.949 | 1703.949 | 1703.949 | 0.000 |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                          Run without statistics
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |      std |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    30 |   347.233 |  355.134 | 324.497 | 357.980 | 343.235 | 8428.071 |
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                            Run with statistics
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |       std |
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    30 |   339.917 |  341.660 | 317.651 | 367.916 | 343.331 | 11751.904 |
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                                       Sort by Run Order
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type |   Start |  First | Avg(ms) |     % |    cdf% | GMACPS | Stride |   Pad |   Filter Shape |   Output Shape | Dilation |                name |
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |          Conv2D |   0.000 |  8.727 |   4.612 | 1.352 |   1.352 |  4.699 |  [1,1] | [2,2] |     [64,3,3,3] | [1,64,112,112] |          |               relu0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |   4.668 | 28.125 |  28.545 | 8.370 |   9.722 |  4.050 |  [2,2] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  33.288 |  6.931 |   7.398 | 2.169 |  11.891 | 15.628 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  40.740 | 23.381 |  24.941 | 7.313 |  19.205 |  0.515 |  [2,2] | [0,0] |    [64,64,1,1] |   [1,64,56,56] |          | stage1_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  65.743 |  0.424 |   0.318 | 0.093 |  19.298 |  0.000 |        |       |                |   [1,64,56,56] |          |              _plus0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  66.075 |  7.793 |   7.790 | 2.284 |  21.582 | 14.840 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  73.916 |  7.172 |   6.874 | 2.016 |  23.598 | 16.818 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  80.840 |  0.224 |   0.183 | 0.054 |  23.651 |  0.000 |        |       |                |   [1,64,56,56] |          |              _plus1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  81.033 |  7.364 |   7.018 | 2.058 |  25.709 | 16.473 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  88.148 |  6.447 |   6.703 | 1.966 |  27.675 | 17.246 |  [1,1] | [2,2] |    [64,64,3,3] |   [1,64,56,56] |          |  stage1_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  94.901 |  0.167 |   0.180 | 0.053 |  27.728 |  0.000 |        |       |                |   [1,64,56,56] |          |              _plus2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  95.090 | 10.986 |  13.771 | 4.038 |  31.765 |  4.198 |  [2,2] | [2,2] |   [128,64,3,3] |  [1,128,28,28] |          |  stage2_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 108.919 |  7.270 |   6.409 | 1.879 |  33.644 | 18.039 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 115.380 |  8.906 |   8.910 | 2.612 |  36.257 |  0.721 |  [2,2] | [0,0] |   [128,64,1,1] |  [1,128,28,28] |          | stage2_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 124.341 |  0.064 |   0.079 | 0.023 |  36.280 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus3 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 124.427 |  6.015 |   6.337 | 1.858 |  38.138 | 18.244 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 130.814 |  5.542 |   6.047 | 1.773 |  39.911 | 19.117 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 136.921 |  0.102 |   0.211 | 0.062 |  39.973 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus4 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 137.140 |  7.359 |   6.218 | 1.823 |  41.797 | 18.592 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 143.413 |  5.962 |   5.726 | 1.679 |  43.475 | 20.190 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 149.207 |  0.104 |   0.081 | 0.024 |  43.499 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus5 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 149.297 |  5.545 |   5.756 | 1.688 |  45.187 | 20.084 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 155.106 |  5.666 |   5.727 | 1.679 |  46.866 | 20.185 |  [1,1] | [2,2] |  [128,128,3,3] |  [1,128,28,28] |          |  stage2_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 160.885 |  0.075 |   0.075 | 0.022 |  46.888 |  0.000 |        |       |                |  [1,128,28,28] |          |              _plus6 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 160.966 | 12.732 |  14.461 | 4.240 |  51.129 |  3.997 |  [2,2] | [2,2] |  [256,128,3,3] |  [1,256,14,14] |          |  stage3_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 175.483 |  6.434 |   6.069 | 1.780 |  52.908 | 19.049 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 181.599 | 10.298 |   9.710 | 2.847 |  55.755 |  0.661 |  [2,2] | [0,0] |  [256,128,1,1] |  [1,256,14,14] |          | stage3_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 191.361 |  0.032 |   0.044 | 0.013 |  55.768 |  0.000 |        |       |                |  [1,256,14,14] |          |              _plus7 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 191.420 |  5.259 |   6.238 | 1.829 |  57.597 | 18.533 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 197.705 |  6.891 |   6.286 | 1.843 |  59.440 | 18.392 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 204.041 |  0.036 |   0.040 | 0.012 |  59.452 |  0.000 |        |       |                |  [1,256,14,14] |          |              _plus8 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 204.086 |  5.310 |   5.837 | 1.712 |  61.163 | 19.805 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 209.976 |  5.416 |   5.466 | 1.603 |  62.766 | 21.148 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 215.502 |  0.033 |   0.041 | 0.012 |  62.778 |  0.000 |        |       |                |  [1,256,14,14] |          |              _plus9 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 215.546 |  5.582 |   5.689 | 1.668 |  64.446 | 20.322 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 221.286 |  5.303 |   5.489 | 1.609 |  66.056 | 21.063 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 226.825 |  0.034 |   0.039 | 0.012 |  66.067 |  0.000 |        |       |                |  [1,256,14,14] |          |             _plus10 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 226.868 |  5.742 |   5.648 | 1.656 |  67.723 | 20.467 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit5_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 232.565 |  5.271 |   5.529 | 1.621 |  69.345 | 20.909 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit5_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 238.148 |  0.078 |   0.041 | 0.012 |  69.357 |  0.000 |        |       |                |  [1,256,14,14] |          |             _plus11 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 238.193 |  5.908 |   5.651 | 1.657 |  71.014 | 20.457 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit6_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 243.893 |  5.393 |   5.758 | 1.689 |  72.702 | 20.076 |  [1,1] | [2,2] |  [256,256,3,3] |  [1,256,14,14] |          |  stage3_unit6_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 249.705 |  0.035 |   0.039 | 0.011 |  72.714 |  0.000 |        |       |                |  [1,256,14,14] |          |             _plus12 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 249.749 | 16.792 |  15.521 | 4.551 |  77.265 |  3.724 |  [2,2] | [2,2] |  [512,256,3,3] |    [1,512,7,7] |          |  stage4_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 265.341 | 12.329 |  13.139 | 3.853 |  81.117 |  8.798 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 278.545 | 13.807 |  11.460 | 3.360 |  84.478 |  0.560 |  [2,2] | [0,0] |  [512,256,1,1] |    [1,512,7,7] |          | stage4_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 290.051 |  0.037 |   0.095 | 0.028 |  84.505 |  0.000 |        |       |                |    [1,512,7,7] |          |             _plus13 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 290.149 | 12.908 |  14.068 | 4.125 |  88.630 |  8.218 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 304.266 | 10.369 |  11.983 | 3.514 |  92.144 |  9.648 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 316.298 |  0.027 |   0.025 | 0.007 |  92.151 |  0.000 |        |       |                |    [1,512,7,7] |          |             _plus14 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 316.326 | 10.983 |  11.967 | 3.509 |  95.660 |  9.660 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 328.342 | 10.469 |  11.826 | 3.468 |  99.128 |  9.776 |  [1,1] | [2,2] |  [512,512,3,3] |    [1,512,7,7] |          |  stage4_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 340.217 |  0.024 |   0.030 | 0.009 |  99.137 |  0.000 |        |       |                |    [1,512,7,7] |          |             _plus15 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 340.250 |  2.579 |   2.572 | 0.754 |  99.891 |  9.987 |  [1,1] | [0,0] | [1024,512,1,1] |   [1,1024,7,7] |          |              convx1 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d | 342.867 |  0.133 |   0.139 | 0.041 |  99.932 |  0.361 |  [1,1] | [0,0] |   [1,1024,7,7] |   [1,1024,1,1] |          |  conv_6dw7_7_conv2d |
I mace/benchmark/statistics.cc:347] |  FullyConnected | 343.038 |  0.340 |   0.233 | 0.068 | 100.000 |  2.255 |        |       | [512,1024,1,1] |    [1,512,1,1] |          |             pre_fc1 |
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                               Sort by Computation Time
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | Op Type |   Start |  First | Avg(ms) |     % |   cdf% | GMACPS | Stride |   Pad |  Filter Shape |  Output Shape | Dilation |                name |
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |  Conv2D |   4.668 | 28.125 |  28.545 | 8.370 |  8.370 |  4.050 |  [2,2] | [2,2] |   [64,64,3,3] |  [1,64,56,56] |          |  stage1_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D |  40.740 | 23.381 |  24.941 | 7.313 | 15.683 |  0.515 |  [2,2] | [0,0] |   [64,64,1,1] |  [1,64,56,56] |          | stage1_unit1_screlu |
I mace/benchmark/statistics.cc:347] |  Conv2D | 249.749 | 16.792 |  15.521 | 4.551 | 20.234 |  3.724 |  [2,2] | [2,2] | [512,256,3,3] |   [1,512,7,7] |          |  stage4_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 160.966 | 12.732 |  14.461 | 4.240 | 24.474 |  3.997 |  [2,2] | [2,2] | [256,128,3,3] | [1,256,14,14] |          |  stage3_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 290.149 | 12.908 |  14.068 | 4.125 | 28.599 |  8.218 |  [1,1] | [2,2] | [512,512,3,3] |   [1,512,7,7] |          |  stage4_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D |  95.090 | 10.986 |  13.771 | 4.038 | 32.637 |  4.198 |  [2,2] | [2,2] |  [128,64,3,3] | [1,128,28,28] |          |  stage2_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 265.341 | 12.329 |  13.139 | 3.853 | 36.490 |  8.798 |  [1,1] | [2,2] | [512,512,3,3] |   [1,512,7,7] |          |  stage4_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 304.266 | 10.369 |  11.983 | 3.514 | 40.003 |  9.648 |  [1,1] | [2,2] | [512,512,3,3] |   [1,512,7,7] |          |  stage4_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 316.326 | 10.983 |  11.967 | 3.509 | 43.512 |  9.660 |  [1,1] | [2,2] | [512,512,3,3] |   [1,512,7,7] |          |  stage4_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 328.342 | 10.469 |  11.826 | 3.468 | 46.980 |  9.776 |  [1,1] | [2,2] | [512,512,3,3] |   [1,512,7,7] |          |  stage4_unit3_relu2 |
I mace/benchmark/statistics.cc:347] ----------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                         Stat by Op Type
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type | Count | Avg(ms) |      % |    cdf% |          MACs | GMACPS | Called times |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |          Conv2D |    38 | 339.130 | 99.448 |  99.448 | 3,605,446,656 | 10.631 |           38 |
I mace/benchmark/statistics.cc:347] |         Eltwise |    16 |   1.514 |  0.444 |  99.892 |             0 |  0.000 |           16 |
I mace/benchmark/statistics.cc:347] |  FullyConnected |     1 |   0.232 |  0.068 |  99.960 |       524,288 |  2.260 |            1 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d |     1 |   0.138 |  0.040 | 100.000 |        50,176 |  0.364 |            1 |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347]             Stat by MACs(Multiply-Accumulation)
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         total | round | first(G/s) | avg(G/s) |       std |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | 3,606,021,120 |    30 |     10.702 |   10.574 | 11713.787 |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                            Summary of Ops' Stat
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |       std |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |    30 |   336.935 |  339.407 | 315.588 | 365.599 | 341.042 | 11713.787 |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] 56 ops total.
I mace/libmace/mace.cc:603] Destroying MaceEngine
*************************************************************
          Benchmark model sp on qcs605          
*************************************************************
I mace/benchmark/benchmark_model.cc:202] Model name: [sp]
I mace/benchmark/benchmark_model.cc:203] Model_file: 
I mace/benchmark/benchmark_model.cc:204] Device: [CPU]
I mace/benchmark/benchmark_model.cc:205] gpu_perf_hint: [3]
I mace/benchmark/benchmark_model.cc:206] gpu_priority_hint: [3]
I mace/benchmark/benchmark_model.cc:207] omp_num_threads: [-1]
I mace/benchmark/benchmark_model.cc:208] cpu_affinity_policy: [1]
I mace/benchmark/benchmark_model.cc:209] Input node: [data]
I mace/benchmark/benchmark_model.cc:210] Input shapes: [1,3,112,112]
I mace/benchmark/benchmark_model.cc:211] Output node: [fc1bn]
I mace/benchmark/benchmark_model.cc:212] output shapes: [1,1,1,512]
I mace/benchmark/benchmark_model.cc:213] Warmup runs: [1]
I mace/benchmark/benchmark_model.cc:214] Num runs: [100]
I mace/benchmark/benchmark_model.cc:215] Max run seconds: [10]
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] ---------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                                Warm Up
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |   std |
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |     1 |   353.460 |  353.460 | 353.460 | 353.460 | 353.460 | 0.000 |
I mace/benchmark/benchmark_model.cc:155] ----------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                           Run without statistics
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |       std |
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    30 |   360.735 |  339.396 | 294.324 | 362.079 | 336.535 | 15204.868 |
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] 
I mace/benchmark/benchmark_model.cc:155] -------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155]                            Run with statistics
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |       std |
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/benchmark_model.cc:155] |    30 |   364.769 |  330.680 | 309.523 | 364.769 | 338.014 | 13015.378 |
I mace/benchmark/benchmark_model.cc:155] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                                          Sort by Run Order
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type |   Start |  First | Avg(ms) |     % |    cdf% | GMACPS | Stride |   Pad |   Filter Shape |   Output Shape | Dilation |                     name |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |        Quantize |   0.000 |  0.121 |   0.165 | 0.049 |   0.049 |  0.000 |        |       |                |  [1,112,112,3] |          |     mace_input_node_data |
I mace/benchmark/statistics.cc:347] |          Conv2D |   0.173 |  9.412 |   5.442 | 1.623 |   1.672 |  3.983 |  [1,1] | [2,2] |     [64,3,3,3] | [1,112,112,64] |          |                    relu0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |   5.669 | 10.729 |   9.392 | 2.801 |   4.473 | 12.309 |  [2,2] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  15.115 | 11.657 |   9.466 | 2.823 |   7.296 | 12.212 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  24.634 |  2.273 |   3.647 | 1.088 |   8.384 |  3.522 |  [2,2] | [0,0] |    [64,1,1,64] |   [1,56,56,64] |          |      stage1_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  28.331 |  1.556 |   0.608 | 0.181 |   8.565 |  0.000 |        |       |                |   [1,56,56,64] |          |                   _plus0 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  28.945 | 12.403 |   8.392 | 2.503 |  11.068 | 13.776 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  37.398 | 11.053 |   9.435 | 2.814 |  13.881 | 12.253 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  46.883 |  2.763 |   1.561 | 0.465 |  14.347 |  0.000 |        |       |                |   [1,56,56,64] |          |                   _plus1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  48.461 | 10.481 |   8.409 | 2.508 |  16.855 | 13.747 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  56.923 | 10.095 |   9.541 | 2.845 |  19.700 | 12.117 |  [1,1] | [2,2] |    [64,3,3,64] |   [1,56,56,64] |          |       stage1_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise |  66.512 |  1.198 |   1.655 | 0.493 |  20.193 |  0.000 |        |       |                |   [1,56,56,64] |          |                   _plus2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  68.175 |  4.302 |   5.464 | 1.630 |  21.823 | 10.578 |  [2,2] | [2,2] |   [128,3,3,64] |  [1,28,28,128] |          |       stage2_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  73.691 |  7.155 |   8.648 | 2.579 |  24.402 | 13.367 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  82.392 |  3.739 |   2.381 | 0.710 |  25.112 |  2.697 |  [2,2] | [0,0] |   [128,1,1,64] |  [1,28,28,128] |          |      stage2_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise |  84.818 |  0.178 |   0.268 | 0.080 |  25.192 |  0.000 |        |       |                |  [1,28,28,128] |          |                   _plus3 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  85.091 |  7.197 |   7.737 | 2.307 |  27.500 | 14.943 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D |  92.883 |  7.436 |   9.006 | 2.686 |  30.185 | 12.837 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 101.948 |  0.890 |   1.194 | 0.356 |  30.541 |  0.000 |        |       |                |  [1,28,28,128] |          |                   _plus4 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 103.148 | 11.591 |   8.025 | 2.393 |  32.935 | 14.406 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 111.222 | 10.474 |   9.004 | 2.685 |  35.620 | 12.839 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 120.279 |  1.343 |   1.552 | 0.463 |  36.083 |  0.000 |        |       |                |  [1,28,28,128] |          |                   _plus5 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 121.838 |  7.450 |   8.137 | 2.427 |  38.509 | 14.207 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 130.027 | 10.451 |   9.228 | 2.752 |  41.261 | 12.527 |  [1,1] | [2,2] |  [128,3,3,128] |  [1,28,28,128] |          |       stage2_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 139.312 |  1.416 |   1.099 | 0.328 |  41.589 |  0.000 |        |       |                |  [1,28,28,128] |          |                   _plus6 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 140.416 |  5.840 |   5.695 | 1.698 |  43.288 | 10.149 |  [2,2] | [2,2] |  [256,3,3,128] |  [1,14,14,256] |          |       stage3_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 146.166 | 10.927 |  12.394 | 3.696 |  46.984 |  9.327 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 158.619 |  3.450 |   2.077 | 0.620 |  47.603 |  3.092 |  [2,2] | [0,0] |  [256,1,1,128] |  [1,14,14,256] |          |      stage3_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 160.734 |  0.092 |   0.166 | 0.050 |  47.653 |  0.000 |        |       |                |  [1,14,14,256] |          |                   _plus7 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 160.904 | 10.274 |  10.785 | 3.216 |  50.869 | 10.719 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 171.745 | 11.628 |  11.592 | 3.457 |  54.326 |  9.973 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 183.395 |  0.567 |   0.610 | 0.182 |  54.508 |  0.000 |        |       |                |  [1,14,14,256] |          |                   _plus8 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 184.009 | 11.539 |  10.990 | 3.278 |  57.785 | 10.519 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 195.071 | 13.797 |  11.694 | 3.487 |  61.273 |  9.886 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 206.897 |  0.141 |   0.790 | 0.235 |  61.508 |  0.000 |        |       |                |  [1,14,14,256] |          |                   _plus9 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 207.691 | 12.616 |  11.099 | 3.310 |  64.818 | 10.416 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 218.848 | 11.994 |  11.338 | 3.381 |  68.199 | 10.197 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 230.242 |  0.186 |   0.967 | 0.288 |  68.488 |  0.000 |        |       |                |  [1,14,14,256] |          |                  _plus10 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 231.214 |  7.813 |  11.092 | 3.308 |  71.795 | 10.422 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit5_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 242.403 |  7.547 |  11.386 | 3.395 |  75.191 | 10.154 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit5_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 253.854 |  0.100 |   0.739 | 0.220 |  75.411 |  0.000 |        |       |                |  [1,14,14,256] |          |                  _plus11 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 254.598 | 12.812 |  10.964 | 3.270 |  78.681 | 10.544 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit6_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 265.612 | 12.974 |  11.197 | 3.339 |  82.020 | 10.325 |  [1,1] | [2,2] |  [256,3,3,256] |  [1,14,14,256] |          |       stage3_unit6_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 276.861 |  0.186 |   0.688 | 0.205 |  82.225 |  0.000 |        |       |                |  [1,14,14,256] |          |                  _plus12 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 277.553 |  5.283 |   5.751 | 1.715 |  83.940 | 10.052 |  [2,2] | [2,2] |  [512,3,3,256] |    [1,7,7,512] |          |       stage4_unit1_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 283.370 |  7.992 |   9.351 | 2.789 |  86.729 | 12.363 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 292.773 |  3.973 |   2.003 | 0.597 |  87.326 |  3.207 |  [2,2] | [0,0] |  [512,1,1,256] |    [1,7,7,512] |          |      stage4_unit1_screlu |
I mace/benchmark/statistics.cc:347] |         Eltwise | 294.827 |  0.054 |   0.153 | 0.046 |  87.372 |  0.000 |        |       |                |    [1,7,7,512] |          |                  _plus13 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 294.984 |  9.678 |   8.054 | 2.402 |  89.773 | 14.354 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit2_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 303.089 | 10.430 |   9.298 | 2.773 |  92.546 | 12.434 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 312.440 |  3.590 |   1.110 | 0.331 |  92.877 |  0.000 |        |       |                |    [1,7,7,512] |          |                  _plus14 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 313.556 |  8.455 |   8.992 | 2.682 |  95.559 | 12.856 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 322.600 | 12.810 |   9.267 | 2.763 |  98.322 | 12.476 |  [1,1] | [2,2] |  [512,3,3,512] |    [1,7,7,512] |          |       stage4_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |         Eltwise | 331.921 |  0.827 |   0.867 | 0.259 |  98.581 |  0.000 |        |       |                |    [1,7,7,512] |          |                  _plus15 |
I mace/benchmark/statistics.cc:347] |          Conv2D | 332.793 |  4.178 |   3.363 | 1.003 |  99.584 |  7.640 |  [1,1] | [0,0] | [1024,1,1,512] |   [1,7,7,1024] |          |                   convx1 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d | 336.203 |  1.864 |   0.697 | 0.208 |  99.792 | 10.535 |  [1,1] | [0,0] |   [7,7,1024,1] |   [1,1,1,1024] |          |       conv_6dw7_7_conv2d |
I mace/benchmark/statistics.cc:347] |          Conv2D | 336.925 |  0.536 |   0.664 | 0.198 |  99.990 |  0.789 |  [1,1] | [0,0] | [512,1,1,1024] |    [1,1,1,512] |          | mace_output_node_pre_fc1 |
I mace/benchmark/statistics.cc:347] |      Dequantize | 337.626 |  0.074 |   0.035 | 0.010 | 100.000 |  0.000 |        |       |                |    [1,1,1,512] |          |                    fc1bn |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                                              Sort by Computation Time
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | Op Type |   Start |  First | Avg(ms) |     % |   cdf% | GMACPS | Stride |   Pad |  Filter Shape |  Output Shape | Dilation |               name |
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |  Conv2D | 146.166 | 10.927 |  12.394 | 3.696 |  3.696 |  9.327 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit1_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 195.071 | 13.797 |  11.694 | 3.487 |  7.184 |  9.886 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit3_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 171.745 | 11.628 |  11.592 | 3.457 | 10.641 |  9.973 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit2_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 242.403 |  7.547 |  11.386 | 3.395 | 14.036 | 10.154 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit5_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 218.848 | 11.994 |  11.338 | 3.381 | 17.417 | 10.197 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit4_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 265.612 | 12.974 |  11.197 | 3.339 | 20.756 | 10.325 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit6_relu2 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 207.691 | 12.616 |  11.099 | 3.310 | 24.066 | 10.416 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit4_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 231.214 |  7.813 |  11.092 | 3.308 | 27.374 | 10.422 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit5_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 184.009 | 11.539 |  10.990 | 3.278 | 30.651 | 10.519 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit3_relu1 |
I mace/benchmark/statistics.cc:347] |  Conv2D | 254.598 | 12.812 |  10.964 | 3.270 | 33.921 | 10.544 |  [1,1] | [2,2] | [256,3,3,256] | [1,14,14,256] |          | stage3_unit6_relu1 |
I mace/benchmark/statistics.cc:347] ---------------------------------------------------------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                                         Stat by Op Type
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         Op Type | Count | Avg(ms) |      % |    cdf% |          MACs | GMACPS | Called times |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |          Conv2D |    39 | 320.382 | 95.553 |  95.553 | 3,605,970,944 | 11.255 |           39 |
I mace/benchmark/statistics.cc:347] |         Eltwise |    16 |  14.017 |  4.181 |  99.733 |             0 |  0.000 |           16 |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d |     1 |   0.696 |  0.208 |  99.941 |     7,340,032 | 10.546 |            1 |
I mace/benchmark/statistics.cc:347] |        Quantize |     1 |   0.165 |  0.049 |  99.990 |             0 |  0.000 |            1 |
I mace/benchmark/statistics.cc:347] |      Dequantize |     1 |   0.034 |  0.010 | 100.000 |             0 |  0.000 |            1 |
I mace/benchmark/statistics.cc:347] ------------------------------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347]             Stat by MACs(Multiply-Accumulation)
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |         total | round | first(G/s) | avg(G/s) |       std |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | 3,613,310,976 |    30 |      9.993 |   10.776 | 12957.633 |
I mace/benchmark/statistics.cc:347] -------------------------------------------------------------
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347]                            Summary of Ops' Stat
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |       std |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] |    30 |   361.590 |  328.352 | 307.066 | 361.590 | 335.323 | 12957.633 |
I mace/benchmark/statistics.cc:347] --------------------------------------------------------------------------
I mace/benchmark/statistics.cc:347] 
I mace/benchmark/statistics.cc:347] 58 ops total.
I mace/libmace/mace.cc:603] Destroying MaceEngine

It looks like many ops are slower when quantized. Is that expected?

I also see a big difference between the times reported by benchmark and by run for target msmnile. Which one should I trust?
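To check the per-op question with numbers rather than by eye, the per-op tables printed by `mace/benchmark/statistics.cc` can be parsed and diffed. Below is a minimal sketch, not part of MACE. The helper names `parse_op_times` and `slower_ops` are made up for illustration, and it assumes the 13-column row layout shown in the benchmark log above. It also assumes op names match between the float and quantized graphs, which may not hold for every op (the quantized graph adds `Quantize`/`Dequantize` nodes, for example).

```python
def parse_op_times(log_text):
    """Map op name -> Avg(ms), read from the 13-column per-op benchmark tables."""
    times = {}
    for line in log_text.splitlines():
        if "|" not in line:
            continue
        # Drop the log prefix before the first '|' and the empty tail after the last.
        cells = [c.strip() for c in line.split("|")[1:-1]]
        # Skip header rows and the narrower summary tables.
        if len(cells) != 13 or cells[0] == "Op Type":
            continue
        times[cells[12]] = float(cells[3])  # name -> Avg(ms)
    return times


def slower_ops(float_times, quant_times):
    """Ops present in both logs whose quantized Avg(ms) is larger, worst first."""
    rows = [
        (name, float_times[name], q_ms)
        for name, q_ms in quant_times.items()
        if name in float_times and q_ms > float_times[name]
    ]
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)


# Two rows copied from the quantized benchmark log above, plus a header row
# to show that it is skipped.
SAMPLE = """\
I mace/benchmark/statistics.cc:347] | Op Type |   Start |  First | Avg(ms) |     % |   cdf% | GMACPS | Stride |   Pad |  Filter Shape |  Output Shape | Dilation |               name |
I mace/benchmark/statistics.cc:347] | DepthwiseConv2d | 336.203 |  1.864 |   0.697 | 0.208 |  99.792 | 10.535 |  [1,1] | [0,0] |   [7,7,1024,1] |   [1,1,1,1024] |          |       conv_6dw7_7_conv2d |
I mace/benchmark/statistics.cc:347] |      Dequantize | 337.626 |  0.074 |   0.035 | 0.010 | 100.000 |  0.000 |        |       |                |    [1,1,1,512] |          |                    fc1bn |
"""

quant_times = parse_op_times(SAMPLE)
print(quant_times)
```

Running `parse_op_times` on the full float log and the full quantized log, then passing both results to `slower_ops`, would list exactly which ops regressed and by how much.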

*************************************************************
          Run model sp on msmnile       
*************************************************************
I mace/tools/validation/mace_run.cc:451] model name: sp
I mace/tools/validation/mace_run.cc:452] mace version: v0.11.0-rc0-0-g2d650b6
I mace/tools/validation/mace_run.cc:453] input node: data
I mace/tools/validation/mace_run.cc:454] input shape: 1,3,112,112
I mace/tools/validation/mace_run.cc:455] output node: fc1bn
I mace/tools/validation/mace_run.cc:456] output shape: 1,1,1,512
I mace/tools/validation/mace_run.cc:457] input_file: /data/local/tmp/mace_run/model_input
I mace/tools/validation/mace_run.cc:458] output_file: /data/local/tmp/mace_run/model_out
I mace/tools/validation/mace_run.cc:459] model_data_file:
I mace/tools/validation/mace_run.cc:460] model_file:
I mace/tools/validation/mace_run.cc:461] device: CPU
I mace/tools/validation/mace_run.cc:462] round: 100
I mace/tools/validation/mace_run.cc:463] restart_round: 1
I mace/tools/validation/mace_run.cc:464] gpu_perf_hint: 3
I mace/tools/validation/mace_run.cc:465] gpu_priority_hint: 3
I mace/tools/validation/mace_run.cc:466] omp_num_threads: -1
I mace/tools/validation/mace_run.cc:467] cpu_affinity_policy: 1
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/libmace/mace.cc:603] Destroying MaceEngine
I mace/tools/validation/mace_run.cc:508] restart round 0
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/tools/validation/mace_run.cc:265] Create Mace Engine latency: 6.535 ms
I mace/tools/validation/mace_run.cc:272] Total init latency: 6.622 ms
I mace/tools/validation/mace_run.cc:313] Warm up run
I mace/tools/validation/mace_run.cc:349] 1st warm up run latency: 227.403 ms
I mace/tools/validation/mace_run.cc:356] Run model
I mace/tools/validation/mace_run.cc:407] Average latency: 92.6602 ms
========================================================
     capability(CPU)        init      warmup     run_avg
========================================================
time          18.459       6.622     227.403      92.660
I mace/tools/validation/mace_run.cc:430] Write output file /data/local/tmp/mace_run/model_out_fc1bn with size 2048 done.
I mace/libmace/mace.cc:603] Destroying MaceEngine
Running finished!
*************************************************************
          Run model sp on msmnile          
*************************************************************
I mace/tools/validation/mace_run.cc:451] model name: sp
I mace/tools/validation/mace_run.cc:452] mace version: v0.11.0-rc0-0-g2d650b6
I mace/tools/validation/mace_run.cc:453] input node: data
I mace/tools/validation/mace_run.cc:454] input shape: 1,3,112,112
I mace/tools/validation/mace_run.cc:455] output node: fc1bn
I mace/tools/validation/mace_run.cc:456] output shape: 1,1,1,512
I mace/tools/validation/mace_run.cc:457] input_file: /data/local/tmp/mace_run/model_input
I mace/tools/validation/mace_run.cc:458] output_file: /data/local/tmp/mace_run/model_out
I mace/tools/validation/mace_run.cc:459] model_data_file:
I mace/tools/validation/mace_run.cc:460] model_file:
I mace/tools/validation/mace_run.cc:461] device: CPU
I mace/tools/validation/mace_run.cc:462] round: 100
I mace/tools/validation/mace_run.cc:463] restart_round: 1
I mace/tools/validation/mace_run.cc:464] gpu_perf_hint: 3
I mace/tools/validation/mace_run.cc:465] gpu_priority_hint: 3
I mace/tools/validation/mace_run.cc:466] omp_num_threads: -1
I mace/tools/validation/mace_run.cc:467] cpu_affinity_policy: 1
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/libmace/mace.cc:603] Destroying MaceEngine
I mace/tools/validation/mace_run.cc:508] restart round 0
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/tools/validation/mace_run.cc:265] Create Mace Engine latency: 5.385 ms
I mace/tools/validation/mace_run.cc:272] Total init latency: 5.442 ms
I mace/tools/validation/mace_run.cc:313] Warm up run
I mace/tools/validation/mace_run.cc:349] 1st warm up run latency: 72.696 ms
I mace/tools/validation/mace_run.cc:356] Run model
I mace/tools/validation/mace_run.cc:407] Average latency: 60.1606 ms
========================================================
     capability(CPU)        init      warmup     run_avg
========================================================
time          18.838       5.442      72.696      60.161
I mace/tools/validation/mace_run.cc:430] Write output file /data/local/tmp/mace_run/model_out_fc1bn with size 2048 done.
I mace/libmace/mace.cc:603] Destroying MaceEngine
Running finished!
*************************************************************
          Run model sp on qcs605          
*************************************************************
I mace/tools/validation/mace_run.cc:451] model name: sp
I mace/tools/validation/mace_run.cc:452] mace version: v0.11.0-rc0-0-g2d650b6
I mace/tools/validation/mace_run.cc:453] input node: data
I mace/tools/validation/mace_run.cc:454] input shape: 1,3,112,112
I mace/tools/validation/mace_run.cc:455] output node: fc1bn
I mace/tools/validation/mace_run.cc:456] output shape: 1,1,1,512
I mace/tools/validation/mace_run.cc:457] input_file: /data/local/tmp/mace_run/model_input
I mace/tools/validation/mace_run.cc:458] output_file: /data/local/tmp/mace_run/model_out
I mace/tools/validation/mace_run.cc:459] model_data_file:
I mace/tools/validation/mace_run.cc:460] model_file:
I mace/tools/validation/mace_run.cc:461] device: CPU
I mace/tools/validation/mace_run.cc:462] round: 100
I mace/tools/validation/mace_run.cc:463] restart_round: 1
I mace/tools/validation/mace_run.cc:464] gpu_perf_hint: 3
I mace/tools/validation/mace_run.cc:465] gpu_priority_hint: 3
I mace/tools/validation/mace_run.cc:466] omp_num_threads: -1
I mace/tools/validation/mace_run.cc:467] cpu_affinity_policy: 1
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/libmace/mace.cc:603] Destroying MaceEngine
I mace/tools/validation/mace_run.cc:508] restart round 0
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/tools/validation/mace_run.cc:265] Create Mace Engine latency: 28.322 ms
I mace/tools/validation/mace_run.cc:272] Total init latency: 28.615 ms
I mace/tools/validation/mace_run.cc:313] Warm up run
I mace/tools/validation/mace_run.cc:349] 1st warm up run latency: 1564.62 ms
I mace/tools/validation/mace_run.cc:356] Run model
I mace/tools/validation/mace_run.cc:407] Average latency: 339.838 ms
========================================================
     capability(CPU)        init      warmup     run_avg
========================================================
time          34.125      28.615    1564.618     339.838
I mace/tools/validation/mace_run.cc:430] Write output file /data/local/tmp/mace_run/model_out_fc1bn with size 2048 done.
I mace/libmace/mace.cc:603] Destroying MaceEngine
Running finished!
*************************************************************
          Run model sp on qcs605          
*************************************************************
I mace/tools/validation/mace_run.cc:451] model name: sp
I mace/tools/validation/mace_run.cc:452] mace version: v0.11.0-rc0-0-g2d650b6
I mace/tools/validation/mace_run.cc:453] input node: data
I mace/tools/validation/mace_run.cc:454] input shape: 1,3,112,112
I mace/tools/validation/mace_run.cc:455] output node: fc1bn
I mace/tools/validation/mace_run.cc:456] output shape: 1,1,1,512
I mace/tools/validation/mace_run.cc:457] input_file: /data/local/tmp/mace_run/model_input
I mace/tools/validation/mace_run.cc:458] output_file: /data/local/tmp/mace_run/model_out
I mace/tools/validation/mace_run.cc:459] model_data_file:
I mace/tools/validation/mace_run.cc:460] model_file:
I mace/tools/validation/mace_run.cc:461] device: CPU
I mace/tools/validation/mace_run.cc:462] round: 100
I mace/tools/validation/mace_run.cc:463] restart_round: 1
I mace/tools/validation/mace_run.cc:464] gpu_perf_hint: 3
I mace/tools/validation/mace_run.cc:465] gpu_priority_hint: 3
I mace/tools/validation/mace_run.cc:466] omp_num_threads: -1
I mace/tools/validation/mace_run.cc:467] cpu_affinity_policy: 1
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/libmace/mace.cc:603] Destroying MaceEngine
I mace/tools/validation/mace_run.cc:508] restart round 0
I mace/libmace/mace.cc:431] Creating MaceEngine, MACE version: v0.11.0-rc0-0-g2d650b6
I mace/libmace/mace.cc:470] Initializing MaceEngine
I mace/tools/validation/mace_run.cc:265] Create Mace Engine latency: 31.947 ms
I mace/tools/validation/mace_run.cc:272] Total init latency: 32.216 ms
I mace/tools/validation/mace_run.cc:313] Warm up run
I mace/tools/validation/mace_run.cc:349] 1st warm up run latency: 464.248 ms
I mace/tools/validation/mace_run.cc:356] Run model
I mace/tools/validation/mace_run.cc:407] Average latency: 337.925 ms
========================================================
     capability(CPU)        init      warmup     run_avg
========================================================
time          35.085      32.216     464.248     337.925
I mace/tools/validation/mace_run.cc:430] Write output file /data/local/tmp/mace_run/model_out_fc1bn with size 2048 done.
I mace/libmace/mace.cc:603] Destroying MaceEngine
Running finished!
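The four runs above can be put side by side by pulling the warm-up and average latency out of the `mace_run` output. This is a rough sketch rather than a MACE tool. `parse_runs` is a hypothetical helper, and the regular expressions assume the exact log wording shown above (`Run model <tag> on <target>`, `1st warm up run latency`, `Average latency`).

```python
import re

RUN_HEADER = re.compile(r"Run model (\S+) on (\S+)")
WARMUP = re.compile(r"1st warm up run latency: ([\d.]+) ms")
AVERAGE = re.compile(r"Average latency: ([\d.]+) ms")


def parse_runs(log_text):
    """Return one dict per 'Run model <tag> on <target>' block, in log order."""
    runs = []
    for line in log_text.splitlines():
        header = RUN_HEADER.search(line)
        if header:
            runs.append({"model": header.group(1), "target": header.group(2)})
            continue
        if not runs:
            continue  # ignore anything before the first run header
        warmup = WARMUP.search(line)
        if warmup:
            runs[-1]["warmup_ms"] = float(warmup.group(1))
            continue
        average = AVERAGE.search(line)
        if average:
            runs[-1]["avg_ms"] = float(average.group(1))
    return runs


# Lines copied from the two msmnile runs above.
SAMPLE_LOG = """\
          Run model sp on msmnile
I mace/tools/validation/mace_run.cc:349] 1st warm up run latency: 227.403 ms
I mace/tools/validation/mace_run.cc:356] Run model
I mace/tools/validation/mace_run.cc:407] Average latency: 92.6602 ms
          Run model sp on msmnile
I mace/tools/validation/mace_run.cc:349] 1st warm up run latency: 72.696 ms
I mace/tools/validation/mace_run.cc:356] Run model
I mace/tools/validation/mace_run.cc:407] Average latency: 60.1606 ms
"""

runs = parse_runs(SAMPLE_LOG)
for run in runs:
    print(run["target"], run["warmup_ms"], run["avg_ms"])
```

The bare `Run model` line logged at `mace_run.cc:356` does not match the header pattern, because the pattern requires the `<tag> on <target>` suffix, so it is not mistaken for the start of a new run.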