airockchip / rknn-toolkit2

Problem in conversion of a STGCN++ .onnx model to .rknn model for RK3588 #57

Open musicgary opened 5 months ago

musicgary commented 5 months ago

Dear all,

I have an .onnx model exported from an STGCN++ model. Using rknn-toolkit2 version 2.0.0b0, I tried to convert the model to .rknn for deployment on the RK3588 and ran into the following problem (the model uses opset 17):

(rknn2b0) $ python convert_2.py newAction17_cut_9.onnx rk3588 fp newAction17.rknn
I rknn-toolkit2 version: 2.0.0b0+9bab5682
--> Config model
done
--> Loading model
I It is recommended onnx opset 19, but your onnx model opset is 17!
I Model converted from pytorch, 'opset_version' should be set 19 in torch.onnx.export for successful convert!
I Loading : 100%|█████████████████████████████████████████████| 444/444 [00:00<00:00, 265016.50it/s]
W load_onnx: The config.mean_values is None, zeros will be set for input 0!
W load_onnx: The config.std_values is None, ones will be set for input 0!
done
--> Building model
D base_optimize ...
D base_optimize done.
D 
D fold_constant ...
D fold_constant done.
D fold_constant remove nodes = ['/cls_head/Concat_1', '/cls_head/Unsqueeze_6', '/cls_head/Unsqueeze_5', '/cls_head/Unsqueeze_4', '/cls_head/Concat', '/cls_head/Unsqueeze_3', '/cls_head/Unsqueeze_2', '/cls_head/Unsqueeze_1', '/cls_head/Unsqueeze', '/cls_head/Mul', '/cls_head/Gather_4', '/cls_head/Shape_4', '/cls_head/Gather_3', '/cls_head/Shape_3', '/cls_head/Gather_2', '/cls_head/Shape_2', '/cls_head/Gather_1', '/cls_head/Shape_1', '/cls_head/Gather', '/cls_head/Shape', '/backbone/Concat', '/backbone/Unsqueeze_2', '/backbone/Unsqueeze_1', '/backbone/Unsqueeze', '/backbone/Gather_2', '/backbone/Shape_2', '/backbone/Gather_1', '/backbone/Shape_1', '/backbone/Gather', '/backbone/Shape', '/backbone/gcn.9/gcn/Concat', '/backbone/gcn.9/gcn/Unsqueeze_2', '/backbone/gcn.9/gcn/Unsqueeze_1', '/backbone/gcn.9/gcn/Unsqueeze', '/backbone/gcn.9/gcn/Gather_2', '/backbone/gcn.9/gcn/Shape_2', '/backbone/gcn.9/gcn/Gather_1', '/backbone/gcn.9/gcn/Shape_1', '/backbone/gcn.9/gcn/Gather', '/backbone/gcn.9/gcn/Shape', '/backbone/gcn.8/gcn/Concat', '/backbone/gcn.8/gcn/Unsqueeze_2', '/backbone/gcn.8/gcn/Unsqueeze_1', '/backbone/gcn.8/gcn/Unsqueeze', '/backbone/gcn.8/gcn/Gather_2', '/backbone/gcn.8/gcn/Shape_2', '/backbone/gcn.8/gcn/Gather_1', '/backbone/gcn.8/gcn/Shape_1', '/backbone/gcn.8/gcn/Gather', '/backbone/gcn.8/gcn/Shape', '/backbone/gcn.7/gcn/Concat', '/backbone/gcn.7/gcn/Unsqueeze_2', '/backbone/gcn.7/gcn/Unsqueeze_1', '/backbone/gcn.7/gcn/Unsqueeze', '/backbone/gcn.7/gcn/Gather_2', '/backbone/gcn.7/gcn/Shape_2', '/backbone/gcn.7/gcn/Gather_1', '/backbone/gcn.7/gcn/Shape_1', '/backbone/gcn.7/gcn/Gather', '/backbone/gcn.7/gcn/Shape', '/backbone/gcn.6/gcn/Concat', '/backbone/gcn.6/gcn/Unsqueeze_2', '/backbone/gcn.6/gcn/Unsqueeze_1', '/backbone/gcn.6/gcn/Unsqueeze', '/backbone/gcn.6/gcn/Gather_2', '/backbone/gcn.6/gcn/Shape_2', '/backbone/gcn.6/gcn/Gather_1', '/backbone/gcn.6/gcn/Shape_1', '/backbone/gcn.6/gcn/Gather', '/backbone/gcn.6/gcn/Shape', '/backbone/gcn.5/gcn/Concat', 
'/backbone/gcn.5/gcn/Unsqueeze_2', '/backbone/gcn.5/gcn/Unsqueeze_1', '/backbone/gcn.5/gcn/Unsqueeze', '/backbone/gcn.5/gcn/Gather_2', '/backbone/gcn.5/gcn/Shape_2', '/backbone/gcn.5/gcn/Gather_1', '/backbone/gcn.5/gcn/Shape_1', '/backbone/gcn.5/gcn/Gather', '/backbone/gcn.5/gcn/Shape', '/backbone/gcn.4/gcn/Concat', '/backbone/gcn.4/gcn/Unsqueeze_2', '/backbone/gcn.4/gcn/Unsqueeze_1', '/backbone/gcn.4/gcn/Unsqueeze', '/backbone/gcn.4/gcn/Gather_2', '/backbone/gcn.4/gcn/Shape_2', '/backbone/gcn.4/gcn/Gather_1', '/backbone/gcn.4/gcn/Shape_1', '/backbone/gcn.4/gcn/Gather', '/backbone/gcn.4/gcn/Shape', '/backbone/gcn.3/gcn/Concat', '/backbone/gcn.3/gcn/Unsqueeze_2', '/backbone/gcn.3/gcn/Unsqueeze_1', '/backbone/gcn.3/gcn/Unsqueeze', '/backbone/gcn.3/gcn/Gather_2', '/backbone/gcn.3/gcn/Shape_2', '/backbone/gcn.3/gcn/Gather_1', '/backbone/gcn.3/gcn/Shape_1', '/backbone/gcn.3/gcn/Gather', '/backbone/gcn.3/gcn/Shape', '/backbone/gcn.2/gcn/Concat', '/backbone/gcn.2/gcn/Unsqueeze_2', '/backbone/gcn.2/gcn/Unsqueeze_1', '/backbone/gcn.2/gcn/Unsqueeze', '/backbone/gcn.2/gcn/Gather_2', '/backbone/gcn.2/gcn/Shape_2', '/backbone/gcn.2/gcn/Gather_1', '/backbone/gcn.2/gcn/Shape_1', '/backbone/gcn.2/gcn/Gather', '/backbone/gcn.2/gcn/Shape', '/backbone/gcn.1/gcn/Concat', '/backbone/gcn.1/gcn/Unsqueeze_2', '/backbone/gcn.1/gcn/Unsqueeze_1', '/backbone/gcn.1/gcn/Unsqueeze', '/backbone/gcn.1/gcn/Gather_2', '/backbone/gcn.1/gcn/Shape_2', '/backbone/gcn.1/gcn/Gather_1', '/backbone/gcn.1/gcn/Shape_1', '/backbone/gcn.1/gcn/Gather', '/backbone/gcn.1/gcn/Shape']
D 
D correct_ops ...
D correct_ops done.
D 
D fuse_ops ...
D fuse_ops results:
D     remove_invalid_add: remove node = ['/backbone/gcn.0/Add']
D     bypass_two_reshape: remove node = ['/cls_head/Reshape', '/backbone/Reshape_3']
D     swap_reshape_softmax: remove node = ['/Reshape_1', '/Softmax'], add node = ['/Softmax', '/Reshape_1']
D     unsqueeze_to_4d_bn: remove node = [], add node = ['/backbone/data_bn/BatchNormalization_0_unsqueeze0', '/backbone/data_bn/BatchNormalization_0_unsqueeze1']
D     squeeze_1_in_nd_transpose: remove node = [], add node = ['/backbone/Reshape_1_output_0_squeeze_1_nd_/backbone/Transpose_1', '/backbone/Transpose_1_output_0_squeeze_1_nd_/backbone/Transpose_1']
E build: Catch exception when building RKNN model!
E build: Traceback (most recent call last):
E build:   File "rknn/api/rknn_base.py", line 1993, in rknn.api.rknn_base.RKNNBase.build
E build:   File "rknn/api/graph_optimizer.py", line 1907, in rknn.api.graph_optimizer.GraphOptimizer.fuse_ops
E build:   File "rknn/api/fuse_rules.py", line 14960, in rknn.api.fuse_rules._p_convert_einsum_to_exmatmul
E build: KeyError: 'n'
W If you can't handle this error, please try updating to the latest version of the toolkit2 and runtime from:
  https://console.zbox.filez.com/l/I00fc3 (Pwd: rknn)  Path: RKNPU2_SDK / 2.X.X / develop /
  If the error still exists in the latest version, please collect the corresponding error logs and the model,
  convert script, and input data that can reproduce the problem, and then submit an issue on:
  https://redmine.rock-chips.com (Please consult our sales or FAE for the redmine account)
Build model failed!

Following the recommendation in the log, I also tried converting with opset 19, but that failed as well. The error log is as follows:

E load_onnx: Catch exception when loading onnx model: /mnt/nvme0n1p2/home/gary/Documents/demo_8fps_2s_v2.1/newClassifier19.onnx!
E load_onnx: Traceback (most recent call last):
E load_onnx:   File "rknn/api/rknn_base.py", line 1546, in rknn.api.rknn_base.RKNNBase.load_onnx
E load_onnx:   File "rknn/api/rknn_base.py", line 674, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
E load_onnx:   File "rknn/api/ir_graph.py", line 70, in rknn.api.ir_graph.IRGraph.__init__
E load_onnx:   File "rknn/api/ir_graph.py", line 555, in rknn.api.ir_graph.IRGraph.rebuild
E load_onnx:   File "/home/gary/anaconda3/envs/rknn2b0/lib/python3.8/site-packages/onnx/checker.py", line 136, in check_model
E load_onnx:     C.check_model(protobuf_string, full_check)
E load_onnx: onnx.onnx_cpp2py_export.checker.ValidationError: Unrecognized attribute: axes for operator ReduceMean
E load_onnx: ==> Context: Bad node spec for node. Name: /cls_head/ReduceMean OpType: ReduceMean
W If you can't handle this error, please try updating to the latest version of the toolkit2 and runtime from:
  https://console.zbox.filez.com/l/I00fc3 (Pwd: rknn)  Path: RKNPU2_SDK / 2.X.X / develop /
  If the error still exists in the latest version, please collect the corresponding error logs and the model,
  convert script, and input data that can reproduce the problem, and then submit an issue on:
  https://redmine.rock-chips.com (Please consult our sales or FAE for the redmine account)
Load model failed!

Does this mean I have to remove the Einsum or ReduceMean operators from the model to deploy it on the RK3588? Is there a workaround? Thanks very much for your help.

yuyun2000 commented 5 months ago

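It looks like the automatic einsum optimization (convert_einsum_to_exmatmul) is what failed. In fact, before version 2.0 the toolkit did not support the Einstein summation operator at all, so my guess is that support for this operator is still immature. I suggest rewriting this operator yourself as equivalent matrix operations, which is easy to do.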
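
As a rough illustration of that suggestion, here is a minimal sketch of rewriting an einsum as transpose + reshape + matmul. The issue does not show the model's actual einsum equation, so the equation `nkctv,kvw->nctw` (a graph-convolution pattern common in ST-GCN style code) and all shapes below are assumptions. NumPy is used only to check that the two forms agree; in a PyTorch model the same steps would be written with `permute`, `reshape`, and `torch.matmul`. Whether the toolkit then converts the resulting graph is not verified here.

```python
import numpy as np

def gcn_einsum(x, A):
    # Reference form: sum over k (adjacency subsets) and v (source joints).
    # x: (n, k, c, t, v), A: (k, v, w) -> output: (n, c, t, w)
    return np.einsum('nkctv,kvw->nctw', x, A)

def gcn_matmul(x, A):
    # Same contraction without einsum.
    n, k, c, t, v = x.shape
    w = A.shape[-1]
    # Bring the contracted axes k and v next to each other, then merge them.
    x2 = x.transpose(0, 2, 3, 1, 4).reshape(n, c, t, k * v)  # (n, c, t, k*v)
    A2 = A.reshape(k * v, w)                                 # (k*v, w)
    # A single matmul now performs the sum over the merged k*v axis.
    return np.matmul(x2, A2)                                 # (n, c, t, w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 3, 4, 5, 6))   # hypothetical shapes
    A = rng.standard_normal((3, 6, 6))
    print(np.allclose(gcn_einsum(x, A), gcn_matmul(x, A)))
```

Merging `k` and `v` into one axis keeps the replacement to a single MatMul; a loop over `k` with one matmul per subset followed by a sum would be an equivalent alternative.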

tylertroy commented 1 month ago

I hit the same error when converting the flowformer and flowformer++ ONNX models to RKNN. I include the error output below in case it helps with diagnosis.

D     unsqueeze_to_4d_add: remove node = [], add node = ['/memory_encoder/feat_encoder/blocks.1.1_1/Add_output_0_rs', '/memory_encoder/feat_encoder/blocks.1.1/mlp/fc2_1/Add_output_0_rs', '/memory_encoder/feat_encoder/blocks.1.1_1/Add_1_output_0-rs']
W build: Show op fuse match nodes:
Ruler: convert_einsum_to_exmatmul
Subgraph:
Op type: Einsum
Op name: /memory_encoder/Einsum
  Input:
    /memory_encoder/Reshape_1_output_0 : [1, 1, 1980, 256]
    /memory_encoder/Reshape_3_output_0 : [1, 1, 1980, 256]
  Output:
    /memory_encoder/Einsum_output_0 : [1, 1, 1980, 1980]
  Attribute:
    equation : bhid, bhjd -> bhij

E build: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 309, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1901, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/graph_optimizer.py", line 2069, in rknn.api.graph_optimizer.GraphOptimizer.fuse_ops
  File "rknn/api/rules/matmul.py", line 2130, in rknn.api.rules.matmul._p_convert_einsum_to_exmatmul
  File "rknn/api/rules/matmul.py", line 2122, in rknn.api.rules.matmul._p_convert_einsum_to_exmatmul._get_dim_type
IndexError: list index out of range

W build: ===================== WARN(2) =====================
E rknn-toolkit2 version: 2.0.0b23+29ceb58d
Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 309, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1901, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/graph_optimizer.py", line 2069, in rknn.api.graph_optimizer.GraphOptimizer.fuse_ops
  File "rknn/api/rules/matmul.py", line 2130, in rknn.api.rules.matmul._p_convert_einsum_to_exmatmul
  File "rknn/api/rules/matmul.py", line 2122, in rknn.api.rules.matmul._p_convert_einsum_to_exmatmul._get_dim_type
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/████/█████/████/████/rknn-toolkit2/rknn-toolkit2/examples/onnx/flowformer/convert.py", line 40, in <module>
    ret = rknn.build(do_quantization=DO_QUANTIZATION, dataset='./dataset.txt')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "████/█████/████/████/████/lib/python3.11/site-packages/rknn/api/rknn.py", line 204, in build
    return self.rknn_base.build(do_quantization=do_quantization, dataset=dataset, expand_batch_size=rknn_batch_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "rknn/api/rknn_log.py", line 314, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_log.py", line 95, in rknn.api.rknn_log.RKNNLog.e
ValueError: Traceback (most recent call last):
  File "rknn/api/rknn_log.py", line 309, in rknn.api.rknn_log.error_catch_decorator.error_catch_wrapper
  File "rknn/api/rknn_base.py", line 1901, in rknn.api.rknn_base.RKNNBase.build
  File "rknn/api/graph_optimizer.py", line 2069, in rknn.api.graph_optimizer.GraphOptimizer.fuse_ops
  File "rknn/api/rules/matmul.py", line 2130, in rknn.api.rules.matmul._p_convert_einsum_to_exmatmul
  File "rknn/api/rules/matmul.py", line 2122, in rknn.api.rules.matmul._p_convert_einsum_to_exmatmul._get_dim_type
IndexError: list index out of range
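
For the Einsum node reported in the log above (`bhid, bhjd -> bhij`), the contraction is a batched matrix product of the first input with the transpose of the second over the last two axes. The sketch below checks that equivalence in NumPy with small made-up shapes (the log's real shapes are `[1, 1, 1980, 256]`); in the PyTorch source the corresponding replacement would be `torch.matmul(q, k.transpose(-1, -2))`. The names `q` and `k` are hypothetical, and whether the re-exported model then passes the toolkit's build step is an assumption that has not been tested here.

```python
import numpy as np

def scores_einsum(q, k):
    # Reference form, same equation as the failing Einsum node.
    return np.einsum('bhid,bhjd->bhij', q, k)

def scores_matmul(q, k):
    # Swap the last two axes of k: (b, h, j, d) -> (b, h, d, j),
    # then a batched matmul contracts over d.
    return np.matmul(q, np.swapaxes(k, -1, -2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((1, 1, 8, 4))   # small stand-in shapes
    k = rng.standard_normal((1, 1, 8, 4))
    print(np.allclose(scores_einsum(q, k), scores_matmul(q, k)))
```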