TVM RPC Error "PLS isn't existed" on Khadas VIM3 Pro (Amlogic A311D)

@sunshinemyson

I tried the VSI NPU as TVM target, ran the test_operations.py in TVM_FOLDER/tests/python/contrib/test_vsi_npu. It had error "PLS isn't existed" on VIM3 Pro side. I found the previous issue , I cannot solve the problem by setting "VSIMULATOR_CONFIG=VIPNANOQI_PID0X88". The following is my environment:

Environment variable (Host)

export VSIMULATOR_CONFIG=VIPNANOQI_PID0X88 # This PID is provided by khadas document.
export VIV_VX_DEBUG_LEVEL=1

Environment variable (VIM3 Pro)

export VIV_VX_DEBUG_LEVEL=1

Model: Khadas VIM3 Pro
SoC: Amlogic A311D with 5 TOPS Performance NPU
OS information:

Read More

```console khadas@Khadas:~$ uname -a Linux Khadas 4.9.241 #18 SMP PREEMPT Fri Jun 25 14:18:34 CST 2021 aarch64 aarch64 aarch64 GNU/Linux ``` ```console khadas@Khadas:~$ cat /etc/fenix-release # PLEASE DO NOT EDIT THIS FILE BOARD=VIM3 VENDOR=Amlogic VERSION=1.0.7 ARCH=arm64 INITRD_ARCH=arm64 INSTALL_TYPE=EMMC IMAGE_VERSION=V1.0.7-210625 ################ GIT VERSION ################ UBOOT_GIT_VERSION=khadas-vims-v1.0.5-release LINUX_GIT_VERSION=khadas-vims-v1.0.5-release-6-gc5aa6ab FENIX_GIT_VERSION=v1.0.7 ############################################# ```
NPU information:

Read More

```console khadas@Khadas:~$ dpkg -l | grep npu ii aml-npu 6.4.4.3AAA-2 arm64 Amlogic NPU libraries. ii evtest 1:1.34-1 arm64 utility to monitor Linux input device events ii libinput-bin 1.15.5-1ubuntu0.2 arm64 input device management and event handling library - udev quirks ii libinput10:arm64 1.15.5-1ubuntu0.2 arm64 input device management and event handling library - shared library ii libxi6:arm64 2:1.7.10-0ubuntu1 arm64 X11 Input extension library ``` ```console khadas@Khadas:~$ lsmod Module Size Used by cpufreq_powersave 16384 0 cpufreq_userspace 16384 0 cpufreq_conservative 16384 0 cpufreq_ondemand 20480 0 iv009_isp_sensor 270336 0 iv009_isp_lens 69632 0 iv009_isp_iq 544768 0 galcore 462848 0 vpu 49152 0 encoder 53248 0 amvdec_avs2 192512 0 amvdec_vp9 151552 0 amvdec_vc1 53248 0 amvdec_real 40960 0 amvdec_mmpeg4 32768 0 amvdec_mpeg4 53248 0 amvdec_mmpeg12 40960 0 amvdec_mpeg12 90112 0 amvdec_mmjpeg 28672 0 amvdec_mjpeg 36864 0 amvdec_h265 135168 0 amvdec_h264mvc 49152 0 amvdec_mh264 151552 0 amvdec_h264 118784 0 amvdec_avs 61440 0 stream_input 180224 10 amvdec_h265,amvdec_mh264,amvdec_h264mvc,amvdec_real,amvdec_vp9,amvdec_h264,amvdec_avs2,amvdec_mpeg12,amvdec_avs,amvdec_mmpeg12 decoder_common 176128 17 amvdec_h265,amvdec_mjpeg,amvdec_mh264,amvdec_mmpeg4,amvdec_h264mvc,amvdec_mmjpeg,amvdec_real,stream_input,amvdec_vp9,amvdec_h264,encoder,amvdec_avs2,amvdec_mpeg12,amvdec_avs,amvdec_vc1,amvdec_mmpeg12,amvdec_mpeg4 firmware 28672 18 amvdec_h265,amvdec_mjpeg,amvdec_mh264,amvdec_mmpeg4,amvdec_h264mvc,amvdec_mmjpeg,decoder_common,amvdec_real,stream_input,amvdec_vp9,amvdec_h264,encoder,amvdec_avs2,amvdec_mpeg12,amvdec_avs,amvdec_vc1,amvdec_mmpeg12,amvdec_mpeg4 media_clock 45056 12 amvdec_h265,amvdec_mh264,decoder_common,vpu,firmware,stream_input,amvdec_vp9,amvdec_h264,encoder,amvdec_avs2,amvdec_mpeg12,amvdec_avs mali_kbase 475136 0 iv009_isp 540672 2 zram 36864 4 dhd 1404928 0 btrfs 1269760 0 xor 20480 1 btrfs raid6_pq 106496 1 btrfs ``` ```console khadas@Khadas:~$ ls /dev/galcore /dev/galcore ``` ```console khadas@Khadas:~$ sudo dmesg | grep Gal [ 12.202405] Galcore version 6.4.4.3.310723AAA ```

TIM-VX Version:1.1.32

TVM Branch commit id: b822ec32702e2676dce1e430221e8efc05c98935

The output message after executing Unittest program of TIM-VX:

```console khadas@Khadas:~/TIM-VX-1.1.32/install/bin$ ./unit_test Running main() from /home/khadas/TIM-VX-1.1.32/_deps/googletest-src/googletest/src/gtest_main.cc [==========] Running 104 tests from 33 test suites. [----------] Global test environment set-up. [----------] 1 test from Context [----------] 1 test from Context (25 ms total) [----------] 2 tests from graph [ RUN ] graph.gen_binary_graph_with_empty_graph E [_graph_optimization_convert_int8_to_uint8:792]CHECK STATUS(-1:A generic error code, used when no other describes the error.) E [vsi_nn_OptimizeGraph:827]CHECK STATUS(-1:A generic error code, used when no other describes the error.) [ OK ] graph.gen_binary_graph_with_empty_graph (3 ms) [ RUN ] graph.gen_binary_graph_with_simple_add [ OK ] graph.gen_binary_graph_with_simple_add (8 ms) [----------] 2 tests from graph (11 ms total) [----------] 2 tests from Linear [----------] 2 tests from Linear (13 ms total) [----------] 3 tests from Conv1d [----------] 3 tests from Conv1d (22 ms total) [----------] 19 tests from Conv2d [----------] 19 tests from Conv2d (195 ms total) [----------] 2 tests from DeConv1d [ RUN ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1 /home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv1d_test.cc:69: Failure Expected equality of these values: golden Which is: { 27, 81, 30, 9, 3, 21, 15, 27, 0, 0 } output_data Which is: { 48, 96, 57, 9, 3, 0, 0, 0, 0, 0 } Result mismatch [ FAILED ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1 (9 ms) [----------] 2 tests from DeConv1d (56 ms total) [----------] 2 tests from DeConv2d [ RUN ] DeConv2d.shape_3_3_2_1_float_depthwise /home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv2d_test.cc:85: Failure Expected equality of these values: golden Which is: { 27, 72, 18, 24, 3, 81, 45, 90, 15, 21, 30, 26, 43, 22, 11, 9, 5, 25, 10, 14, 3, 2, 9, 4, 6, 21, 27, 52, 63, 7, 15, 6, ... } output_data Which is: { 48, 99, 70, 87, 10, 96, 51, 134, 29, 42, 57, 26, 168, 94, 33, 9, 5, 65, 26, 38, 3, 2, 81, 4, 22, 0, 0, 0, 0, 0, 0, 0, ... } Result mismatch [ FAILED ] DeConv2d.shape_3_3_2_1_float_depthwise (9 ms) [----------] 2 tests from DeConv2d (18 ms total) [----------] 16 tests from DepthwiseConv [----------] 16 tests from DepthwiseConv (176 ms total) [----------] 3 tests from FloorDiv (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (257:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.floordiv_U8U8toU8_2D fail with -1. [ OK ] FloorDiv.shape_5_1_broadcast_uint8 (56 ms) [----------] 3 tests from FloorDiv (135 ms total) [----------] 3 tests from GroupedConv2d [----------] 3 tests from GroupedConv2d (29 ms total) [----------] 2 tests from InstanceNorm [----------] 2 tests from InstanceNorm (208 ms total) [----------] 2 tests from LayerNorm [----------] 2 tests from LayerNorm (117 ms total) [----------] 3 tests from LogSoftmax [ RUN ] LogSoftmax.shape_3_6_1_uint8_axis_1 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (263:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.log_softmax_axis1_U8toU8_2D fail with -1. [ OK ] LogSoftmax.shape_3_6_1_uint8_axis_1 (70 ms) [----------] 3 tests from LogSoftmax (161 ms total) [----------] 3 tests from Matmul [ RUN ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (261:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.gemm_transa_U8U8toU8 fail with -1. [ OK ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a (30 ms) [----------] 3 tests from Matmul (113 ms total) [----------] 2 tests from MaxpoolWithArgmax [ RUN ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (258:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.poolwithargmax_U8to_U8_U8_2D fail with -1. [ OK ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2 (54 ms) [----------] 2 tests from MaxpoolWithArgmax (100 ms total) [----------] 2 tests from MaxUnpool2d [ RUN ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (256:0) : error : Error(0,256) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (296:0) : error : undefined identifier: 'vxc_uchar8' (296:0) : error : undefined identifier: 'vxc_uchar8' (296:0) : error : undefined identifier: 'vxc_uchar16' (296:0) : error : undefined identifier: 'vxc_uchar16' (296:0) : error : undefined identifier: 'vxc_uchar16' (296:0) : error : undefined identifier: 'vxc_uchar16' (296:0) : error : undefined identifier: 'vxc_uchar16' (296:0) : error : undefined identifier: 'vxc_uchar16' (296:0) : error : undefined identifier: 'din' (296:0) : error : undefined identifier: 'axisIn' (296:0) : error : undefined identifier: 'dinExpand' (296:0) : error : undefined identifier: 'axisInExpand' (296:0) : error : undefined identifier: 'zpValue' (296:0) : error : undefined identifier: 'constAxis' (296:0) : error : undefined identifier: 'axisData' (296:0) : error : undefined identifier: 'dout' (296:0) : error : undefined identifier: 'dout' (296:0) : error : undefined identifier: 'constAxis' (296:0) : error : undefined identifier: 'axisData' (296:0) : error : undefined identifier: 'dout' (296:0) : error : undefined identifier: 'dout' (308:0) : error : undefined identifier: 'vxc_uchar8' (308:0) : error : undefined identifier: 'vxc_uchar8' (308:0) : error : undefined identifier: 'vxc_uchar16' (308:0) : error : undefined identifier: 'vxc_uchar16' (308:0) : error : undefined identifier: 'vxc_uchar16' (308:0) : error : undefined identifier: 'vxc_uchar16' (308:0) : error : undefined identifier: 'vxc_uchar16' (308:0) : error : undefined identifier: 'vxc_uchar16' (308:0) : error : undefined identifier: 'din' (308:0) : error : undefined identifier: 'axisIn' (308:0) : error : undefined identifier: 'dinExpand' (308:0) : error : undefined identifier: 'axisInExpand' (308:0) : error : undefined identifier: 'zpValue' (308:0) : error : undefined identifier: 'constAxis' (308:0) : error : undefined identifier: 'axisData' (308:0) : error : undefined identifier: 'dout' (308:0) : error : undefined identifier: 'dout' (308:0) : error : undefined identifier: 'constAxis' (308:0) : error : undefined identifier: 'axisData' (308:0) : error : undefined identifier: 'dout' (308:0) : error : undefined identifier: 'dout' (312:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.upsample_U8_U8to_U8_SAME_2D fail with -1. [ OK ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2 (60 ms) [----------] 2 tests from MaxUnpool2d (108 ms total) [----------] 2 tests from Moments [----------] 2 tests from Moments (100 ms total) [----------] 1 test from Equal [ RUN ] Equal.shape_1_uint8 (1:0) : error : Error(0,1) : Cannot find the header file cl_viv_vx_ext.h. (7:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.equal_U8U8toBOOL8_2D fail with -1. [ OK ] Equal.shape_1_uint8 (89 ms) [----------] 1 test from Equal (89 ms total) [----------] 1 test from NotEqual [----------] 1 test from NotEqual (66 ms total) [----------] 1 test from Less [----------] 1 test from Less (64 ms total) [----------] 1 test from GreaterOrEqual [----------] 1 test from GreaterOrEqual (63 ms total) [----------] 1 test from Greater [----------] 1 test from Greater (63 ms total) [----------] 1 test from LessOrEqual [----------] 1 test from LessOrEqual (63 ms total) [----------] 2 tests from Reorg [----------] 2 tests from Reorg (10 ms total) [----------] 3 tests from Resize1d [ RUN ] Resize1d.shape_4_2_1_uint8_nearest_whcn (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (257:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.resize_1d_nearest_U8toU8_op fail with -1. [ OK ] Resize1d.shape_4_2_1_uint8_nearest_whcn (37 ms) [ RUN ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn [ OK ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn (32 ms) [----------] 3 tests from Resize1d (98 ms total) [----------] 2 tests from ScatterND [ RUN ] ScatterND.shape_4_4_4 [ OK ] ScatterND.shape_4_4_4 (41 ms) [ RUN ] ScatterND.shape_9 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (257:0) : error : syntax error at 'VXC_512Bits' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.scatter_nd_U8toU8 fail with -1. [ OK ] ScatterND.shape_9 (25 ms) [----------] 2 tests from ScatterND (66 ms total) [----------] 1 test from Floor [ RUN ] Floor.shape_5_1_fp32 [ OK ] Floor.shape_5_1_fp32 (5 ms) [----------] 1 test from Floor (5 ms total) [----------] 1 test from Cast [ RUN ] Cast.shape_5_1_fp32_to_int32 [ OK ] Cast.shape_5_1_fp32_to_int32 (35 ms) [----------] 1 test from Cast (35 ms total) [----------] 1 test from SpatialTransformer [ RUN ] SpatialTransformer.shape_1_3_3_1_u8 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (23:0) : error : undefined identifier: 'vxc_ushort8' (26:0) : error : undefined identifier: 'src0' (27:0) : error : undefined identifier: 'src1' (29:0) : error : undefined identifier: 'dst' (31:0) : error : undefined identifier: 'dst' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [vsi_nn_RegisterVXKernel:251][/home/khadas/TIM-VX-1.1.32/src/tim/vx/internal/src/libnnext/vsi_nn_vxkernel.c : 251] vxBuildProgram() Error! E [vsi_nn_InitKernel:108]Add parameter 0 to kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail. with -12. E [vsi_nn_InitKernel:121]Finalize kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail with -12. E [vsi_nn_InitKernel:126]Remove kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail with -10. E [vsi_nn_RegisterClientKernelAndNewNode:415]Register client kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail with -10. E [compute_node:379]Create node[0] SPATIAL_TRANSFORMER fail /home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/spatial_transformer_test.cc:74: Failure Expected equality of these values: values_golden Which is: { '\x2' (2), '\x3' (3), '\x2' (2), '\x2' (2), '\x3' (3), '\x2' (2), '\x2' (2), '\x3' (3), '\x2' (2) } output_values Which is: { '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0' } [ FAILED ] SpatialTransformer.shape_1_3_3_1_u8 (22 ms) [----------] 1 test from SpatialTransformer (22 ms total) [----------] 2 tests from Tile [ RUN ] Tile.shape_3_2_float_multiples_2_1 [ OK ] Tile.shape_3_2_float_multiples_2_1 (45 ms) [ RUN ] Tile.shape_3_2_1_int8_multiples_2_2_1 (1:0) : error : Error(0,1) : Cannot find the header file cl_viv_vx_ext.h. (59:0) : error : undefined identifier: 'vxc_uchar8' (59:0) : error : undefined identifier: 'src' (59:0) : error : undefined identifier: 'src' (59:0) : error : undefined identifier: 'src' (60:0) : error : undefined identifier: 'vxc_uchar8' (60:0) : error : undefined identifier: 'src' (60:0) : error : undefined identifier: 'src' (60:0) : error : undefined identifier: 'src' (61:0) : error : undefined identifier: 'vxc_uchar8' (61:0) : error : undefined identifier: 'src' (61:0) : error : undefined identifier: 'src' (61:0) : error : undefined identifier: 'src' (62:0) : error : undefined identifier: 'vxc_uchar8' (62:0) : error : undefined identifier: 'src' (62:0) : error : undefined identifier: 'src' (62:0) : error : undefined identifier: 'src' (63:0) : error : undefined identifier: 'vxc_uchar8' (63:0) : error : undefined identifier: 'src' (63:0) : error : undefined identifier: 'src' (63:0) : error : undefined identifier: 'src' (64:0) : error : undefined identifier: 'vxc_uchar8' (64:0) : error : undefined identifier: 'src' (64:0) : error : undefined identifier: 'src' (64:0) : error : undefined identifier: 'src' (65:0) : error : undefined identifier: 'vxc_uchar8' (65:0) : error : undefined identifier: 'src' (65:0) : error : undefined identifier: 'src' (65:0) : error : undefined identifier: 'src' (66:0) : error : undefined identifier: 'vxc_uchar8' (66:0) : error : undefined identifier: 'src' (66:0) : error : undefined identifier: 'src' (66:0) : error : undefined identifier: 'src' (68:0) : error : undefined identifier: 'vxc_short8' (68:0) : error : undefined identifier: 'src' (68:0) : error : undefined identifier: 'src' (68:0) : error : undefined identifier: 'src' (69:0) : error : undefined identifier: 'vxc_short8' (69:0) : error : undefined identifier: 'src' (69:0) : error : undefined identifier: 'src' (69:0) : error : undefined identifier: 'src' (70:0) : error : undefined identifier: 'vxc_short8' (70:0) : error : undefined identifier: 'src' (70:0) : error : undefined identifier: 'src' (70:0) : error : undefined identifier: 'src' (71:0) : error : undefined identifier: 'vxc_short8' (71:0) : error : undefined identifier: 'src' (71:0) : error : undefined identifier: 'src' (71:0) : error : undefined identifier: 'src' (72:0) : error : undefined identifier: 'vxc_short8' (72:0) : error : undefined identifier: 'src' (72:0) : error : undefined identifier: 'src' (72:0) : error : undefined identifier: 'src' (73:0) : error : undefined identifier: 'vxc_short8' (73:0) : error : undefined identifier: 'src' (73:0) : error : undefined identifier: 'src' (73:0) : error : undefined identifier: 'src' (74:0) : error : undefined identifier: 'vxc_short8' (74:0) : error : undefined identifier: 'src' (74:0) : error : undefined identifier: 'src' (74:0) : error : undefined identifier: 'src' (75:0) : error : undefined identifier: 'vxc_short8' (75:0) : error : undefined identifier: 'src' (75:0) : error : undefined identifier: 'src' (75:0) : error : undefined identifier: 'src' (115:0) : error : undefined identifier: 'vxc_uchar8' (115:0) : error : undefined identifier: 'src' (115:0) : error : undefined identifier: 'src' (115:0) : error : undefined identifier: 'src' (116:0) : error : undefined identifier: 'vxc_uchar8' (116:0) : error : undefined identifier: 'src' (116:0) : error : undefined identifier: 'src' (116:0) : error : undefined identifier: 'src' (117:0) : error : undefined identifier: 'vxc_uchar8' (117:0) : error : undefined identifier: 'src' (117:0) : error : undefined identifier: 'src' (117:0) : error : undefined identifier: 'src' (118:0) : error : undefined identifier: 'vxc_uchar8' (118:0) : error : undefined identifier: 'src' (118:0) : error : undefined identifier: 'src' (118:0) : error : undefined identifier: 'src' (119:0) : error : undefined identifier: 'vxc_uchar8' (119:0) : error : undefined identifier: 'src' (119:0) : error : undefined identifier: 'src' (119:0) : error : undefined identifier: 'src' (120:0) : error : undefined identifier: 'vxc_uchar8' (120:0) : error : undefined identifier: 'src' (120:0) : error : undefined identifier: 'src' (120:0) : error : undefined identifier: 'src' (121:0) : error : undefined identifier: 'vxc_uchar8' (121:0) : error : undefined identifier: 'src' (121:0) : error : undefined identifier: 'src' (121:0) : error : undefined identifier: 'src' (122:0) : error : undefined identifier: 'vxc_uchar8' (122:0) : error : undefined identifier: 'src' (122:0) : error : undefined identifier: 'src' (122:0) : error : undefined identifier: 'src' (124:0) : error : undefined identifier: 'vxc_short8' (124:0) : error : undefined identifier: 'src' (124:0) : error : undefined identifier: 'src' ERROR: Failed to compile vx shader. (error: FFFFFFFF) E [_gpu_register:476]Build program fail. E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.tile_remain3_U8toU8_2D fail with -1. [ OK ] Tile.shape_3_2_1_int8_multiples_2_2_1 (80 ms) [----------] 2 tests from Tile (125 ms total) [----------] 14 tests from TransposeConv2d [ RUN ] TransposeConv2d.shape_4_4_1_1_int8_QuantizedPerChannelOneTest Segmentation fault khadas@Khadas:~/TIM-VX-1.1.32/install/bin$ ```

The output message after executing TVM test_operations.py at X86 Host side:

```console python3 test_operations.py Testing QNN pattern 1. press any key and continue... make MOD Done! conv2d NHWC layout is not optimized for x86 with autotvm. #[version = "0.0.5"] def @main(%data: Tensor[(1, 56, 56, 32), int8], %weight: Tensor[(1, 1, 32, 64), int8], %add: Tensor[(64), int32]) { %0 = qnn.conv2d(%data, %weight, 0, 77, 0.023528f, 0.045283f, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %1 = nn.bias_add(%0, %add, axis=3); qnn.requantize(%1, 0.00106542f, 0, 0.0235285f, 0, out_dtype="int8") } get_ref_result get_vsi_result get_vsi_model:before relay.build vsi_npu.py --> qnn.requantize This is important----> name_node.value() == tvmgen_default_vsi_npu_0 GraphMakerImpl::Create TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d VsiNpuModule::GetFunction: get_symbol VsiNpuModule::GetFunction: return early VsiNpuModule::GetFunction: get_const_vars VsiNpuModule::GetFunction: return early VsiNpuModule::GetFunction: get_const_vars VsiNpuModule::GetFunction: return early VsiNpuModule::SaveToBinary SaveToBinary: nbg size = 15552 SaveToBinary: input size = 1 SaveToBinary: output size = 1 VsiNpuModule : SerializeTensorSpec VsiNpuModule : SerializeTensorSpec2 VsiNpuModule : SerializeTensorSpec VsiNpuModule : SerializeTensorSpec2 VsiNpuModule::SaveToBinary2 /tmp/tmpamfs6yew/model.so model.so {'data': array([[[[1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], ..., [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1]], [[1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], ..., [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1]], [[1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], ..., [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1]], ..., [[1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], ..., [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1]], [[1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], ..., [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1]], [[1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], ..., [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1], [1, 1, 1, ..., 1, 1, 1]]]], dtype=int8)} ref_out [[[[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] ... [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]]]] vsi_out [[[[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]] [[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]] [[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]] ... [[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]] [[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]] [[0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] ... [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0] [0 0 0 ... 0 0 0]]]] Expected output: [[[[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] ... [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]] [[-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] ... [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64] [-128 -128 -128 ... -67 -65 -64]]]] Actual output: Not equal to tolerance rtol=0.001, atol=0.001 Mismatched elements: 200704 / 200704 (100%) Max absolute difference: 127 Max relative difference: inf x: array([[[[-128, -128, -128, ..., -67, -65, -64], [-128, -128, -128, ..., -67, -65, -64], [-128, -128, -128, ..., -67, -65, -64],... y: array([[[[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0],... FAIL ```

The output message after executing TVM test_operations.py at VIM3 Pro side:

```console python3 -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090 INFO:root:If you are running ROCM/Metal, fork will cause compiler internal error. Try to launch with arg ```--no-fork``` INFO:RPCServer:bind to 0.0.0.0:9090 INFO:RPCServer:connection from ('XXX.XXX.XXX.XXX', 53076) VsiNpuModule::LoadFromBinary LoadFromBinary: nbg size = 15552 LoadFromBinary: input size = 1 LoadFromBinary: output size = 1 VsiNpuModule : DeSerializeTensorSpec VsiNpuModule : DeSerializeTensorSpec2 VsiNpuModule : DeSerializeTensorSpec VsiNpuModule : DeSerializeTensorSpec2 INFO:RPCServer:load_module /tmp/tmpa5luf_rw/model.so VsiNpuModule::GetFunction: _lookup_linked_param VsiNpuModule::GetFunction: return early VsiNpuModule::GetFunction: _lookup_linked_param VsiNpuModule::GetFunction: return early VsiNpuModule::GetFunction: _lookup_linked_param VsiNpuModule::GetFunction: return early VsiNpuModule::GetFunction: _lookup_linked_param VsiNpuModule::GetFunction: return early VsiNpuModule::GetFunction: tvmgen_default_vsi_npu_0 [ 1] PLS isn't existed Process Graph: 2 ms or 2363 us VsiNpuModule::GetFunction: size: 2 INFO:RPCServer:Finish serving ('XXX.XXX.XXX.XXX', 53076) ```

Test Functions Passed in test_operations.py

```python test_qnn_add() test_float_add() test_float_relu() test_uint8_relu() test_float_leaky_relu() test_uint8_leaky_relu() test_float_softmax() test_float_reshape() test_float_tranpose() test_float_relu6() test_uint8_relu6() test_dequantize() test_quantize() test_uint8_avg_pool() test_uint8_softmax() test_uint8_reshape() test_uint8_concatenation() test_uint8_max_pool() test_float_mean()? test_uint8_argmax() test_float_sigmoid() test_uint8_sigmoid() test_uint8_fullconnected() test_uint8_argmin() test_uint8_squeeze() test_uint8_depthtospace() test_qnn_sub() test_qnn_multiply() test_qnn_maximum() test_qnn_minimum() test_qnn_logical_and() test_qnn_logical_or() test_qnn_pad() test_uint8_mean() test_requantize() test_uint8_transpose_conv2d_pattern() test_uint8_transpose_conv2d_pattern2() test_uint8_tanh() ```

Test Functions Failed in test_operations.py

```python test_float32_conv2d_permute() #vsi_out array elements value are all 0. ref_out!=vsi_out Mismatched elements: 100% test_float32_depthwise_conv2d_permute() #vsi_out array elements value are all 0. ref_out!=vsi_out Mismatched elements: 100% test_sample_model() #vsi_out array elements value are all 0. ref_out!=vsi_out Mismatched elements: 100% test_float_avg_pool() #vsi_out array elements value are all 0. ref_out!=vsi_out Mismatched elements: 100% test_float32_pattern() #ref_out!=vsi_out Mismatched elements: 100% test_uint8_depthwiseconv2d_pattern() #ref_out!=vsi_out Mismatched elements: 515 / 864 (59.6%) test_uint8_conv2d_pattern() #vsi_out array elements value are all 0. ref_out!=vsi_out Mismatched elements: 100% test_uint8_resizeBilinear() #AttributeError: module 'tvm.relay.op.image' has no attribute 'resize' #Because relay.op.image.resize was removed in the version test_float_batch_norm() #std: :bad_alloc test_uint8_resizeNear() #AttributeError: module 'tvm.relay.op.image' has no attribute 'resize' #Because relay.op.image.resize was removed in the version ```

If you need more debug messages, please let me know. Thanks.

It seems that the system is not able to locate runtime compiler header file cl_viv_vx_ext.h. You can either copy this file over to current run directory, or set VIVANTE_SDK_DIR to point to the location which contains this header file (include/CL/cl_viv_vx_ext.h).

(10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h. (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h. (27:0) : error : undefined identifier: 'COPY' (55:0) : error : undefined identifier: 'COPY' (257:0) : error : syntax error at 'VXC_512Bits'

@thezha Thanks for your reply. I found the cl_viv_vx_ext.h location is /usr/include/CL/. I done this.

export VIVANTE_SDK_DIR=/usr

And ran again TIM_VX unit_test, get some errors without "Cannot find the header file cl_viv_vx_ext.h." error. The following is full output:

```console Running main() from /home/khadas/TIM-VX-1.1.32/_deps/googletest-src/googletest/src/gtest_main.cc [==========] Running 104 tests from 33 test suites. [----------] Global test environment set-up. [----------] 1 test from Context [ RUN ] Context.create [ OK ] Context.create (43 ms) [----------] 1 test from Context (43 ms total) [----------] 2 tests from graph [ RUN ] graph.gen_binary_graph_with_empty_graph E [_graph_optimization_convert_int8_to_uint8:792]CHECK STATUS(-1:A generic error code, used when no other describes the error.) E [vsi_nn_OptimizeGraph:827]CHECK STATUS(-1:A generic error code, used when no other describes the error.) [ OK ] graph.gen_binary_graph_with_empty_graph (6 ms) [ RUN ] graph.gen_binary_graph_with_simple_add [ OK ] graph.gen_binary_graph_with_simple_add (20 ms) [----------] 2 tests from graph (26 ms total) [----------] 2 tests from Linear [ RUN ] Linear.shape_5_1_fp32 [ OK ] Linear.shape_5_1_fp32 (7 ms) [ RUN ] Linear.shape_5_1_fp32_omit_b [ OK ] Linear.shape_5_1_fp32_omit_b (5 ms) [----------] 2 tests from Linear (13 ms total) [----------] 3 tests from Conv1d [ RUN ] Conv1d.shape_3_6_1_float_ksize_1_stride_1_weights_3_no_bias_whcn [ OK ] Conv1d.shape_3_6_1_float_ksize_1_stride_1_weights_3_no_bias_whcn (14 ms) [ RUN ] Conv1d.shape_6_2_1_uint8_ksize_6_stride_1_weights_2_whcn [ OK ] Conv1d.shape_6_2_1_uint8_ksize_6_stride_1_weights_2_whcn (7 ms) [ RUN ] Conv1d.shape_6_2_1_uint8_ksize_3_stride_1_pad_1_weights_2_no_bias_whcn [ OK ] Conv1d.shape_6_2_1_uint8_ksize_3_stride_1_pad_1_weights_2_no_bias_whcn (6 ms) [----------] 3 tests from Conv1d (27 ms total) [----------] 19 tests from Conv2d [ RUN ] Conv2d.shape_4_2_1_1_float32_PaddingTest [ OK ] Conv2d.shape_4_2_1_1_float32_PaddingTest (17 ms) [ RUN ] Conv2d.shape_4_2_2_2_float32_PointwiseTest [ OK ] Conv2d.shape_4_2_2_2_float32_PointwiseTest (16 ms) [ RUN ] Conv2d.shape_4_2_1_2_float32_SimpleTest [ OK ] Conv2d.shape_4_2_1_2_float32_SimpleTest (12 ms) [ RUN ] Conv2d.shape_4_2_2_2_float32_SimpleChannelsTest [ OK ] Conv2d.shape_4_2_2_2_float32_SimpleChannelsTest (11 ms) [ RUN ] Conv2d.shape_6_3_1_1_float32_SimpleAnisotropicStridesTest [ OK ] Conv2d.shape_6_3_1_1_float32_SimpleAnisotropicStridesTest (11 ms) [ RUN ] Conv2d.shape_4_3_1_1_float32_HandCalculatedTest [ OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedTest (12 ms) [ RUN ] Conv2d.shape_4_3_1_1_float32_HandCalculatedConstFilterTest [ OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedConstFilterTest (12 ms) [ RUN ] Conv2d.shape_4_3_1_1_float32_HandCalculatedBiasTest [ OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedBiasTest (12 ms) [ RUN ] Conv2d.shape_4_3_1_1_float32_HandCalculatedValidTest [ OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedValidTest (12 ms) [ RUN ] Conv2d.shape_4_2_2_2_float32_DisabledPointwiseMultifilterTest [ OK ] Conv2d.shape_4_2_2_2_float32_DisabledPointwiseMultifilterTest (9 ms) [ RUN ] Conv2d.shape_9_9_1_1_float32_SimpleDilationTest [ OK ] Conv2d.shape_9_9_1_1_float32_SimpleDilationTest (12 ms) [ RUN ] Conv2d.shape_4_2_1_2_float32_StrideTest [ OK ] Conv2d.shape_4_2_1_2_float32_StrideTest (12 ms) [ RUN ] Conv2d.shape_4_2_1_2_float32_InputAndFilterSameWidthHeightTest [ OK ] Conv2d.shape_4_2_1_2_float32_InputAndFilterSameWidthHeightTest (8 ms) [ RUN ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest1 [ OK ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest1 (6 ms) [ RUN ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest2 [ OK ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest2 (6 ms) [ RUN ] Conv2d.shape_6_3_1_1_uint8_AnisotropicStridesQuantizedTest [ OK ] Conv2d.shape_6_3_1_1_uint8_AnisotropicStridesQuantizedTest (6 ms) [ RUN ] Conv2d.shape_9_9_1_1_uint8_DilationQuantizedTest [ OK ] Conv2d.shape_9_9_1_1_uint8_DilationQuantizedTest (6 ms) [ RUN ] Conv2d.shape_3_2_2_1_int8_QuantizedPerTensorTest [ OK ] Conv2d.shape_3_2_2_1_int8_QuantizedPerTensorTest (19 ms) [ RUN ] Conv2d.shape_3_2_2_1_int8_QuantizedPerChannelTest [ OK ] Conv2d.shape_3_2_2_1_int8_QuantizedPerChannelTest (12 ms) [----------] 19 tests from Conv2d (213 ms total) [----------] 2 tests from DeConv1d [ RUN ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1 /home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv1d_test.cc:69: Failure Expected equality of these values: golden Which is: { 27, 81, 30, 9, 3, 21, 15, 27, 0, 0 } output_data Which is: { 48, 96, 57, 9, 3, 0, 0, 0, 0, 0 } Result mismatch [ FAILED ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1 (9 ms) [ RUN ] DeConv1d.layout_whcn_shape_3_1_1 [ OK ] DeConv1d.layout_whcn_shape_3_1_1 (92 ms) [----------] 2 tests from DeConv1d (101 ms total) [----------] 2 tests from DeConv2d [ RUN ] DeConv2d.shape_3_3_2_1_float_depthwise /home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv2d_test.cc:85: Failure Expected equality of these values: golden Which is: { 27, 72, 18, 24, 3, 81, 45, 90, 15, 21, 30, 26, 43, 22, 11, 9, 5, 25, 10, 14, 3, 2, 9, 4, 6, 21, 27, 52, 63, 7, 15, 6, ... } output_data Which is: { 48, 99, 70, 87, 10, 96, 51, 134, 29, 42, 57, 26, 168, 94, 33, 9, 5, 65, 26, 38, 3, 2, 81, 4, 22, 0, 0, 0, 0, 0, 0, 0, ... } Result mismatch [ FAILED ] DeConv2d.shape_3_3_2_1_float_depthwise (9 ms) [ RUN ] DeConv2d.shape_3_3_1_1_float [ OK ] DeConv2d.shape_3_3_1_1_float (9 ms) [----------] 2 tests from DeConv2d (18 ms total) [----------] 16 tests from DepthwiseConv [ RUN ] DepthwiseConv.shape_2_3_2_1_float32_SimpleTest [ OK ] DepthwiseConv.shape_2_3_2_1_float32_SimpleTest (19 ms) [ RUN ] DepthwiseConv.shape_2_3_2_1_float32_StrideValidTest [ OK ] DepthwiseConv.shape_2_3_2_1_float32_StrideValidTest (12 ms) [ RUN ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameTest [ OK ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameTest (11 ms) [ RUN ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameDilationTest [ OK ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameDilationTest (11 ms) [ RUN ] DepthwiseConv.shape_2_3_2_1_float32_PaddingTest [ OK ] DepthwiseConv.shape_2_3_2_1_float32_PaddingTest (12 ms) [ RUN ] DepthwiseConv.shape_9_9_1_1_float32_DilationValidTest [ OK ] DepthwiseConv.shape_9_9_1_1_float32_DilationValidTest (11 ms) [ RUN ] DepthwiseConv.shape_3_3_1_1_float32_DilationSameTest [ OK ] DepthwiseConv.shape_3_3_1_1_float32_DilationSameTest (12 ms) [ RUN ] DepthwiseConv.shape_3_3_4_2_float32_BatchValidTest [ OK ] DepthwiseConv.shape_3_3_4_2_float32_BatchValidTest (11 ms) [ RUN ] DepthwiseConv.shape_2_2_1_4_float32_BatchSameTest [ OK ] DepthwiseConv.shape_2_2_1_4_float32_BatchSameTest (12 ms) [ RUN ] DepthwiseConv.shape_2_3_2_1_uint8_QuantizedTest [ OK ] DepthwiseConv.shape_2_3_2_1_uint8_QuantizedTest (6 ms) [ RUN ] DepthwiseConv.shape_9_9_1_1_uint8_QuantizedDilationdValidTest [ OK ] DepthwiseConv.shape_9_9_1_1_uint8_QuantizedDilationdValidTest (6 ms) [ RUN ] DepthwiseConv.shape_3_3_1_1_uint8_QuantizedDilationdSameTest [ OK ] DepthwiseConv.shape_3_3_1_1_uint8_QuantizedDilationdSameTest (6 ms) [ RUN ] DepthwiseConv.shape_3_2_2_1_int8_PerTensorTest [ OK ] DepthwiseConv.shape_3_2_2_1_int8_PerTensorTest (13 ms) [ RUN ] DepthwiseConv.shape_3_2_2_1_int8_PerAxisTest [ OK ] DepthwiseConv.shape_3_2_2_1_int8_PerAxisTest (12 ms) [ RUN ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelValidTest [ OK ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelValidTest (12 ms) [ RUN ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelSameTest [ OK ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelSameTest (13 ms) [----------] 16 tests from DepthwiseConv (181 ms total) [----------] 3 tests from FloorDiv [ RUN ] FloorDiv.shape_1_fp32 [ OK ] FloorDiv.shape_1_fp32 (69 ms) [ RUN ] FloorDiv.shape_5_1_broadcast_float32 [ OK ] FloorDiv.shape_5_1_broadcast_float32 (38 ms) [ RUN ] FloorDiv.shape_5_1_broadcast_uint8 [ OK ] FloorDiv.shape_5_1_broadcast_uint8 (256 ms) [----------] 3 tests from FloorDiv (364 ms total) [----------] 3 tests from GroupedConv2d [ RUN ] GroupedConv2d.shape_3_3_6_1_float_group_1_no_bias_whcn [ OK ] GroupedConv2d.shape_3_3_6_1_float_group_1_no_bias_whcn (7 ms) [ RUN ] GroupedConv2d.shape_3_3_6_1_float_group_2_whcn [ OK ] GroupedConv2d.shape_3_3_6_1_float_group_2_whcn (7 ms) [ RUN ] GroupedConv2d.shape_3_3_6_1_uint8_group_6_whcn [ OK ] GroupedConv2d.shape_3_3_6_1_uint8_group_6_whcn (15 ms) [----------] 3 tests from GroupedConv2d (29 ms total) [----------] 2 tests from InstanceNorm [ RUN ] InstanceNorm.shape_3_6_1_float [ OK ] InstanceNorm.shape_3_6_1_float (125 ms) [ RUN ] InstanceNorm.shape_3_3_6_1_float [ OK ] InstanceNorm.shape_3_3_6_1_float (80 ms) [----------] 2 tests from InstanceNorm (205 ms total) [----------] 2 tests from LayerNorm [ RUN ] LayerNorm.axis_0_shape_3_6_1_float [ OK ] LayerNorm.axis_0_shape_3_6_1_float (60 ms) [ RUN ] LayerNorm.axis_0_shape_2_3_6_1_float [ OK ] LayerNorm.axis_0_shape_2_3_6_1_float (58 ms) [----------] 2 tests from LayerNorm (118 ms total) [----------] 3 tests from LogSoftmax [ RUN ] LogSoftmax.shape_6_1_float_axis_0 [ OK ] LogSoftmax.shape_6_1_float_axis_0 (123 ms) [ RUN ] LogSoftmax.shape_3_6_1_float_axis_1 [ OK ] LogSoftmax.shape_3_6_1_float_axis_1 (48 ms) [ RUN ] LogSoftmax.shape_3_6_1_uint8_axis_1 [ OK ] LogSoftmax.shape_3_6_1_uint8_axis_1 (958 ms) [----------] 3 tests from LogSoftmax (1129 ms total) [----------] 3 tests from Matmul [ RUN ] Matmul.shape_2_6_shape_6_2_float [ OK ] Matmul.shape_2_6_shape_6_2_float (38 ms) [ RUN ] Matmul.shape_2_3_2_shape_2_3_2_float_transpose_b [ OK ] Matmul.shape_2_3_2_shape_2_3_2_float_transpose_b (42 ms) [ RUN ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a [ OK ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a (169 ms) [----------] 3 tests from Matmul (249 ms total) [----------] 2 tests from MaxpoolWithArgmax [ RUN ] MaxpoolWithArgmax.shape_3_3_1_fp32_kernel_2_stride_2 [ OK ] MaxpoolWithArgmax.shape_3_3_1_fp32_kernel_2_stride_2 (49 ms) [ RUN ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2 [ OK ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2 (124 ms) [----------] 2 tests from MaxpoolWithArgmax (173 ms total) [----------] 2 tests from MaxUnpool2d [ RUN ] MaxUnpool2d.shape_2_2_1_fp32_kernel_2_stride_2 [ OK ] MaxUnpool2d.shape_2_2_1_fp32_kernel_2_stride_2 (52 ms) [ RUN ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2 [ OK ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2 (150 ms) [----------] 2 tests from MaxUnpool2d (202 ms total) [----------] 2 tests from Moments [ RUN ] Moments.shape_6_3_1_float_axes_0_1 [ OK ] Moments.shape_6_3_1_float_axes_0_1 (62 ms) [ RUN ] Moments.shape_3_6_1_float_axes_1_keepdims [ OK ] Moments.shape_3_6_1_float_axes_1_keepdims (37 ms) [----------] 2 tests from Moments (99 ms total) [----------] 1 test from Equal [ RUN ] Equal.shape_1_uint8 [ OK ] Equal.shape_1_uint8 (523 ms) [----------] 1 test from Equal (523 ms total) [----------] 1 test from NotEqual [ RUN ] NotEqual.shape_5_fp32 [ OK ] NotEqual.shape_5_fp32 (64 ms) [----------] 1 test from NotEqual (64 ms total) [----------] 1 test from Less [ RUN ] Less.shape_5_1_fp32 [ OK ] Less.shape_5_1_fp32 (62 ms) [----------] 1 test from Less (63 ms total) [----------] 1 test from GreaterOrEqual [ RUN ] GreaterOrEqual.shape_5_2_1_fp32 [ OK ] GreaterOrEqual.shape_5_2_1_fp32 (62 ms) [----------] 1 test from GreaterOrEqual (63 ms total) [----------] 1 test from Greater [ RUN ] Greater.shape_5_2_1_1_fp32 [ OK ] Greater.shape_5_2_1_1_fp32 (62 ms) [----------] 1 test from Greater (63 ms total) [----------] 1 test from LessOrEqual [ RUN ] LessOrEqual.shape_1_5_2_1_1_fp32 [ OK ] LessOrEqual.shape_1_5_2_1_1_fp32 (62 ms) [----------] 1 test from LessOrEqual (62 ms total) [----------] 2 tests from Reorg [ RUN ] Reorg.shape_4_4_4_1_u8 [ OK ] Reorg.shape_4_4_4_1_u8 (6 ms) [ RUN ] Reorg.shape_4_4_4_1_fp32 [ OK ] Reorg.shape_4_4_4_1_fp32 (6 ms) [----------] 2 tests from Reorg (12 ms total) [----------] 3 tests from Resize1d [ RUN ] Resize1d.shape_4_2_1_float_nearest_whcn [ OK ] Resize1d.shape_4_2_1_float_nearest_whcn (29 ms) [ RUN ] Resize1d.shape_4_2_1_uint8_nearest_whcn [ OK ] Resize1d.shape_4_2_1_uint8_nearest_whcn (100 ms) [ RUN ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn [ OK ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn (35 ms) [----------] 3 tests from Resize1d (164 ms total) [----------] 2 tests from ScatterND [ RUN ] ScatterND.shape_4_4_4 [ OK ] ScatterND.shape_4_4_4 (41 ms) [ RUN ] ScatterND.shape_9 [ OK ] ScatterND.shape_9 (74 ms) [----------] 2 tests from ScatterND (115 ms total) [----------] 1 test from Floor [ RUN ] Floor.shape_5_1_fp32 [ OK ] Floor.shape_5_1_fp32 (5 ms) [----------] 1 test from Floor (5 ms total) [----------] 1 test from Cast [ RUN ] Cast.shape_5_1_fp32_to_int32 [ OK ] Cast.shape_5_1_fp32_to_int32 (35 ms) [----------] 1 test from Cast (35 ms total) [----------] 1 test from SpatialTransformer [ RUN ] SpatialTransformer.shape_1_3_3_1_u8 [ OK ] SpatialTransformer.shape_1_3_3_1_u8 (138 ms) [----------] 1 test from SpatialTransformer (139 ms total) [----------] 2 tests from Tile [ RUN ] Tile.shape_3_2_float_multiples_2_1 [ OK ] Tile.shape_3_2_float_multiples_2_1 (45 ms) [ RUN ] Tile.shape_3_2_1_int8_multiples_2_2_1 [ OK ] Tile.shape_3_2_1_int8_multiples_2_2_1 (315 ms) [----------] 2 tests from Tile (360 ms total) [----------] 14 tests from TransposeConv2d [ RUN ] TransposeConv2d.shape_4_4_1_1_float32_SimpleTest [ OK ] TransposeConv2d.shape_4_4_1_1_float32_SimpleTest (8 ms) [ RUN ] TransposeConv2d.shape_4_4_2_1_float32_SameTest [ OK ] TransposeConv2d.shape_4_4_2_1_float32_SameTest (9 ms) [ RUN ] TransposeConv2d.shape_4_4_2_1_float32_ValidTest [ OK ] TransposeConv2d.shape_4_4_2_1_float32_ValidTest (8 ms) [ RUN ] TransposeConv2d.shape_2_2_1_1_float32_StrideTest [ OK ] TransposeConv2d.shape_2_2_1_1_float32_StrideTest (9 ms) [ RUN ] TransposeConv2d.shape_2_2_1_1_float32_ChannelTest [ OK ] TransposeConv2d.shape_2_2_1_1_float32_ChannelTest (9 ms) [ RUN ] TransposeConv2d.shape_2_1_1_1_float32_AccuracyTest [ OK ] TransposeConv2d.shape_2_1_1_1_float32_AccuracyTest (9 ms) [ RUN ] TransposeConv2d.shape_2_2_1_1_float32_BiasChannelTest [ OK ] TransposeConv2d.shape_2_2_1_1_float32_BiasChannelTest (12 ms) [ RUN ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedTest [ OK ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedTest (6 ms) [ RUN ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedTwoFiltersTest [ OK ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedTwoFiltersTest (5 ms) [ RUN ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedValidTest [ OK ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedValidTest (5 ms) [ RUN ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedBiasTest [ OK ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedBiasTest (5 ms) [ RUN ] TransposeConv2d.shape_4_4_1_1_int8_QuantizedPerChannelOneTest Segmentation fault ```

@thezha The Galcore version is 6.4.4.3.310723AAA. Is any relation with TIM-VX version?

@leokuo725 I recommend that you get the latest driver SDK/galcore from here and push to the device.

https://github.com/VeriSilicon/TIM-VX/releases/tag/v1.1.34.fix

https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.34.fix/aarch64_A311D_6.4.8.tgz

@sunshinemyson I build the TIM-VX v1.1.34.fix. Some errors happened.

```console [ 94%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/activations_test.cc.o /usr/bin/ld: ../../src/tim/vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup': matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode' collect2: error: ld returned 1 exit status make[2]: *** [samples/benchmark_test/CMakeFiles/benchmark_test.dir/build.make:100: samples/benchmark_test/benchmark_test] Error 1 make[1]: *** [CMakeFiles/Makefile2:527: samples/benchmark_test/CMakeFiles/benchmark_test.dir/all] Error 2 make[1]: *** Waiting for unfinished jobs.... [ 94%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/addn_test.cc.o [ 94%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/avg_pool_test.cc.o [ 95%] Linking CXX executable lenet [ 95%] Linking CXX executable multi_thread_test /usr/bin/ld: ../../src/tim/vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup': matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode' collect2: error: ld returned 1 exit status make[2]: *** [samples/lenet/CMakeFiles/lenet.dir/build.make:100: samples/lenet/lenet] Error 1 make[1]: *** [CMakeFiles/Makefile2:555: samples/lenet/CMakeFiles/lenet.dir/all] Error 2 [ 95%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/conv1d_test.cc.o /usr/bin/ld: ../../src/tim/vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup': matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode' collect2: error: ld returned 1 exit status make[2]: *** [samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/build.make:100: samples/multi_thread_test/multi_thread_test] Error 1 make[1]: *** [CMakeFiles/Makefile2:583: samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/all] Error 2 [ 95%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/conv2d_test.cc.o [ 95%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/deconv1d_test.cc.o [ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/deconv2d_test.cc.o [ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/depthwiseConv_test.cc.o [ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/elementwise_test.cc.o [ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/groupedconv2d_test.cc.o [ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/instancenormalization_test.cc.o [ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/layernormalization_test.cc.o [ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/logsoftmax_test.cc.o [ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/matmul_test.cc.o [ 97%] Linking CXX shared library libtim-vx.so [ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/maxpoolwithargmax_test.cc.o [ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/maxunpool2d_test.cc.o [ 97%] Built target tim-vx [ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/moments_test.cc.o [ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/relational_operations_test.cc.o [ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/reorg_test.cc.o [ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/resize1d_test.cc.o [ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/scatternd_test.cc.o [ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/shuffle_channel_test.cc.o [ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/simple_operations_test.cc.o [ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/spatial_transformer_test.cc.o [ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/tile_test.cc.o [ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/transposeConv_test.cc.o [ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/unidirectional_sequence_lstm_test.cc.o [ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/unstack_test.cc.o [ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/transform/layout_inference_test.cc.o [100%] Linking CXX executable unit_test /usr/bin/ld: vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup': matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode' collect2: error: ld returned 1 exit status make[2]: *** [src/tim/CMakeFiles/unit_test.dir/build.make:549: src/tim/unit_test] Error 1 make[1]: *** [CMakeFiles/Makefile2:418: src/tim/CMakeFiles/unit_test.dir/all] Error 2 make: *** [Makefile:130: all] Error 2 khadas@Khadas:~/TIM-VX-1.1.34.fix/build$ ```

I try cross-compile the TIM-VX now. It can be compiled. But have Segmentation fault (core dumped) error while I run the test_operations.py.

Is TIM-VX unit test running OK now?

Is TIM-VX unit test running OK now?

@thezha Cross-Compile the TIM-VX 1.1.34.fix at X86 and no bin folder in install folder. But I compile for x86 without -DCONFIG=A311D, I can get bin folder in install folder. And I run unit test on x86 host. It has some error.

```console Running main() from /media/data/home/leokuo/TIM-VX-1.1.34.fix/build_x86/_deps/googletest-src/googletest/src/gtest_main.cc [==========] Running 122 tests from 39 test suites. [----------] Global test environment set-up. [----------] 1 test from Context [ RUN ] Context.create [ OK ] Context.create (121 ms) [----------] 1 test from Context (121 ms total) [----------] 2 tests from graph [ RUN ] graph.gen_binary_graph_with_empty_graph E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.) E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.) [ OK ] graph.gen_binary_graph_with_empty_graph (140 ms) [ RUN ] graph.gen_binary_graph_with_simple_add [ OK ] graph.gen_binary_graph_with_simple_add (294 ms) [----------] 2 tests from graph (434 ms total) [----------] 2 tests from Linear [ RUN ] Linear.shape_5_1_fp32 [ OK ] Linear.shape_5_1_fp32 (180 ms) [ RUN ] Linear.shape_5_1_fp32_omit_b [ OK ] Linear.shape_5_1_fp32_omit_b (179 ms) [----------] 2 tests from Linear (359 ms total) [----------] 2 tests from Gelu [ RUN ] Gelu.shape_5_1_fp32_approximate W [_setup:243]Call vxTensorTableLookupLayer fail. [ OK ] Gelu.shape_5_1_fp32_approximate (160 ms) [ RUN ] Gelu.shape_5_1_uint8_Quantized [ OK ] Gelu.shape_5_1_uint8_Quantized (128 ms) [----------] 2 tests from Gelu (288 ms total) [----------] 3 tests from AddN [ RUN ] AddN.shape_2_2_int32 [ OK ] AddN.shape_2_2_int32 (230 ms) [ RUN ] AddN.shape_3_1_float32 [ OK ] AddN.shape_3_1_float32 (230 ms) [ RUN ] AddN.shape_2_2_uint8_Quantized /media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure The difference between expected[i] and actual[i] is 4, which exceeds abs_error, where expected[i] evaluates to 131, actual[i] evaluates to 127, and abs_error evaluates to 1. at index:0 /media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure The difference between expected[i] and actual[i] is 11, which exceeds abs_error, where expected[i] evaluates to 138, actual[i] evaluates to 127, and abs_error evaluates to 1. at index:1 /media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure The difference between expected[i] and actual[i] is 6, which exceeds abs_error, where expected[i] evaluates to 133, actual[i] evaluates to 127, and abs_error evaluates to 1. at index:2 /media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure The difference between expected[i] and actual[i] is 17, which exceeds abs_error, where expected[i] evaluates to 144, actual[i] evaluates to 127, and abs_error evaluates to 1. at index:3 [ FAILED ] AddN.shape_2_2_uint8_Quantized (998 ms) [----------] 3 tests from AddN (1458 ms total) [----------] 4 tests from AVG [ RUN ] AVG.shape_3_3_1_2_fp32_kernel_2_stride_1 [ OK ] AVG.shape_3_3_1_2_fp32_kernel_2_stride_1 (1055 ms) [ RUN ] AVG.shape_3_3_1_1_fp32_kernel_2_stride_1 [ OK ] AVG.shape_3_3_1_1_fp32_kernel_2_stride_1 (1068 ms) [ RUN ] AVG.shape_3_3_1_1_uint8_kernel_2_stride_1 [ OK ] AVG.shape_3_3_1_1_uint8_kernel_2_stride_1 (127 ms) [ RUN ] AVG.shape_60_52_3_5_fp32_kernel_35_stride_5 [ OK ] AVG.shape_60_52_3_5_fp32_kernel_35_stride_5 (5096 ms) [----------] 4 tests from AVG (7346 ms total) [----------] 2 tests from AVG_ANDROID [ RUN ] AVG_ANDROID.shape_60_52_3_5_fp32_kernel_35_stride_5 [ OK ] AVG_ANDROID.shape_60_52_3_5_fp32_kernel_35_stride_5 (5113 ms) [ RUN ] AVG_ANDROID.shape_60_52_3_5_uint8_kernel_35_stride_5 Segmentation fault (core dumped) ```

If I execute the old version(1.1.32) unit test with new version SDK(6.4.8) and Galcore version 6.4.6.2.

please run ldd on the unit_test for both x86 and khadas and supply the output here.

kainan@ubuntu:~/projects/opensource/TIM-VX/build_$ ldd src/tim/unit_test 
    linux-vdso.so.1 (0x00007fffca76e000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa847db3000)
    libOpenVX.so.1 => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libOpenVX.so.1 (0x00007fa8474b3000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa8472d1000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa847182000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa847167000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa846f73000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa84856a000)
    libVSC.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libVSC.so (0x00007fa845d5f000)
    libGAL.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libGAL.so (0x00007fa845928000)
    libArchModelSw.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libArchModelSw.so (0x00007fa8456c7000)
    libNNArchPerf.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libNNArchPerf.so (0x00007fa84545b000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa845455000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa845448000)
    libEmulator.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libEmulator.so (0x00007fa844fe9000)
    libvdtproxy.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libvdtproxy.so (0x00007fa844de6000

@thezha X86(TIM-VX 1.1.34.fix):

~/TIM-VX-1.1.34.fix/build_x86/src/tim$ ldd unit_test 
    linux-vdso.so.1 (0x00007ffcf6fbb000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe03d4e0000)
    libOpenVX.so.1 => /usr/lib/x86_64-linux-gnu/libOpenVX.so.1 (0x00007fe03cbe0000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe03c7d3000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe03c435000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe03c21d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe03be2c000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fe03e04b000)
    libVSC.so => /usr/lib/x86_64-linux-gnu/libVSC.so (0x00007fe03ac18000)
    libGAL.so => /usr/lib/x86_64-linux-gnu/libGAL.so (0x00007fe03a7e1000)
    libArchModelSw.so => /usr/lib/x86_64-linux-gnu/libArchModelSw.so (0x00007fe03a580000)
    libNNArchPerf.so => /usr/lib/x86_64-linux-gnu/libNNArchPerf.so (0x00007fe03a314000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe03a110000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe039f08000)
    libEmulator.so => /usr/lib/x86_64-linux-gnu/libEmulator.so (0x00007fe039aa9000)
    libvdtproxy.so => /usr/lib/x86_64-linux-gnu/libvdtproxy.so (0x00007fe0398a6000)

Khadas(TIM-VX 1.1.32):

~/TIM-VX-1.1.32/src/tim$ ldd unit_test 
    linux-vdso.so.1 (0x0000007f7a974000)
    libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f7a204000)
    libOpenVX.so => /lib/libOpenVX.so (0x0000007f79fe3000)
    libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f79dfe000)
    libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f79d51000)
    libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f79d2d000)
    libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f79bba000)
    /lib/ld-linux-aarch64.so.1 (0x0000007f7a944000)
    libVSC.so => /lib/libVSC.so (0x0000007f78c01000)
    libGAL.so => /lib/libGAL.so (0x0000007f789ff000)
    libArchModelSw.so => /lib/libArchModelSw.so (0x0000007f789b0000)
    libNNArchPerf.so => /lib/libNNArchPerf.so (0x0000007f78943000)
    libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f7892f000)
    librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f78917000)

I cannot find unit_test in TIM-VX v1.1.34.fix(Khadas).

unit_test is not enabled by default, it must be built with 'cmake -DTIM_VX_ENABLE_TEST=ON ..'

From your LDD result, it seems that you copied the SDK libraries to system library folders, this is not advised because they are not part of the system library.

You should remove them from system library path /usr/lib/x86_64-linux-gnu and use LD_LIBRARY_PATH instead. something like this.

export LD_LIBRARY_PATH=`pwd`/../../../prebuilt-sdk/x86_64_linux/lib:$LD_LIBRARY_PATH

unit_test is not enabled by default, it must be built with 'cmake -DTIM_VX_ENABLE_TEST=ON ..'

If I want to cross compile for A311D, should I build with "cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON .." ?

From your LDD result, it seems that you copied the SDK libraries to system library folders, this is not advised because they are not part of the system library.

You should remove them from system library path /usr/lib/x86_64-linux-gnu and use LD_LIBRARY_PATH instead. something like this.
export LD_LIBRARY_PATH=`pwd`/../../../prebuilt-sdk/x86_64_linux/lib:$LD_LIBRARY_PATH

At Khadas side, May I copy from TIM-VX/build/install/lib/* to /usr/lib?

You should remove them from system library path /usr/lib/x86_64-linux-gnu and use LD_LIBRARY_PATH instead. something like this.

If I want to cross compile for A311D, should I build with "cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON .." ?

I tried it. If I set both -DCONFIG=A311D and -DTIM_VX_ENABLE_TEST=ON, there is no unit_test in the src/tim/ Files in src/tim

~/TIM-VX-1.1.34.fix/build2/src/tim$ ls
CMakeFiles           libtim-vx.so        Makefile  vx
cmake_install.cmake  libtim-vx-static.a  utils

At Khadas side, May I copy from TIM-VX/build/install/lib/* to /usr/lib?

It is recommended to copy the entire aarch64_A311D_6.4.8/ folder onto board somewhere and set LD_LIBRARY_PATH to point to it. Something like this,

export LD_LIBREARY_PATH=path_to_aarch64_A311D_6.4.8:$LD_LIBRARY_PATH

Also, inside aarch64_A311D_6.4.8/ folder there is a corresponding galcore.ko, and you should use that.

If I want to cross compile for A311D, should I build with "cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON .." ?

I tried it. If I set both -DCONFIG=A311D and -DTIM_VX_ENABLE_TEST=ON, there is no unit_test in the src/tim/ Files in src/tim
~/TIM-VX-1.1.34.fix/build2/src/tim$ ls
CMakeFiles           libtim-vx.so        Makefile  vx
cmake_install.cmake  libtim-vx-static.a  utils

@sunshinemyson Any idea?

This is an issue from CMake. Because we hard-reset compiler configuration in A311D.cmake, the cmake will reconfig the project and the TIM_VX_ENABLE_TEST will be reseted.

To fix this issue, you need comment out following config in the A311D.cmake, and create a toolchain config locally. set(TOOLCHAIN_DIR ${PROJECT_BINARY_DIR}/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu) set(CMAKE_C_COMPILER ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc) set(CMAKE_CXX_COMPILER ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-g++) set(CMAKE_AR ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc-ar) set(CMAKE_AS ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc-as) set(CMAKE_LD ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc-ld)

Here is my config for your reference toolchain-vim3.cmake.txt

@sunshinemyson

I followed above steps. Commented out in cmake/A311D.cmake and put toolchain-vim3.cmake.txt into cmake folder. But have other problem. The file format of libCLC.so is aarch64. How should I do to enable the toolchain-vim3.cmake?

[ 99%] Linking CXX shared library libtim-vx.so
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
src/tim/CMakeFiles/tim-vx.dir/build.make:1080: recipe for target 'src/tim/libtim-vx.so' failed
make[2]: *** [src/tim/libtim-vx.so] Error 1
CMakeFiles/Makefile2:221: recipe for target 'src/tim/CMakeFiles/tim-vx.dir/all' failed
make[1]: *** [src/tim/CMakeFiles/tim-vx.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 99%] Linking CXX executable benchmark_test
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
samples/benchmark_test/CMakeFiles/benchmark_test.dir/build.make:112: recipe for target 'samples/benchmark_test/benchmark_test' failed
make[2]: *** [samples/benchmark_test/benchmark_test] Error 1
CMakeFiles/Makefile2:300: recipe for target 'samples/benchmark_test/CMakeFiles/benchmark_test.dir/all' failed
make[1]: *** [samples/benchmark_test/CMakeFiles/benchmark_test.dir/all] Error 2
[ 99%] Linking CXX executable lenet
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
samples/lenet/CMakeFiles/lenet.dir/build.make:112: recipe for target 'samples/lenet/lenet' failed
make[2]: *** [samples/lenet/lenet] Error 1
CMakeFiles/Makefile2:327: recipe for target 'samples/lenet/CMakeFiles/lenet.dir/all' failed
make[1]: *** [samples/lenet/CMakeFiles/lenet.dir/all] Error 2
[100%] Linking CXX executable multi_thread_test
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/build.make:112: recipe for target 'samples/multi_thread_test/multi_thread_test' failed
make[2]: *** [samples/multi_thread_test/multi_thread_test] Error 1
CMakeFiles/Makefile2:354: recipe for target 'samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/all' failed
make[1]: *** [samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2

it looks like you link target so with host build. did you set toolchain by -DCMAKE_TOOLCHAIN_FILE ?

@sunshinemyson

I used, and got error.

cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON -DCMAKE_TOOLCHAIN_FILE=TIM-VX-1.1.34.fix/cmake/toolchain-vim3.cmake  ..
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:2 (project):
  The CMAKE_C_COMPILER:

    /opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
  the compiler, or to the compiler name if it is in the PATH.

CMake Error at CMakeLists.txt:2 (project):
  The CMAKE_CXX_COMPILER:

    /opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.

-- Configuring incomplete, errors occurred!

I want to set CROSS_COMPILE_ENV to ${PROJECT_BINARY_DIR}/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu.

#Original
set(CROSS_COMPILE_ENV "/opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu")
#Modified
set(CROSS_COMPILE_ENV "${PROJECT_BINARY_DIR}/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu")

Then, got another error.

cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON -DCMAKE_TOOLCHAIN_FILE=/media/data/home/leokuo/TIM-VX-1.1.34.fix/cmake/toolchain-vim3.cmake  ..
-- The C compiler identification is GNU 7.3.1
-- The CXX compiler identification is GNU 7.3.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc
-- Check for working C compiler: /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc - broken
CMake Error at /media/data/shared/cmake-3.20.0-rc2-linux-x86_64/share/cmake-3.20/Modules/CMakeTestCCompiler.cmake:66 (message):
  The C compiler

    "/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/make -f Makefile cmTC_7fde6/fast && /usr/bin/make  -f CMakeFiles/cmTC_7fde6.dir/build.make CMakeFiles/cmTC_7fde6.dir/build
    make[1]: Entering directory '/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp'
    Building C object CMakeFiles/cmTC_7fde6.dir/testCCompiler.c.o
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc --sysroot=/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc   -mtune=cortex-a53 -o CMakeFiles/cmTC_7fde6.dir/testCCompiler.c.o -c /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp/testCCompiler.c
    Linking C executable cmTC_7fde6
    /media/data/shared/cmake-3.20.0-rc2-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/cmTC_7fde6.dir/link.txt --verbose=1
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc --sysroot=/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc CMakeFiles/cmTC_7fde6.dir/testCCompiler.c.o -o cmTC_7fde6 
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find crt1.o: No such file or directory
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find crti.o: No such file or directory
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find -lc
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find crtn.o: No such file or directory
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_7fde6.dir/build.make:98: recipe for target 'cmTC_7fde6' failed
    make[1]: *** [cmTC_7fde6] Error 1
    make[1]: Leaving directory '/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp'
    Makefile:127: recipe for target 'cmTC_7fde6/fast' failed
    make: *** [cmTC_7fde6/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:2 (project)

-- Configuring incomplete, errors occurred!
See also "/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeOutput.log".
See also "/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeError.log".

Should I need to change any environment about PATH or CMAKE?

You should download toolchain from https://cnbj1.fds.api.xiaomi.com/mace/third-party/gcc-linaro/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu.tar.xz, and change the toolchain configuration with your local install directory.

This is my local host directory, you need change it. /opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++

@sunshinemyson The following are my steps. But still no unit test program in src/tim. NO ERROR in these commands.

cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON -DCMAKE_TOOLCHAIN_FILE=/media/data/home/leokuo/TIM-VX-1.1.34.fix/cmake/toolchain-vim3.cmake  ..
make -j32
make install
ls -al src/tim/
total 31868
drwxr-xr-x 5 leokuo leokuo     4096 Oct 14 13:48 .
drwxr-xr-x 3 leokuo leokuo     4096 Oct 14 11:51 ..
drwxr-xr-x 4 leokuo leokuo     4096 Oct 14 11:51 CMakeFiles
-rw-r--r-- 1 leokuo leokuo     5960 Oct 14 11:51 cmake_install.cmake
-rwxr-xr-x 1 leokuo leokuo 11956368 Oct 14 13:48 libtim-vx.so
-rw-r--r-- 1 leokuo leokuo 20526940 Oct 14 13:48 libtim-vx-static.a
-rw-r--r-- 1 leokuo leokuo   112260 Oct 14 11:51 Makefile
drwxr-xr-x 3 leokuo leokuo     4096 Oct 14 11:51 utils
drwxr-xr-x 3 leokuo leokuo     4096 Oct 14 11:51 vx

@sunshinemyson I added "set(TIM_VX_ENABLE_TEST ON)" to CMakeList.txt:(Between if("${CONFIG}" STREQUAL "A311D") and include(cmake/A311D.cmake)) So, I got unit_test at VIM3. The following is the output message with errors.

khadas@Khadas:~/TIM-VX-1.1.34.fix/build/src/tim$ ./unit_test 
Running main() from /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 122 tests from 39 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Context
[ RUN      ] Context.create
[       OK ] Context.create (23 ms)
[----------] 1 test from Context (23 ms total)

[----------] 2 tests from graph
[ RUN      ] graph.gen_binary_graph_with_empty_graph
E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
[       OK ] graph.gen_binary_graph_with_empty_graph (3 ms)
[ RUN      ] graph.gen_binary_graph_with_simple_add
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:61: Failure
Value of: graph->CompileToBinary(nbg_buf.data(), &bin_size)
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:72: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
E [compute_node:379]Create node[0] NBG fail
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:86: Failure
Value of: nbg_graph->Compile()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:87: Failure
Value of: nbg_graph->Run()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:91: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
[  FAILED  ] graph.gen_binary_graph_with_simple_add (8 ms)
[----------] 2 tests from graph (11 ms total)

[----------] 2 tests from Linear
[ RUN      ] Linear.shape_5_1_fp32
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:55: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:56: Failure
Value of: graph->Run()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:59: Failure
Expected equality of these values:
  golden
    Which is: { -0.5, 1.9, 2, 2.55, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
[  FAILED  ] Linear.shape_5_1_fp32 (7 ms)
[ RUN      ] Linear.shape_5_1_fp32_omit_b
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:86: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:87: Failure
Value of: graph->Run()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:90: Failure
Expected equality of these values:
  golden
    Which is: { -5, -0.2, 0, 1.1, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
[  FAILED  ] Linear.shape_5_1_fp32_omit_b (7 ms)
[----------] 2 tests from Linear (14 ms total)

[----------] 2 tests from Gelu
[ RUN      ] Gelu.shape_5_1_fp32_approximate
W [_setup:243]Call vxTensorTableLookupLayer fail.

Segmentation fault

@leo,

Please try to set VIV_VX_DEBUG_LEVEL=1 and share the log again. It's interesting because I can get a full pass on my side. And your graph cannot be compiled successfully all the time.

@sunshinemyson Did you get full pass on VIM3 Pro?Or on X86 simulator? The following is the output with VIV_VX_DEBUG_LEVEL=1:

khadas@Khadas:~/TIM-VX-1.1.34.fix/build/src/tim$ ./unit_test 
Running main() from /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 122 tests from 39 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Context
[ RUN      ] Context.create
#productname=VIPNano-QI, pid=0x88
#productname=VIPNano-QI, pid=0x88
Created VX Thread: 0x79fa81b0
Created VX Thread: 0x7ad621b0
Exit VX Thread: 0x79fa81b0
#productname=VIPNano-QI, pid=0x88
Created VX Thread: 0x79fa81b0
Exit VX Thread: 0x79fa81b0
Exit VX Thread: 0x7ad621b0
[       OK ] Context.create (30 ms)
[----------] 1 test from Context (30 ms total)

[----------] 2 tests from graph
[ RUN      ] graph.gen_binary_graph_with_empty_graph
#productname=VIPNano-QI, pid=0x88
Created VX Thread: 0x7ad621b0
E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
Exit VX Thread: 0x7ad621b0
[       OK ] graph.gen_binary_graph_with_empty_graph (5 ms)
[ RUN      ] graph.gen_binary_graph_with_simple_add
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cb77740
prev_ptrs = 0x3cbb07c0
prev_ptrs = 0x3cbb0fc0
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   1    1    1 1,        4, 0x0x3cb77b60(0x0x3cb77b60, 0x(nil)) ->    1    1    1 1,        4, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x380, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x0, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x380, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x48, patchSize: 0x364, lcdSize 0x480
NBG: entranceSize: 0x1f0, nbIOSize: 0x15c, layeSize: 0x4c, sectionsSize: 0x450, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x480, headerSize : 0x7e8
Calculate NBG size : 4776 bytes
generate NBG into memory start.
vxoBinaryGraph_SaveBinaryEntrance[14907]: collect input count=0, output count=0
vxoBinaryGraph_SaveBinaryEntrance[14982]: total operation count=1
generate NBG, device count=1, core count per-device: 1, 
 input table address: 0x44fd9740 0x44fd67c0 
 output table address: 0x44fd3fc0 
vxoBinaryGraph_SaveBinaryEntranceExt[14131]: graph input/output=2/1, refine input count=2, output count=1
NBG network name field : dummy_network_name
vxoBinaryGraph_SaveBinaryEntranceExt[14697]: header input count=2, output count=1
generate NGB, save initialize commands
generate NBG, map VIP-SRAM start address=0x400000
generate NBG, patch AXI-SRAM startAddress=0xff000000, endAddress=0xff100000
vxoBinaryGraph_SaveInitialOperation[10003]:fail to search AXI-SRAM address in init command buffer
Dump HEX data size 0x20
0801028A 00000011 08010E13 00000002 08010E21 00220000 3CF03630 00000000
vxoBinaryGraph_SaveBinaryEntrance[15553]: failed to save initial operation
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:61: Failure
Value of: graph->CompileToBinary(nbg_buf.data(), &bin_size)
  Actual: false
Expected: true
prev_ptrs = 0x3cb77740
prev_ptrs = 0x3cbb07c0
prev_ptrs = 0x3cbb0fc0
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:72: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
prev_ptrs = 0x3cebda00
prev_ptrs = 0x3cebe2c0
prev_ptrs = 0x3cebea80
prev_ptrs = 0x3cebda00
prev_ptrs = 0x3cebe2c0
binary graph format version, 0x1000c
readBinDynamic[1861]: lcd size if 0, error
fail in read Binary Dynamic
fail to load binary from pointer to create graph
NBG error, please provide genereating NBG logs first
fail to import kernel from VPMN
                               , error code: -1
E [compute_node:379]Create node[0] NBG fail
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:86: Failure
Value of: nbg_graph->Compile()
  Actual: false
Expected: true
vxProcessGraph[15913]: Process Graph fail!
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:87: Failure
Value of: nbg_graph->Run()
  Actual: false
Expected: true
prev_ptrs = 0x3cebea80
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:91: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
prev_ptrs = 0x3cebda00
prev_ptrs = 0x3cebe2c0
prev_ptrs = 0x3cebea80
prev_ptrs = 0x3cb77740
prev_ptrs = 0x3cbb07c0
prev_ptrs = 0x3cbb0fc0
Exit VX Thread: 0x7ad621b0
[  FAILED  ] graph.gen_binary_graph_with_simple_add (9 ms)
[----------] 2 tests from graph (14 ms total)

[----------] 2 tests from Linear
[ RUN      ] Linear.shape_5_1_fp32
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
Save binary graph for VIPLite. 
network binary graph file has been opened
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_SaveBinaryEntrance[14907]: collect input count=1, output count=1
vxoBinaryGraph_SaveBinaryEntrance[14982]: total operation count=1
generate NBG, device count=1, core count per-device: 1, 
 input table address: 0x44fc7cc0 
 output table address: 0x44fc4ec0 
vxoBinaryGraph_SaveBinaryEntranceExt[14131]: graph input/output=1/1, refine input count=1, output count=1
NBG network name field : dummy_network_name
vxoBinaryGraph_SaveBinaryEntranceExt[14697]: header input count=1, output count=1
generate NGB, save initialize commands
generate NBG, map VIP-SRAM start address=0x400000
generate NBG, patch AXI-SRAM startAddress=0xff000000, endAddress=0xff100000
vxoBinaryGraph_SaveInitialOperation[10003]:fail to search AXI-SRAM address in init command buffer
Dump HEX data size 0x20
0801028A 00000011 08010E13 00000002 08010E21 00220000 3CECD780 00000000
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxoBinaryGraph_SaveBinaryEntrance[15553]: failed to save initial operation
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:55: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_CollectInputAndOutput[13820]: input node param count is bigger than 1018224656 > 5
vxoBinaryGraph_SaveBinaryEntrance[14903]: failed to collect input and output of network
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxProcessGraph[15913]: Process Graph fail!
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:56: Failure
Value of: graph->Run()
  Actual: false
Expected: true
prev_ptrs = 0x3cbb2ec0
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:59: Failure
Expected equality of these values:
  golden
    Which is: { -0.5, 1.9, 2, 2.55, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
Exit VX Thread: 0x7ad621b0
[  FAILED  ] Linear.shape_5_1_fp32 (7 ms)
[ RUN      ] Linear.shape_5_1_fp32_omit_b
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
Save binary graph for VIPLite. 
network binary graph file has been opened
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_SaveBinaryEntrance[14907]: collect input count=1, output count=1
vxoBinaryGraph_SaveBinaryEntrance[14982]: total operation count=1
generate NBG, device count=1, core count per-device: 1, 
 input table address: 0x44fc1ec0 
 output table address: 0x44fbecc0 
vxoBinaryGraph_SaveBinaryEntranceExt[14131]: graph input/output=1/1, refine input count=1, output count=1
NBG network name field : dummy_network_name
vxoBinaryGraph_SaveBinaryEntranceExt[14697]: header input count=1, output count=1
generate NGB, save initialize commands
generate NBG, map VIP-SRAM start address=0x400000
generate NBG, patch AXI-SRAM startAddress=0xff000000, endAddress=0xff100000
vxoBinaryGraph_SaveInitialOperation[10003]:fail to search AXI-SRAM address in init command buffer
Dump HEX data size 0x20
0801028A 00000011 08010E13 00000002 08010E21 00220000 3CBAAFB0 00000000
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxoBinaryGraph_SaveBinaryEntrance[15553]: failed to save initial operation
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:86: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_CollectInputAndOutput[13820]: input node param count is bigger than 1018224656 > 5
vxoBinaryGraph_SaveBinaryEntrance[14903]: failed to collect input and output of network
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxProcessGraph[15913]: Process Graph fail!
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:87: Failure
Value of: graph->Run()
  Actual: false
Expected: true
prev_ptrs = 0x3cebfcc0
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:90: Failure
Expected equality of these values:
  golden
    Which is: { -5, -0.2, 0, 1.1, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
Exit VX Thread: 0x7ad621b0
[  FAILED  ] Linear.shape_5_1_fp32_omit_b (9 ms)
[----------] 2 tests from Linear (16 ms total)

[----------] 2 tests from Gelu
[ RUN      ] Gelu.shape_5_1_fp32_approximate
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
CopyArrayRange from ptr 0x3cf4f7f0 to 0x7fe7fe6a50 from 0 to 1024
CopyArrayRange from ptr 0x3cf2f460 to 0x7fe7fe5a50 from 0 to 1024
hardware doesn't support
W [_setup:243]Call vxTensorTableLookupLayer fail.
Kernel "com.vivantecorp.extension.cl.hard_gelu_F32toF32_2D" does not exist
Segmentation fault

Did you set other env variable such as VIV_VX_ENABLE_SAVE_NETWORK_BINARY?

@sunshinemyson No. I just added the following env variable. Should I add VIV_VX_ENABLE_SAVE_NETWORK_BINARY to .bashrc?

export PYTHONPATH=/home/khadas/VeriSilicon-tvm/python:$PYTHONPATH
export LD_LIBRARY_PATH=/home/khadas/TIM-VX-1.1.34.fix/build/install:/home/khadas/VeriSilicon-tvm/build:$LD_LIBRARY_PATH
export VIVANTE_SDK_DIR=/home/khadas/TIM-VX-1.1.34.fix/build/aarch64_A311D_6.4.8

@sunshinemyson Thanks. Now, I can pass the unit_test. According to TVM VSI Readme, in [start runtime on the target as a service] section(https://github.com/VeriSilicon/tvm/blob/vsi_npu/README.VSI.md#start-runtime-on-the-target-as-a-service). What path of <path/to/versilicon/driver/sdk> should I set?

sdk is the root dir of our driver. should have following structure: sdk/include /drivers(or lib)/libOpenVX.so Which could download from the release.

@sunshinemyson I inferenced model using TVM. And get empty output. The following is dmesg output.

[  135.267564] npu_version: 2
[  135.268371] galcore irq number is 36.
[  135.268382] Galcore version 6.4.6.2
[  627.735912] [galcore]: GPU[0] hang, automatic recovery.
[  627.748042] ====>>>>npu hardware reset end!
[  627.748196] [galcore]: recovery done
[  689.175159] [galcore]: GPU[0] hang, automatic recovery.
[  689.187353] ====>>>>npu hardware reset end!
[  689.187525] [galcore]: recovery done
[  750.615034] [galcore]: GPU[0] hang, automatic recovery.
[  750.627147] ====>>>>npu hardware reset end!
[  750.627311] [galcore]: recovery done
[  812.054627] [galcore]: GPU[0] hang, automatic recovery.
[  812.067103] ====>>>>npu hardware reset end!
[  812.067282] [galcore]: recovery done
[  873.493411] [galcore]: GPU[0] hang, automatic recovery.
[  873.514460] ====>>>>npu hardware reset end!
[  873.517882] [galcore]: recovery done
[  934.932529] [galcore]: GPU[0] hang, automatic recovery.
[  934.944641] ====>>>>npu hardware reset end!
[  934.944811] [galcore]: recovery done

Then, I cannot rmmod galcore until rebooting.

@leokuo725 Sorry that we can not give you suggestion about the issue in time. Please let me know if it still an issue?

I also ran into this bug.

I compiled Tengine-Lite (commit https://github.com/OAID/Tengine/commit/1aea916c9cb566393c243c4cc6b4e64fb563dd93) with TIM-VX (commit 68b5acb) using gcc-linaro-6.3.1-2017.05-i686-mingw32_aarch64-linux-gnu.
the board is Ubuntu 18.04.6 LTS Linux 4.9.241 on A311d
aml-npu is 6.4.3CB-3
the user guide I read is https://github.com/OAID/Tengine/blob/tengine-lite/doc/docs_zh/source_compile/compile_timvx.md
libraries like libOpenVX.so are put in ./3rdparty/tim-vx/lib/aarch64 and downloaded from https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.28/aarch64_A311D_D312513_A294074_R311680_T312233_O312045.tgz
when I ran VIV_VX_DEBUG_LEVEL=1 ./tm_benchmark (with context->device = "TIMVX"), it reported:

Tengine-lite library version: 1.5-dev
Created VX Thread: 0x8fbe0150
#productname=VIPNano-QI, pid=0x88
E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
Tengine Fatal: Pre-run subgraph(0) on TIMVX failed.

So what can I do then? Tengine says a "kernel version" of galcore should be >=6.4.4, but I don't know how to read its version.

@gdh1995 You can get the galcore version from dmesg.

@sunshinemyson I cannot run other model on this board, only support mobilenet v2 uint8.

The old galcore is 6.4.3.p0.286725; I've updated it into 6.4.6.2 using rmmod and insmod, but tm_benchmark still reports -1:A generic error code:

E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)

I've tried SDK from https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.37/aarch64_A311D_6.4.9.tgz and https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.37/aarch64_S905D3_6.4.9.tgz . No error messages changed.

I also tried v6.4.8 (https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.34.fix/aarch64_A311D_6.4.8.tgz). The error message is:

E [query_hardware_caps:50]CHECK STATUS(-10:The supplied parameter information does not match the kernel contract.)
E [Init:194]Create tensor fail!

Sorry it's a mistake of mine. I ran tengine's benchmark tool with a yolov3_int8 model, but I didn't realize TIM-VX requires uint8. Now yolov3-tiny_uint8.tmfile works well (downloaded from https://github.com/OAID/Tengine/blob/tengine-lite/README_EN.md#model-zoo).

@sunshinemyson I saw the same error "PLS isn't existed" on my VIM3:

python3 tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py

X86 Host

``` INFO:root:{'name': 'mobilenet_v1_1.0_224_quant.tflite', 'shape': (1, 224, 224, 3), 'input_tensor_name': 'input', 'dtype': 'uint8'} /home/addsalt/data/work/VIM3/tvm_npu/tests/python/contrib/test_vsi_npu/model/mobilenet_v1_1.0_224_quant.tflite INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using softmax.x86 for nn.softmax based on highest priority (10) INFO:compile_engine:Using injective.cpu for divide based on highest priority (10) INFO:compile_engine:Using injective.cpu for round based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for reshape based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for multiply based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using pool.cpu for nn.avg_pool2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm. INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10) INFO:compile_engine:Using injective.cpu for add based on highest priority (10) INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10) INFO:compile_engine:Using injective.cpu for clip based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for cast based on highest priority (10) INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10) INFO:root:#[version = "0.0.5"] def @main(%input: Tensor[(1, 224, 224, 3), uint8], %v_param_1: Tensor[(3, 3, 3, 32), uint8], %v_param_2: Tensor[(32), int32], %v_param_3: Tensor[(3, 3, 32, 1), uint8], %v_param_4: Tensor[(32), int32], %v_param_5: Tensor[(1, 1, 32, 64), uint8], %v_param_6: Tensor[(64), int32], %v_param_7: Tensor[(3, 3, 64, 1), uint8], %v_param_8: Tensor[(64), int32], %v_param_9: Tensor[(1, 1, 64, 128), uint8], %v_param_10: Tensor[(128), int32], %v_param_11: Tensor[(3, 3, 128, 1), uint8], %v_param_12: Tensor[(128), int32], %v_param_13: Tensor[(1, 1, 128, 128), uint8], %v_param_14: Tensor[(128), int32], %v_param_15: Tensor[(3, 3, 128, 1), uint8], %v_param_16: Tensor[(128), int32], %v_param_17: Tensor[(1, 1, 128, 256), uint8], %v_param_18: Tensor[(256), int32], %v_param_19: Tensor[(3, 3, 256, 1), uint8], %v_param_20: Tensor[(256), int32], %v_param_21: Tensor[(1, 1, 256, 256), uint8], %v_param_22: Tensor[(256), int32], %v_param_23: Tensor[(3, 3, 256, 1), uint8], %v_param_24: Tensor[(256), int32], %v_param_25: Tensor[(1, 1, 256, 512), uint8], %v_param_26: Tensor[(512), int32], %v_param_27: Tensor[(3, 3, 512, 1), uint8], %v_param_28: Tensor[(512), int32], %v_param_29: Tensor[(1, 1, 512, 512), uint8], %v_param_30: Tensor[(512), int32], %v_param_31: Tensor[(3, 3, 512, 1), uint8], %v_param_32: Tensor[(512), int32], %v_param_33: Tensor[(1, 1, 512, 512), uint8], %v_param_34: Tensor[(512), int32], %v_param_35: Tensor[(3, 3, 512, 1), uint8], %v_param_36: Tensor[(512), int32], %v_param_37: Tensor[(1, 1, 512, 512), uint8], %v_param_38: Tensor[(512), int32], %v_param_39: Tensor[(3, 3, 512, 1), uint8], %v_param_40: Tensor[(512), int32], %v_param_41: Tensor[(1, 1, 512, 512), uint8], %v_param_42: Tensor[(512), int32], %v_param_43: Tensor[(3, 3, 512, 1), uint8], %v_param_44: Tensor[(512), int32], %v_param_45: Tensor[(1, 1, 512, 512), uint8], %v_param_46: Tensor[(512), int32], %v_param_47: Tensor[(3, 3, 512, 1), uint8], %v_param_48: Tensor[(512), int32], %v_param_49: Tensor[(1, 1, 512, 1024), uint8], %v_param_50: Tensor[(1024), int32], %v_param_51: Tensor[(3, 3, 1024, 1), uint8], %v_param_52: Tensor[(1024), int32], %v_param_53: Tensor[(1, 1, 1024, 1024), uint8], %v_param_54: Tensor[(1024), int32], %v_param_55: Tensor[(1, 1, 1024, 1001), uint8], %v_param_56: Tensor[(1001), int32]) { %0 = qnn.conv2d(%input, %v_param_1, 128, 151, 0.0078125f, 0.0218267f, strides=[2, 2], padding=[0, 0, 1, 1], channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %1 = nn.bias_add(%0, %v_param_2, axis=3); %2 = qnn.requantize(%1, 0.000170521f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %3 = qnn.conv2d(%2, %v_param_3, 0, 110, 0.0235285f, 0.292199f, padding=[1, 1, 1, 1], groups=32, channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %4 = nn.bias_add(%3, %v_param_4, axis=3); %5 = qnn.requantize(%4, 0.006875f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %6 = qnn.conv2d(%5, %v_param_5, 0, 121, 0.0235285f, 0.0304209f, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %7 = nn.bias_add(%6, %v_param_6, axis=3); %8 = qnn.requantize(%7, 0.000715759f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %9 = qnn.conv2d(%8, %v_param_7, 0, 130, 0.0235285f, 0.402773f, strides=[2, 2], padding=[0, 0, 1, 1], groups=64, channels=64, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %10 = nn.bias_add(%9, %v_param_8, axis=3); %11 = qnn.requantize(%10, 0.00947663f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %12 = qnn.conv2d(%11, %v_param_9, 0, 104, 0.0235285f, 0.0151482f, padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %13 = nn.bias_add(%12, %v_param_10, axis=3); %14 = qnn.requantize(%13, 0.000356414f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %15 = qnn.conv2d(%14, %v_param_11, 0, 160, 0.0235285f, 0.0605373f, padding=[1, 1, 1, 1], groups=128, channels=128, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %16 = nn.bias_add(%15, %v_param_12, axis=3); %17 = qnn.requantize(%16, 0.00142435f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %18 = qnn.conv2d(%17, %v_param_13, 0, 94, 0.0235285f, 0.0137555f, padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %19 = nn.bias_add(%18, %v_param_14, axis=3); %20 = qnn.requantize(%19, 0.000323645f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %21 = qnn.conv2d(%20, %v_param_15, 0, 123, 0.0235285f, 0.0167581f, strides=[2, 2], padding=[0, 0, 1, 1], groups=128, channels=128, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %22 = nn.bias_add(%21, %v_param_16, axis=3); %23 = qnn.requantize(%22, 0.000394292f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %24 = qnn.conv2d(%23, %v_param_17, 0, 151, 0.0235285f, 0.00760185f, padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %25 = nn.bias_add(%24, %v_param_18, axis=3); %26 = qnn.requantize(%25, 0.00017886f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %27 = qnn.conv2d(%26, %v_param_19, 0, 129, 0.0235285f, 0.0410553f, padding=[1, 1, 1, 1], groups=256, channels=256, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %28 = nn.bias_add(%27, %v_param_20, axis=3); %29 = qnn.requantize(%28, 0.000965968f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %30 = qnn.conv2d(%29, %v_param_21, 0, 122, 0.0235285f, 0.00643161f, padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %31 = nn.bias_add(%30, %v_param_22, axis=3); %32 = qnn.requantize(%31, 0.000151326f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %33 = qnn.conv2d(%32, %v_param_23, 0, 122, 0.0235285f, 0.0134608f, strides=[2, 2], padding=[0, 0, 1, 1], groups=256, channels=256, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %34 = nn.bias_add(%33, %v_param_24, axis=3); %35 = qnn.requantize(%34, 0.000316712f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %36 = qnn.conv2d(%35, %v_param_25, 0, 109, 0.0235285f, 0.00917122f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %37 = nn.bias_add(%36, %v_param_26, axis=3); %38 = qnn.requantize(%37, 0.000215785f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %39 = qnn.conv2d(%38, %v_param_27, 0, 132, 0.0235285f, 0.0369348f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %40 = nn.bias_add(%39, %v_param_28, axis=3); %41 = qnn.requantize(%40, 0.000869019f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %42 = qnn.conv2d(%41, %v_param_29, 0, 140, 0.0235285f, 0.00530005f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %43 = nn.bias_add(%42, %v_param_30, axis=3); %44 = qnn.requantize(%43, 0.000124702f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %45 = qnn.conv2d(%44, %v_param_31, 0, 94, 0.0235285f, 0.0426099f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %46 = nn.bias_add(%45, %v_param_32, axis=3); %47 = qnn.requantize(%46, 0.00100255f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %48 = qnn.conv2d(%47, %v_param_33, 0, 127, 0.0235285f, 0.00496329f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %49 = nn.bias_add(%48, %v_param_34, axis=3); %50 = qnn.requantize(%49, 0.000116779f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %51 = qnn.conv2d(%50, %v_param_35, 0, 127, 0.0235285f, 0.0283589f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %52 = nn.bias_add(%51, %v_param_36, axis=3); %53 = qnn.requantize(%52, 0.000667241f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %54 = qnn.conv2d(%53, %v_param_37, 0, 89, 0.0235285f, 0.0077709f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %55 = nn.bias_add(%54, %v_param_38, axis=3); %56 = qnn.requantize(%55, 0.000182837f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %57 = qnn.conv2d(%56, %v_param_39, 0, 134, 0.0235285f, 0.0243294f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %58 = nn.bias_add(%57, %v_param_40, axis=3); %59 = qnn.requantize(%58, 0.000572435f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %60 = qnn.conv2d(%59, %v_param_41, 0, 99, 0.0235285f, 0.00965865f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %61 = nn.bias_add(%60, %v_param_42, axis=3); %62 = qnn.requantize(%61, 0.000227253f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %63 = qnn.conv2d(%62, %v_param_43, 0, 106, 0.0235285f, 0.0193668f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %64 = nn.bias_add(%63, %v_param_44, axis=3); %65 = qnn.requantize(%64, 0.000455672f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %66 = qnn.conv2d(%65, %v_param_45, 0, 153, 0.0235285f, 0.00544699f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %67 = nn.bias_add(%66, %v_param_46, axis=3); %68 = qnn.requantize(%67, 0.000128159f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %69 = qnn.conv2d(%68, %v_param_47, 0, 126, 0.0235285f, 0.00783559f, strides=[2, 2], padding=[0, 0, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %70 = nn.bias_add(%69, %v_param_48, axis=3); %71 = qnn.requantize(%70, 0.00018436f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %72 = qnn.conv2d(%71, %v_param_49, 0, 130, 0.0235285f, 0.00817923f, padding=[0, 0, 0, 0], channels=1024, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %73 = nn.bias_add(%72, %v_param_50, axis=3); %74 = qnn.requantize(%73, 0.000192445f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %75 = qnn.conv2d(%74, %v_param_51, 0, 211, 0.0235285f, 0.126169f, padding=[1, 1, 1, 1], groups=1024, channels=1024, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32"); %76 = nn.bias_add(%75, %v_param_52, axis=3); %77 = qnn.requantize(%76, 0.00296857f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %78 = qnn.conv2d(%77, %v_param_53, 0, 95, 0.0235285f, 0.0180482f, padding=[0, 0, 0, 0], channels=1024, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %79 = nn.bias_add(%78, %v_param_54, axis=3); %80 = qnn.requantize(%79, 0.000424646f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8"); %81 = cast(%80, dtype="int32"); %82 = nn.avg_pool2d(%81, pool_size=[7, 7], strides=[2, 2], padding=[0, 0, 0, 0], layout="NHWC"); %83 = cast(%82, dtype="uint8"); %84 = qnn.conv2d(%83, %v_param_55, 0, 74, 0.0235285f, 0.0049866f, padding=[0, 0, 0, 0], channels=1001, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32"); %85 = nn.bias_add(%84, %v_param_56, axis=3); %86 = qnn.requantize(%85, 0.000117327f, 0, 0.166099f, 66, axis=3, out_dtype="uint8"); %87 = reshape(%86, newshape=[1, 1001]); %88 = qnn.dequantize(%87, 0.166099f, 66); %89 = nn.softmax(%88, axis=1); qnn.quantize(%89, 0.00390625f, 0, out_dtype="uint8") } [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:414: name_node.value() == tvmgen_default_vsi_npu_main_0 [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:287: Create [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_softmax [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:237: TensorMakerImpl::InferCall: reshape [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_avgpool2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_softmax [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:396: GraphMakerImpl::InferCall: reshape [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_avgpool2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d [21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d W [HandleLayoutInfer:268]Op 162: default layout inference pass. [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctionget_symbol [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctionget_const_vars [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctionget_const_vars [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:186: SaveToBinary: nbg size = 5676288: input size = 1: output size = 1: output map size =1 [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:116: SerializeTensorSpec [21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:116: SerializeTensorSpec INFO:root:top5 of ref output 0: INFO:root:283 : 110 INFO:root:282 : 57 INFO:root:286 : 29 INFO:root:464 : 21 INFO:root:264 : 8 INFO:root:top5 of vsi output 0: INFO:root:1000 : 0 INFO:root:335 : 0 INFO:root:333 : 0 INFO:root:331 : 0 INFO:root:334 : 0 Traceback (most recent call last): File "tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py", line 320, in test_mobilenet_v1_224_quant() File "tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py", line 272, in test_mobilenet_v1_224_quant process(model) File "tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py", line 267, in process assert_allclose(vsi_output[i], ref_output[i], rtol=0, atol=tolerance) File "/home/addsalt/data/work/VIM3/tvm_npu/python/tvm/testing/utils.py", line 98, in assert_allclose np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol, verbose=True) File "/home/addsalt/anaconda3/envs/tvm-build/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose assert_array_compare(compare, actual, desired, err_msg=str(err_msg), File "/home/addsalt/anaconda3/envs/tvm-build/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 840, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=0, atol=5 Mismatched elements: 7 / 1001 (0.699%) Max absolute difference: 255 Max relative difference: 255. x: array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... y: array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... ```

VIM3 Target ( export VIV_VX_DEBUG_LEVEL=1 )

``` INFO:root:If you are running ROCM/Metal, fork will cause compiler internal error. Try to launch with arg ```--no-fork``` INFO:RPCServer:bind to 0.0.0.0:9090 INFO:RPCServer:connection from ('xxx.xxx.x.xxx', 59296) [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:220: LoadFromBinary: nbg size = 5676288: input size = 1: output size = 1: output_map size = 1 [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:148: DeSerializeTensorSpec [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:148: DeSerializeTensorSpec INFO:RPCServer:load_module /tmp/tmphslhdaoh/model.so [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctiontvmgen_default_vsi_npu_main_0 [ 1] PLS isn't existed #productname=VIPNano-QI, pid=0x88 graph gpuCount=1 interConnectRingCount=0 NN ring buffer is disabled binary graph format version, 0x10014 readBinHeader[1489]: binary version: 0x10014, current version: 0x10011 fail to load binary from pointer to create graph NBG error, please provide genereating NBG logs first fail to import kernel from VPMN, error code: -10 E [/home/addsalt/data/work/VIM3/TIM-VX/src/tim/vx/internal/src/vsi_nn_graph.c:compute_node:380]Create node[0] NBG fail vxProcessGraph[22814]: Process Graph fail! [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:105: operator()0 ms or 19 us [13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:107: operator()2 INFO:RPCServer:Finish serving ('xxx.xxx.x.xxx', 59296) ```

TIM-VX Version is 1.1.37 VIM3 galcore version is 6.4.6.2 TVM is the branch upstream/tvm_npu

Could you help me with it？ If you need more debug messages, please let me know. Thank you.

VeriSilicon / TIM-VX

TVM RPC Error "PLS isn't existed" on Khadas VIM3 Pro (Amlogic A311D) #189