apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] Error opening FastRPC channel #17195

Open chayliu-ecarx opened 4 months ago

chayliu-ecarx commented 4 months ago

I followed the guide at https://github.com/apache/tvm/tree/main/apps/hexagon_launcher

I successfully built all the required files and pushed them to my Android device. However, running launcher_android fails with the following log:

--------- beginning of main
07-25 03:33:21.408  1832  1832 I launcher_android: vendor/qcom/proprietary/adsprpc/src/rpcmem_android.c:159: rpcmem_init_internal: opened ION device fd 3, configured heap IDs: system (0x2000000), contig (0x10), secure (0x400), secure flags (0x80080000)
07-25 03:33:21.408  1832  1832 I launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:3087: fastrpc_apps_user_init done with default domain:3 and &fastrpc_trace:0x7c834410bc
07-25 03:33:21.410  1832  1832 I launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2039: remote_session_control DSP info request for domain 3, thread priority -1, stack size 131072
07-25 03:33:21.410  1832  1832 I launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2636: Successfully opened /vendor/dsp/cdsp/fastrpc_shell_unsigned_33
07-25 03:33:21.410  1832  1832 I launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_config.c:200: Reading configuration file: launcher_android.debugconfig
07-25 03:33:21.411  1832  1832 E launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2839: Error 0x72: apps_dev_init: untrusted app trying to offload to signed remote process (errno 111, Connection refused). Try offloading to unsignedPD using remote_session_control
07-25 03:33:21.413  1832  1832 E launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2870: Error 0x72: apps_dev_init failed for domain 3, errno Success
07-25 03:33:21.413  1832  1832 E launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:2968: Error 0x72: open_dev (-1) failed for domain 3 (errno Success)
07-25 03:33:21.414  1832  1832 E launcher_android: vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:1398: Error 0x72: remote_handle64_open failed for file:///liblauncher_rpc_skel.so?launcher_rpc_skel_handle_invoke&_modver=1.0&_dom=cdsp (errno Success)
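
When reading a log like the one above, it can help to isolate only the error-level entries and their FastRPC error codes. The following is a minimal sketch, assuming the log was captured in the logcat layout shown above (date, time, PID, TID, level, tag, message). The function name `extract_errors` is hypothetical and not part of TVM or the Hexagon SDK.

```python
import re

# Assumed layout, matching the log lines in this issue:
# MM-DD HH:MM:SS.mmm  PID  TID LEVEL TAG: message
LOGCAT_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>[^:]+?)\s*:\s(?P<message>.*)$"
)
# FastRPC messages in this log report codes as "Error 0x..."
ERROR_CODE = re.compile(r"Error (0x[0-9a-fA-F]+)")


def extract_errors(log_text):
    """Return (time, error_code_or_None, message) for every E-level line."""
    errors = []
    for line in log_text.splitlines():
        match = LOGCAT_LINE.match(line.strip())
        if match is None or match.group("level") != "E":
            continue  # skip non-logcat lines and non-error levels
        message = match.group("message")
        code = ERROR_CODE.search(message)
        errors.append(
            (match.group("time"), code.group(1) if code else None, message)
        )
    return errors
```

Feeding it the log above would surface the four `E` lines, the first of which carries the actual cause ("untrusted app trying to offload to signed remote process"); the later ones are follow-on failures of the same open attempt.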

The environment settings are:

export LD_LIBRARY_PATH=/mypath

ls mypath

adsp              libgcc.so                mobilenetv2-7.json
input_data.dat    liblauncher_rpc_skel.so  mobilenetv2-7.so
launcher_android  libtvm_runtime.so        mobilenetv2-7_input.json

export ADSP_LIBRARY_PATH="/mypath/adsp"

ls /mypath/adsp

libc++.so     libc++abi.so.1  liblauncher_rpc_skel.so
libc++.so.1   libc.so         libqcc.so
libc++abi.so  libgcc.so       libstdc++.so
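
A layout like the one above can be sanity-checked before pushing to the device. The sketch below assumes a local staging copy of the directories, that `LD_LIBRARY_PATH` entries are separated by `:`, and that `ADSP_LIBRARY_PATH` entries are separated by `;` (as the trailing `;` in a later command in this thread suggests). The helper names `dirs_containing` and `check_layout` are hypothetical.

```python
import os


def dirs_containing(search_path, filename, sep):
    """Return the entries of a separator-delimited search path that contain filename."""
    hits = []
    for entry in search_path.split(sep):
        entry = entry.strip()
        if entry and os.path.isfile(os.path.join(entry, filename)):
            hits.append(entry)
    return hits


def check_layout(ld_library_path, adsp_library_path):
    """Map each expected library to the search-path directories where it was found.

    Assumption: the CPU-side runtime is looked up via LD_LIBRARY_PATH (':'),
    and the DSP-side skel library via ADSP_LIBRARY_PATH (';').
    """
    return {
        "libtvm_runtime.so": dirs_containing(
            ld_library_path, "libtvm_runtime.so", ":"
        ),
        "liblauncher_rpc_skel.so": dirs_containing(
            adsp_library_path, "liblauncher_rpc_skel.so", ";"
        ),
    }
```

An empty list for either key means the corresponding library is not reachable through that variable. This only checks file placement; it says nothing about signing, which is a separate requirement.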

@kparzysz-quic @tmoreau89 @areusch @csullivan @sdalvi-quic @abhikran-quic @quic-sanirudh

abhikran-quic commented 4 months ago

@chayliu-ecarx: On production devices, any shared library must be signed by the OEM before it can be loaded on the DSP. The problem you are observing is caused by loading an unsigned library on a signed PD (protection domain). There is more information about signed vs. unsigned PD in the Hexagon SDK documentation: SDK_ROOT/docs/software/ipc/rpc.html

I have some questions/suggestions below:

  1. Are you trying this on a development board? It would help if you shared details of the device you are using.
  2. If you are using a smartphone, try loading the library on an unsigned PD. hexagon_launcher uses unsigned PD by default, but could you check whether the application you are running actually uses it? The section titled "Request signature-free offload" in SDK_ROOT/docs/software/ipc/rpc.html has example code to check this.
  3. Could you move the DSP-specific libraries/files (mobilenetv2-7.so, mobilenetv2-7.json and mobilenetv2-7_input.json) to /mypath/adsp/ and try again?

chayliu-ecarx commented 4 months ago

Hi @abhikran-quic,

  1. This is not a development board; it is an OEM device.
  2. This is not a smartphone; it is an sa8155p device. Following rpc.html, I ran the following examples:

    # run example with 1000 array size on cDSP
    calculator 0 3 1000

    # run example with 1000 array size on cDSP Unsigned PD
    calculator 0 3 1000 1

Both failed with the following error message:

DSP domain is not provided. Retrieving DSP information using Remote APIs.
Overriding user request for unsigned PD. Only signed offload is allowed on domain 3.

Starting calculator test
Attempting to run on signed PD on domain 3

Allocate 4 bytes from ION heap
Creating sequence of numbers from 0 to 0
Compute sum on domain 3
Retry attempt unsuccessful. Timing out....
ERROR 0x80000406: Failed to compute sum on domain 3
ERROR 0x80000406: Failed to find max on domain 3
ERROR 0x80000406: Calculator test failed

ERROR 0x80000406: Calculator example failed

The FARF messages are:

07-26 03:43:49.199 27636 27645 V adsprpc : rtld.c:834:0x330dd:12: Error: dlopen_ex failed for libcalculator_skel.so (flags 2)
07-26 03:43:49.199 27636 27645 V adsprpc : mod_table.c:556:0x330dd:12: Error 0x80000406: open_mod_table_open_dynamic failed for file:///libcalculator_skel.so?calculator_skel_handle_invoke&_modver=1.0&_dom=cdsp
07-26 03:43:49.200 27636 27645 V adsprpc : mod_table.c:583:0x330dd:12: Error: open_mod_table_open_dynamic: failed to load libcalculator_skel.so, resetting default vote as no other modules loaded
07-26 03:43:49.203 27636 27645 V adsprpc : mod_table.c:470:0x330dd:12: INFO: open_mod_table_open_dynamic: Making default vote as new module is getting loaded

...

07-26 03:43:49.210 27636 27645 V adsprpc : rtld.c:834:0x330dd:12: Error: dlopen_ex failed for libcalculator_skel.so (flags 2)
07-26 03:43:49.210 27636 27645 V adsprpc : mod_table.c:556:0x330dd:12: Error 0x80000406: open_mod_table_open_dynamic failed for file:///libcalculator_skel.so?calculator_skel_handle_invoke&_modver=1.0&_dom=cdsp
07-26 03:43:49.210 27636 27645 V adsprpc : mod_table.c:583:0x330dd:12: Error: open_mod_table_open_dynamic: failed to load libcalculator_skel.so, resetting default vote as no other modules loaded


  3. I moved the files as suggested, but it still failed.
  4. However, I also tried the inception_v3 example from snpe-1.68.0.3932, and it ran successfully on the DSP backend.

abhikran-quic commented 4 months ago

Hi @chayliu-ecarx, I am checking this with an internal team and will get back to you as soon as possible.

abhikran-quic commented 4 months ago

@chayliu-ecarx: sa8155p doesn't support unsigned PD. Since the application is running on a signed PD, I would recommend pushing testsig.so to the target and trying again. To understand how to generate and push testsig.so, please refer to the Hexagon SDK documentation (search for "signing").

> I also tried the inception_v3 example from snpe-1.68.0.3932, and it ran successfully on the DSP backend.

In this case, did you build and push any libraries to the hardware, or were you using precompiled libraries?

chayliu-ecarx commented 4 months ago

I didn't build any libraries; I pushed the libraries shipped with the SNPE SDK, following snpe-1.68.0.3932/doc/html/tutorial_inceptionv3.html.

abhikran-quic commented 4 months ago

> I didn't build any libraries; I pushed the libraries shipped with the SNPE SDK, following snpe-1.68.0.3932/doc/html/tutorial_inceptionv3.html.

Those libraries are signed, which is why they worked on the hardware.

In the case of the calculator example and the TVM launcher binary, the libraries are compiled from source, so they need testsig.so.

chayliu-ecarx commented 3 months ago

testsig.so didn't solve this problem.

I switched to an sa8155 development board for testing, but ran into another issue.

I ran the launcher as follows:

export LD_LIBRARY_PATH=/data/local/tmp/tvm
export ADSP_LIBRARY_PATH="/data/local/tmp/tvm/adsp;" 

./launcher_android --in_config mobilenetv2-7_input.json --out_config output.json

After that, the program appeared to hang and made no further progress.

The log is:

01-01 01:23:00.400  8891  8891 I launcher_android: vendor/qcom/proprietary/commonsys-intf/adsprpc/src/rpcmem_android.c:158: rpcmem_init_internal: opened ION device fd 3, configured heap IDs: system (0x2000000), contig (0x10), secure (0x400), secure flags (0x80080000)
01-01 01:23:00.400  8891  8891 I launcher_android: vendor/qcom/proprietary/commonsys-intf/adsprpc/src/fastrpc_apps_user.c:2832: fastrpc_apps_user_init done
01-01 01:23:00.402  8891  8891 I launcher_android: vendor/qcom/proprietary/commonsys-intf/adsprpc/src/fastrpc_config.c:136: Reading configuration file: launcher_android.debugconfig
01-01 01:23:00.402  8891  8891 I launcher_android: vendor/qcom/proprietary/commonsys-intf/adsprpc/src/fastrpc_config.c:156: Read fastrpc config file launcher_android.debugconfig found at /data/local/tmp/tvm/adsp

A few minutes later, it was still in this state and had produced no output.
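
To make a hang like this easier to report and reproduce, the command can be wrapped in a timeout so that it either finishes or is reported as stuck along with whatever output it produced. This is a minimal sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper, and driving the launcher from a host machine (for example through `adb shell`) is an assumption rather than something described in this thread.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode, output).

    finished is False when the command did not exit within timeout_s seconds;
    in that case returncode is None and output holds whatever was captured.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        output = exc.output or ""
        if isinstance(output, bytes):  # partial output may arrive undecoded
            output = output.decode(errors="replace")
        return False, None, output
    return True, result.returncode, result.stdout
```

Note that when the command is an `adb shell` invocation, the timeout only terminates the host-side process; whether the process on the device also exits is not guaranteed and would need to be checked separately.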

abhikran-quic commented 3 months ago

Are you able to run the calculator example with testsig.so on the development board? This will help identify whether FastRPC is working properly.