TexasInstruments / edgeai-benchmark

This repository has been moved. The new location is https://github.com/TexasInstruments/edgeai-tensorlab (see also https://github.com/TexasInstruments/edgeai).

EdgeAI-Benchmark Setup - Whl Packages Not Supported #1

Closed IsidoraR closed 2 years ago

IsidoraR commented 2 years ago

Hello,

I'm making a Dockerfile for this repo (based on the conda env). When I build the image from the Dockerfile (attached to this post), I get the following error messages for the wheel packages:

    ERROR: tvm-0.8.dev0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.
    ERROR: onnxruntime_tidl-1.7.0-cp36-cp36m-linux_x86_64.whl is not a supported wheel on this platform.

However, when I run the same installation commands from the setup.sh script for these wheel packages inside the Docker container, they are successfully installed without any errors.

How should I modify my Dockerfile so that these wheel packages will be installed when I build the Docker image?

Dockerfile.zip benchmark_env_v2.zip
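In case it's useful for diagnosing this: my understanding is that the error usually means the pip running during the image build doesn't match the wheels' cp36/x86_64 tags (for example, the RUN step picks up the base image's default python3 instead of the conda env). A minimal sketch I can run at build time to list the tags the interpreter actually accepts, assuming the packaging package is installed:

    import packaging.tags

    # Print every wheel tag this interpreter accepts; cp36-cp36m-linux_x86_64
    # needs to appear in this list for the TIDL wheels to install.
    for tag in packaging.tags.sys_tags():
        print(tag)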

mathmanu commented 2 years ago

I am not an expert in Docker. Can you take a look at this Dockerfile and see if it helps: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/Dockerfile

IsidoraR commented 2 years ago

Yes, when I build the Dockerfile in the edgeai-tidl-tools repo, the wheel packages are installed without any errors.

wilderrodrigues commented 2 years ago

I'm not using Docker because my environment is already inside a VirtualBox VM. So, I don't want to add yet another layer.

However, I found many issues with dependencies and reproducibility with the Benchmark repository. To get it working, I had to do this:

  1. Clone the repository:

         git clone https://github.com/TexasInstruments/edgeai-benchmark.git
         cd edgeai-benchmark

  2. Create & activate the Conda environment:

         conda create --name ti-edge-ai-benchmark python=3.6 -c conda-forge
         conda activate ti-edge-ai-benchmark
         conda update -n base -c defaults conda
         conda install pip

  3. Proceed with the installation. Before running, edit requirements_pc.txt and pin the version of graphviz to 0.8.1, otherwise it won't work:

         ./setup.sh

Now, if I run the vanilla benchmark on PC as-is, I get this:

Entering: ./work_dirs/modelartifacts/8bits/cl-3410_tvmdlr_imagenet1k_gluoncv-mxnet_mobilenetv2_1.0-symbol_json.tar.gz.link/srtifacts Not a directory

Which makes total sense, because the ./work_dirs/modelartifacts/8bits/cl-3410_tvmdlr_imagenet1k_gluoncv-mxnet_mobilenetv2_1.0-symbol_json.tar.gz.link is a text file containing the URL to the ZIP file to be downloaded.

I think the code is not really in sync with what is expected, or the documentation is not updated.

And then none of the models load properly, because they were never downloaded. :(

./work_dirs/modelartifacts/8bits/od-2010_tflitert_coco_mlperf_ssd_mobilenet_v2_300_float_tflite/model/ssd_mobilenet_v2_300_float.tflite': No such file or directory

Any ideas, @mathmanu ?

wilderrodrigues commented 2 years ago

I just found it: the models have to be downloaded from the edgeai-modelzoo using the provided script, which was fixed here -> https://github.com/TexasInstruments/edgeai-modelzoo/commit/1bc9e1ae1cb4822c41cd82dda19bb5d6efcae7a8

I will now proceed with my experiments.

mathmanu commented 2 years ago

You don't need to download the models manually. You just need to clone the repository https://github.com/TexasInstruments/edgeai-modelzoo in the same folder where you have cloned edgeai-benchmark. edgeai-benchmark understands the .link files and will automatically download the actual files they point to.
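Just to illustrate the mechanism (this is only a sketch; the function name here is made up and the real download logic lives inside edgeai-benchmark):

    import os
    import urllib.request

    def resolve_link(link_path, out_dir):
        # A .link file is a plain text file holding the URL of the actual artifact.
        with open(link_path) as f:
            url = f.read().strip()
        os.makedirs(out_dir, exist_ok=True)
        out_path = os.path.join(out_dir, os.path.basename(url))
        urllib.request.urlretrieve(url, out_path)
        return out_path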

wilderrodrigues commented 2 years ago

Hi @mathmanu ,

I got everything set up, built the wheel, and got a Docker image to be able to run inference on custom models, etc. From a setup perspective, all looks good. However, when I try the code below, it breaks:

    import onnxruntime as rt  # onnxruntime (TIDL-enabled build)

    onnx_session_options = rt.SessionOptions()
    providers = ["TIDLExecutionProvider", "CPUExecutionProvider"]
    onnx_session = rt.InferenceSession(str(onnx_model_path), providers=providers,
                                       provider_options=[compile_options, {}], sess_options=onnx_session_options)

I have LD_LIBRARY_PATH set up and pointing to the tidl_tools directory. All the .so files are there, and I even executed ldconfig after setting the environment variable. But I still get this error:

Error -   libtidl_onnxrt_EP.so: cannot open shared object file: No such file or directory 
libtidl_onnxrt_EP loaded (nil) 
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

The libtidl_onnxrt_EP.so is present, path is set, etc. But it doesn't work and I don't know why. :(

Any ideas / help?
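For what it's worth, here is a small check I'm using (just a sketch, not part of edgeai-benchmark) to separate "not on the search path" from "present but rejected by the dynamic loader": dlopen the EP library by absolute path, which bypasses LD_LIBRARY_PATH entirely.

    import ctypes
    import os

    tidl_tools = os.environ.get("TIDL_TOOLS_PATH", "")
    ep_path = os.path.join(tidl_tools, "libtidl_onnxrt_EP.so")
    print("exists:", os.path.exists(ep_path))
    # Loading by absolute path bypasses LD_LIBRARY_PATH; an OSError here
    # means the loader itself rejects the file, not that it cannot find it.
    ctypes.CDLL(ep_path)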

mathmanu commented 2 years ago

There was an issue that was fixed yesterday (https://github.com/TexasInstruments/edgeai-benchmark/issues/6). Your issue seems to be different, but just in case, pull the latest code, run setup.sh, and try again.

@kumardesappan Do you have any suggestion?

mathmanu commented 2 years ago

Maybe from inside the Python code you can try to print LD_LIBRARY_PATH and TIDL_TOOLS_PATH:

    print(os.environ['LD_LIBRARY_PATH'])
    print(os.environ['TIDL_TOOLS_PATH'])

wilderrodrigues commented 2 years ago

Thanks for the super quick response, @mathmanu .

Here is my output:

Python 3.6.15 | packaged by conda-forge | (default, Dec  3 2021, 19:12:04) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print(os.environ['TIDL_TOOLS_PATH'])
/home/helsing/trecs/edgeai-benchmark/tidl_tools
>>> print(os.environ['LD_LIBRARY_PATH'])
/home/helsing/trecs/edgeai-benchmark/tidl_tools
>>> 

I will pull the latest code and run it again. Will keep you updated.

wilderrodrigues commented 2 years ago

Some extra info that might help in understanding what's going on:

  1. I built a Docker image based on this one -> FROM balenalib/aarch64-ubuntu:bionic
  2. I built the ONNX RT wheel using a Docker image based on the same as above. a. The generated wheel: onnxruntime_dnnl-1.7.0-cp36-cp36m-linux_aarch64.whl
  3. My Docker container is now running on a MacBook - not an aarch64 host. But I just want to make sure inference runs before I can deploy on the TI device.
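As a quick sanity check on my side (just a sketch), this is how I confirm what machine type the Python process inside the emulated container reports:

    import platform

    # Under qemu user-mode emulation this reports the emulated target,
    # e.g. "aarch64", even though the host is a MacBook.
    print(platform.machine())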

Here are the extra .so files I have compiled:

(onnxrt) root@ee6e17a8a5e1:/code/onnxruntime# ls -larth  /code/onnxruntime/build/Linux/MinSizeRel/*.so
-rwxr-xr-x 1 root root 8.3K Apr 26 16:01 /code/onnxruntime/build/Linux/MinSizeRel/libonnxruntime_providers_shared.so
-rwxr-xr-x 1 root root 400K Apr 26 17:03 /code/onnxruntime/build/Linux/MinSizeRel/libonnxruntime_providers_dnnl.so
lrwxrwxrwx 1 root root   23 Apr 26 21:18 /code/onnxruntime/build/Linux/MinSizeRel/libonnxruntime.so -> libonnxruntime.so.1.7.0
-rwxr-xr-x 1 root root  32K Apr 27 01:31 /code/onnxruntime/build/Linux/MinSizeRel/libcustom_op_library.so
-rwxr-xr-x 1 root root 9.9M Apr 27 01:37 /code/onnxruntime/build/Linux/MinSizeRel/onnxruntime_pybind11_state.so

Could that be the culprit, @mathmanu ?

wilderrodrigues commented 2 years ago

I copied the libs under /usr/lib and ran ldconfig -v. Below is part of the output:

/usr/lib:
        libann.so.0 -> libann.so.0.0.0
        libtidl_tfl_delegate.so.1.0 -> libtidl_tfl_delegate.so (changed)
        libtidl_onnxrt_EP.so.1.0 -> libtidl_onnxrt_EP.so (changed)
        libvx_tidl_rt.so.1.0 -> libvx_tidl_rt.so.1.0

This should be fine. But it still doesn't work. :(

Error -   libtidl_onnxrt_EP.so: cannot open shared object file: No such file or directory 
libtidl_onnxrt_EP loaded (nil) 
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

wilderrodrigues commented 2 years ago

Sorry for all the comments, just trying to contribute somehow. :) I tried this from the Python REPL:

>>> import ctypes
>>> from ctypes.util import find_library
>>> onnxrt = ctypes.cdll.LoadLibrary(find_library("libtidl_onnxrt_EP"))
>>> id(onnxrt)
365109887776
>>> onnxrt
<CDLL 'None', handle 55018431b0 at 0x55023ec320>

So weird! I also copied the .so files into the Conda env lib directory.
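Reading the ctypes docs again: find_library() expects the bare library name without the lib prefix or .so suffix, and LoadLibrary(None) (which is what happens when find_library returns None) opens a handle to the main program, which would explain the <CDLL 'None', ...> "successes" above. A small sketch of the difference (the absolute path is just my tidl_tools location from earlier):

    import ctypes
    from ctypes.util import find_library

    # find_library() wants the bare name, without "lib" / ".so":
    print(find_library("tidl_onnxrt_EP"))     # soname if ldconfig knows it, else None
    print(find_library("libtidl_onnxrt_EP"))  # almost certainly None

    # CDLL(None) opens the main program, hence <CDLL 'None', ...> above;
    # loading by absolute path is unambiguous:
    ep = ctypes.CDLL("/home/helsing/trecs/edgeai-benchmark/tidl_tools/libtidl_onnxrt_EP.so")
    print(ep)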

wilderrodrigues commented 2 years ago

I think this is the issue:

>>> from ctypes.util import find_library
>>> import ctypes
>>> onnxrt = ctypes.cdll.LoadLibrary(find_library("libtidl_onnxrt_EP.so.1.0"))
>>> id(onnxrt)
365109889400
>>> onnxrt
<CDLL 'None', handle 55018431b0 at 0x55023ec978>
>>> onnxrt = ctypes.cdll.LoadLibrary(find_library("libtidl_onnxrt_EP"))
>>> id(onnxrt)
365113550944
>>> onnxrt
<CDLL 'None', handle 55018431b0 at 0x550276a860>
>>> onnxrt = ctypes.cdll.LoadLibrary(find_library("libtidl_onnxrt_EP.so"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/trecs/lib/python3.6/ctypes/__init__.py", line 426, in LoadLibrary
    return self._dlltype(name)
  File "/opt/conda/envs/trecs/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/conda/envs/trecs/lib/libtidl_onnxrt_EP.so: cannot open shared object file: No such file or directory
>>> 

wilderrodrigues commented 2 years ago

Okay, last comment for today.

  1. I copied the libraries under /usr/local/lib and ran ldconfig -v. Here is the partial output:
/usr/local/lib:
        libtidl_tfl_delegate.so.1.0 -> libtidl_tfl_delegate.so
        libvx_tidl_rt.so.1.0 -> libvx_tidl_rt.so.1.0
        libtidl_onnxrt_EP.so.1.0 -> libtidl_onnxrt_EP.so
  2. Then in my Python REPL I did the following, getting pretty consistent output, no errors:
Python 3.6.15 | packaged by conda-forge | (default, Dec  3 2021, 19:12:04) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> from ctypes.util import find_library
>>> onnxrt = ctypes.cdll.LoadLibrary(find_library("libtidl_onnxrt_EP.so.1.0"))
>>> onnxrt
<CDLL 'None', handle 55018431b0 at 0x55022e59b0>
>>> onnxrt = ctypes.cdll.LoadLibrary(find_library("libtidl_onnxrt_EP"))
>>> onnxrt
<CDLL 'None', handle 55018431b0 at 0x550266b828>
>>> onnxrt = ctypes.cdll.LoadLibrary(find_library("libtidl_onnxrt_EP.so"))
>>> onnxrt
<CDLL 'None', handle 55018431b0 at 0x55022e5780>
>>> 
  3. However, when I run the code below, I still get the same issue. :(
    onnx_session_options = rt.SessionOptions()
    providers = ["TIDLExecutionProvider", "CPUExecutionProvider"]
    onnx_session = rt.InferenceSession(str(onnx_model_path), providers=providers,
                                       provider_options=[compile_options, {}], sess_options=onnx_session_options)

Any clue?

wilderrodrigues commented 2 years ago

I enabled debug by doing export LD_DEBUG=libs and got some interesting output:

>>> from ctypes import _dlopen
>>> _dlopen("libtidl_onnxrt_EP.so")
       465:     find library=libtidl_onnxrt_EP.so [0]; searching
       465:      search path=/home/helsing/conda/envs/trecs/lib/python3.6/lib-dynload/../..             (RPATH from file /home/helsing/conda/envs/trecs/lib/python3.6/lib-dynload/readline.cpython-36m-aarch64-linux-gnu.so)
       465:       trying file=/home/helsing/conda/envs/trecs/lib/python3.6/lib-dynload/../../libtidl_onnxrt_EP.so
       465:      search path=/home/helsing/conda/envs/trecs/bin/../lib          (RPATH from file python)
       465:       trying file=/home/helsing/conda/envs/trecs/bin/../lib/libtidl_onnxrt_EP.so
       465:      search path=/home/helsing/conda/envs/trecs/lib         (LD_LIBRARY_PATH)
       465:       trying file=/home/helsing/conda/envs/trecs/lib/libtidl_onnxrt_EP.so
       465:      search cache=/etc/ld.so.cache
       465:      search path=/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu:/lib:/usr/lib            (system search path)
       465:       trying file=/lib/aarch64-linux-gnu/libtidl_onnxrt_EP.so
       465:       trying file=/usr/lib/aarch64-linux-gnu/libtidl_onnxrt_EP.so
       465:       trying file=/lib/libtidl_onnxrt_EP.so
       465:       trying file=/usr/lib/libtidl_onnxrt_EP.so
       465:     
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: libtidl_onnxrt_EP.so: cannot open shared object file: No such file or directory
>>> 

The .so file is present, as you can see in the snippet below:

(trecs) helsing@ce04838f9587:~/trecs$ ls -larth /home/helsing/conda/envs/trecs/lib/libtidl_onnxrt_EP.so
-rwxr-xr-x 1 helsing root 65K Apr 28 20:31 /home/helsing/conda/envs/trecs/lib/libtidl_onnxrt_EP.so

Will keep digging.

wilderrodrigues commented 2 years ago

Hi @mathmanu,

I think I found the issue: the libtidl_onnxrt_EP.so provided in the tidl_tools released here -> https://github.com/TexasInstruments/edgeai-tidl-tools/releases/tag/08_02_00_01-rc1 is compiled for x86_64 only. I'm testing on an aarch64 platform.

Where can I find those files compiled for the right architecture? Do I have to do it myself?
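For reference, this is how I checked (a minimal sketch, equivalent to running file on the library): read the e_machine field from the ELF header; the relative path below is just my local tidl_tools copy.

    import struct

    def elf_machine(path):
        # Bytes 18-19 of an ELF file hold e_machine (little-endian here):
        # 0x3E = x86-64, 0xB7 = AArch64.
        with open(path, "rb") as f:
            header = f.read(20)
        assert header[:4] == b"\x7fELF", "not an ELF file"
        return struct.unpack_from("<H", header, 18)[0]

    machine = elf_machine("tidl_tools/libtidl_onnxrt_EP.so")
    print({0x3E: "x86-64", 0xB7: "AArch64"}.get(machine, hex(machine)))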

mathmanu commented 2 years ago

The ARM libraries are in our SDK: https://www.ti.com/tool/download/PROCESSOR-SDK-LINUX-SK-TDA4VM https://www.ti.com/tool/SK-TDA4VM

We have tested the inference on EVMs/SoCs.

Does this answer your question?

wilderrodrigues commented 2 years ago

Hi @mathmanu ,

Yeah, I found everything I needed yesterday and also successfully executed inference with 2 models on the device we have.

I will proceed to convert our own model and get inference running on it.

Thanks for the support!

wilderrodrigues commented 2 years ago

This issue can be closed, @mathmanu .