Tencent / TPAT

TensorRT Plugin Autogen Tool
Apache License 2.0

Cuda Error in execute: 209 (no kernel image is available for execution on the device) #7

Closed · qraleq closed this 2 years ago

qraleq commented 2 years ago

Hi,

I'm trying to run TPAT on a Jetson AGX with JetPack 4.4.1.

I managed to install everything using the Docker image, with some small modifications to the Dockerfile, which now looks like this:

FROM nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf1.15-py3
RUN apt-get update && apt-get install build-essential cmake -y
# Install a prebuilt LLVM 9 toolchain (required for the TVM build)
RUN wget -O "clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz" https://github.com/llvm/llvm-project/releases/download/llvmorg-9.0.1/clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz \
    && tar -xvf clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz && mkdir -p /usr/local/llvm/ \
    && mv clang+llvm-9.0.1-aarch64-linux-gnu/* /usr/local/llvm/
RUN python3 -m pip install --upgrade pip
RUN pip3 install buildtools onnx==1.10.0 
RUN pip3 install pycuda nvidia-pyindex
RUN apt-get install -y git
RUN pip3 install onnx-graphsurgeon onnxruntime==1.9.0 tf2onnx xgboost==1.5.2
# Clone TPAT (with the blazerml-tvm submodule) and stage the TVM build config
RUN git clone --recursive https://github.com/Tencent/TPAT.git /workspace/TPAT \
    && cd /workspace/TPAT/3rdparty/blazerml-tvm && mkdir build && cp cmake/config.cmake build
# Enable LLVM and CUDA in the TVM build config
RUN sed -i 's/set(USE_LLVM OFF)/set(USE_LLVM \/usr\/local\/llvm\/bin\/llvm-config)/g' /workspace/TPAT/3rdparty/blazerml-tvm/build/config.cmake
RUN sed -i 's/set(USE_CUDA OFF)/set(USE_CUDA ON)/g' /workspace/TPAT/3rdparty/blazerml-tvm/build/config.cmake
# Build TVM and put its Python bindings on PYTHONPATH
RUN cd /workspace/TPAT/3rdparty/blazerml-tvm/build/ && cmake .. && make -j8
ENV TVM_HOME="/workspace/TPAT/3rdparty/blazerml-tvm/"
ENV PYTHONPATH="$TVM_HOME/python:${PYTHONPATH}"
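
As a quick sanity check (a minimal sketch, assuming the standard TVM Python API; the blazerml-tvm fork may differ, and older releases spell tvm.cuda(0) as tvm.gpu(0)), the following can be run inside the container to confirm that TVM imports and sees the CUDA device:

import tvm

# Print the TVM build version and confirm the runtime can see
# the Jetson's GPU (device 0).
print("TVM version:", tvm.__version__)
dev = tvm.cuda(0)  # older TVM releases use tvm.gpu(0)
print("CUDA device present:", dev.exist)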

After running OPENBLAS_CORETYPE=ARMV8 python3 test_tpat.py, I get this error:

Onnx_name_mapping_trt_plugin: {'abs_0': 'tpat_abs_0'}
[TensorRT] ERROR: ../rtExt/cuda/cudaPluginV2DynamicExtRunner.cpp (108) - Cuda Error in execute: 209 (no kernel image is available for execution on the device)

This then triggers an assertion failure:

[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[[[1.7640524  0.4001572  0.978738   2.2408931  1.867558  ]
  [0.9772779  0.95008844 0.1513572  0.10321885 0.41059852]
  [0.14404356 1.4542735  0.7610377  0.12167501 0.44386324]
  [0.33367434 1.4940791  0.20515826 0.3130677  0.85409576]]

 [[2.5529897  0.6536186  0.8644362  0.742165   2.2697546 ]
  [1.4543657  0.04575852 0.18718386 1.5327792  1.4693588 ]
  [0.15494743 0.37816253 0.88778573 1.9807965  0.34791216]
  [0.15634897 1.2302907  1.2023798  0.3873268  0.30230275]]

 [[1.048553   1.420018   1.7062702  1.9507754  0.5096522 ]
  [0.4380743  1.2527953  0.7774904  1.6138978  0.21274029]
  [0.89546657 0.3869025  0.51080513 1.1806322  0.02818223]
  [0.42833188 0.06651722 0.3024719  0.6343221  0.36274117]]]
================
[array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
trt cross_check output  False
Traceback (most recent call last):
  File "test_tpat.py", line 3860, in <module>
    test_abs()
  File "test_tpat.py", line 360, in test_abs
    op_expect(node, inputs=[x], outputs=[y], op_type=op_type, op_name=op_name)
  File "test_tpat.py", line 346, in op_expect
    verify_with_ort_with_trt(model, inputs, op_name, np_result=np_result)
  File "test_tpat.py", line 300, in verify_with_ort_with_trt
    assert ret, "result check False"
AssertionError: result check False

Can you please provide some guidance on what might be the problem?

Thank you!

wm2012011492 commented 2 years ago

Hi, could you modify the -arch parameter in https://github.com/Tencent/TPAT/blob/6380a44ed1c2c35c97dc30768835197bfb79eeb1/python/trt_plugin/Makefile#L68 to sm_72 and give it a try? sm_72 is the compute capability of your Jetson AGX device, while our Makefile defaults to sm_75; CUDA error 209 means the plugin kernels were built for an architecture your GPU cannot execute. You can look up the compute capability of any GPU in NVIDIA's CUDA GPUs tables (https://developer.nvidia.com/cuda-gpus).
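
If you're unsure which value to use, here is a minimal sketch (assuming pycuda, which the Dockerfile above already installs; the flag formatting is just for illustration) that queries the device and prints the matching -arch value:

import pycuda.driver as cuda

# Query the compute capability of GPU 0 and print the matching
# nvcc -arch flag, e.g. "-arch=sm_72" on a Jetson AGX (Xavier).
cuda.init()
major, minor = cuda.Device(0).compute_capability()
print("-arch=sm_%d%d" % (major, minor))

After editing the Makefile, rebuild the plugin so its kernels are recompiled for the new architecture.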

qraleq commented 2 years ago

Hi @wm2012011492 , this worked like a charm! Thank you!