Closed gitoabdelgawad closed 1 week ago
@gitoabdelgawad @oguzkaganozt does/did this PR fix this issue?
* [feat(docker): fix CUDA compile on devel image and improve run.sh #4849](https://github.com/autowarefoundation/autoware/pull/4849)
Yes, this PR fixes the issue. Thanks, I will close the issue.
Checklist
Description
Inside the ghcr.io/autowarefoundation/autoware-openadk:latest-devel-cuda container, I'm trying to use the tensorrt_yolox package. The package includes some CUDA kernels, which fail to build, and the build shows the following warning:
--- stderr: tensorrt_yolox
CMake Warning at CMakeLists.txt:19 (message): CUDA is not found. preprocess acceleration using CUDA will not be available.
It seems that the CMake variable CMAKE_CUDA_COMPILER is not set.
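To narrow down why CMAKE_CUDA_COMPILER might be empty, it can help to check whether a CUDA compiler is discoverable inside the container at all. CMake documents the CUDACXX environment variable as the preferred CUDA compiler on first configuration, and `nvcc` on PATH is the usual alternative. The sketch below is only a rough approximation of that search, not CMake's actual logic; the function name `find_cuda_compiler` and the fallback directory `/usr/local/cuda/bin` are assumptions made for illustration.

```python
import os
import shutil


def find_cuda_compiler(env=None, fallback_dirs=("/usr/local/cuda/bin",)):
    """Return a path to a CUDA compiler, or None if no candidate is found.

    Heuristic only: this approximates, but does not reproduce, CMake's search.
    """
    env = os.environ if env is None else env
    search_path = env.get("PATH", "")

    # 1. An explicit compiler named by the CUDACXX environment variable.
    cudacxx = env.get("CUDACXX")
    if cudacxx:
        found = shutil.which(cudacxx, path=search_path)
        if found:
            return found

    # 2. nvcc somewhere on PATH.
    found = shutil.which("nvcc", path=search_path)
    if found:
        return found

    # 3. Assumed fallback install locations (may not exist on a given image).
    for directory in fallback_dirs:
        candidate = os.path.join(directory, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_cuda_compiler() or "no CUDA compiler found")
```

If this reports no compiler, the CMake warning is expected. If it does find one, the problem is more likely in the toolkit installation itself, for example missing libraries.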
Then while using tensorrt_yolox for object detection, the system crashes with the following error:
[tensorrt_yolox_node_exe-2] /home/os/elm/autoware/install/tensorrt_yolox/lib/tensorrt_yolox/tensorrt_yolox_node_exe: symbol lookup error: /home/os/elm/autoware/install/tensorrt_yolox/lib/libtensorrt_yolox.so: undefined symbol: _ZN14tensorrt_yolox50resize_bilinear_letterbox_nhwc_to_nchw32_batch_gpuEPfPhiiiiiiifP11CUstream_st
[ERROR] [tensorrt_yolox_node_exe-2]: process has died [pid 977, exit code 127, cmd '/home/os/elm/autoware/install/tensorrt_yolox/lib/tensorrt_yolox/tensorrt_yolox_node_exe --ros-args -r __node:=tensorrt_yolox --params-file /tmp/launch_params_d1ll7q3z --params-file /tmp/launch_params_cq_ya7ic -r ~/in/image:=/fr_camera/image_rect -r ~/out/objects:=roi0'].
The missing symbol is actually a CUDA kernel that failed to build previously.
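To see which function the undefined symbol refers to, the mangled name can be decoded. The usual tool is `c++filt`. The snippet below is a minimal hand-rolled sketch that only splits the leading nested name of an Itanium-ABI mangled symbol (the `_ZN<len><name>...E` form). The helper name `nested_name_parts` is made up for this example, and templates, substitutions and parameter types are deliberately not handled.

```python
def nested_name_parts(mangled):
    """Split the leading nested name of an Itanium-ABI mangled symbol.

    Handles only the simple `_ZN<len><name>...E` form; anything more
    complex (templates, substitutions, qualifiers) raises ValueError.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("not a simple nested-name symbol")
    pos = len(prefix)
    parts = []
    while pos < len(mangled) and mangled[pos] != "E":
        start = pos
        # Each component is encoded as a decimal length followed by the name.
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported encoding at offset {pos}")
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return parts


# Symbol copied from the runtime error above.
symbol = ("_ZN14tensorrt_yolox50resize_bilinear_letterbox_nhwc_to_nchw32"
          "_batch_gpuEPfPhiiiiiiifP11CUstream_st")
print("::".join(nested_name_parts(symbol)))
# prints tensorrt_yolox::resize_bilinear_letterbox_nhwc_to_nchw32_batch_gpu
```

The trailing part of the symbol encodes the parameter types (`Pf` is `float*`, `Ph` is `unsigned char*`, `i` is `int`, `f` is `float`, and `P11CUstream_st` is a pointer to `CUstream_st`), which is consistent with a GPU preprocessing function that takes a CUDA stream.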
Expected behavior
Actual behavior
tensorrt_yolox builds with a warning and skips building the CUDA kernels, which leads to a runtime crash later.
Steps to reproduce
Inside ghcr.io/autowarefoundation/autoware-openadk:latest-devel-cuda container
Versions
No response
Possible causes
After some investigation, including building the official CUDA Samples to track down the issue, it appeared that some CUDA libraries were missing:
/usr/bin/ld: cannot find -lcudadevrt
/usr/bin/ld: cannot find -lcudart_static
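A quick way to confirm this kind of linker failure is to look for the two libraries directly. The sketch below checks a set of directories for `lib<name>.a` or `lib<name>.so`. The directories listed are assumed common CUDA toolkit locations, not paths confirmed for this image, and `missing_cuda_libs` is a hypothetical helper name.

```python
import os

# Library names taken from the linker errors above.
REQUIRED_LIBS = ("cudadevrt", "cudart_static")

# Assumed search locations; adjust for the actual toolkit layout.
DEFAULT_LIB_DIRS = (
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/targets/x86_64-linux/lib",
)


def missing_cuda_libs(lib_dirs=DEFAULT_LIB_DIRS, names=REQUIRED_LIBS):
    """Return the names with no lib<name>.a or lib<name>.so in lib_dirs."""
    missing = []
    for name in names:
        candidates = (
            os.path.join(directory, f"lib{name}{ext}")
            for directory in lib_dirs
            for ext in (".a", ".so")
        )
        if not any(os.path.exists(path) for path in candidates):
            missing.append(name)
    return missing


print("missing:", missing_cuda_libs() or "none")
```

An empty result only means the files exist in the directories checked. The linker may still fail if those directories are not on its search path.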
After applying the following patch and rebuilding the Docker image, the CUDA kernels were built and the object detection model ran correctly.
Additional context
No response