autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/
Apache License 2.0

Colcon build failure of TensorRT using python3.8 and tensorrt-10.1.0.27 #7944

Closed annb3 closed 2 weeks ago

annb3 commented 1 month ago

Description

Running colcon build on the Autoware packages for ROS 2 Humble inside the Docker environment produces errors related to the lidar_transfusion package.

Expected behavior

--

Actual behavior

Running colcon build on the Autoware packages for ROS 2 Humble inside the Docker environment gives me these errors:

/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp: In member function ‘bool lidar_transfusion::TransfusionTRT::preprocess(const PointCloud2&, const tf2_ros::Buffer&)’:
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:158:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
  158 |   network_trt_ptr_->context->setTensorAddress(
      |                              ^~~~~~~~~~~~~~~~
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:160:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setInputShape’; did you mean ‘setInputShapeBinding’?
  160 |   network_trt_ptr_->context->setInputShape(
      |                              ^~~~~~~~~~~~~
      |                              setInputShapeBinding
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:165:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
  165 |   network_trt_ptr_->context->setTensorAddress(
      |                              ^~~~~~~~~~~~~~~~
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:167:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setInputShape’; did you mean ‘setInputShapeBinding’?
  167 |   network_trt_ptr_->context->setInputShape(
      |                              ^~~~~~~~~~~~~
      |                              setInputShapeBinding
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:170:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
  170 |   network_trt_ptr_->context->setTensorAddress(
      |                              ^~~~~~~~~~~~~~~~
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:172:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setInputShape’; did you mean ‘setInputShapeBinding’?
  172 |   network_trt_ptr_->context->setInputShape(
      |                              ^~~~~~~~~~~~~
      |                              setInputShapeBinding
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:176:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
  176 |   network_trt_ptr_->context->setTensorAddress(
      |                              ^~~~~~~~~~~~~~~~
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:178:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
  178 |   network_trt_ptr_->context->setTensorAddress(
      |                              ^~~~~~~~~~~~~~~~
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:180:30: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setTensorAddress’
  180 |   network_trt_ptr_->context->setTensorAddress(
      |                              ^~~~~~~~~~~~~~~~
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp: In member function ‘bool lidar_transfusion::TransfusionTRT::inference()’:
/home/user/autoware/src/universe/autoware.universe/perception/lidar_transfusion/lib/transfusion_trt.cpp:187:44: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘enqueueV3’; did you mean ‘enqueueV2’?
  187 |   auto status = network_trt_ptr_->context->enqueueV3(stream_);
      |                                            ^~~~~~~~~
      |                                            enqueueV2
gmake[2]: *** [CMakeFiles/transfusion_lib.dir/build.make:160: CMakeFiles/transfusion_lib.dir/lib/transfusion_trt.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:143: CMakeFiles/transfusion_lib.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2
---
Failed   <<< lidar_transfusion [1min 0s, exited with code 2]
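
All of the rejected members (setTensorAddress, setInputShape, enqueueV3) belong to TensorRT's name-based tensor API which, as far as the TensorRT release notes indicate, arrived in version 8.5. The compiler offering only enqueueV2 and setInputShapeBinding therefore suggests that older TensorRT headers are on the include path. The following is a minimal sketch for condensing a long build log into the distinct missing names. The function name missing_members and the file name build.log are illustrative assumptions, not part of colcon or Autoware.

```shell
#!/usr/bin/env bash
# List the distinct member names that the compiler reported as missing.
# Reads a build log on stdin and prints one identifier per line, sorted.
# (missing_members is a hypothetical helper name.)
missing_members() {
  # -a: treat the log as text even if it contains non-ASCII quote characters.
  grep -a -o 'has no member named [^A-Za-z_]*[A-Za-z_][A-Za-z0-9_]*' \
    | sed 's/.*[^A-Za-z0-9_]//' \
    | sort -u
}

# Example (build.log is a hypothetical file holding the compiler output):
#   missing_members < build.log
```

If every name in the result comes from the newer API, the TensorRT installation is a more likely culprit than the lidar_transfusion source.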

Steps to reproduce

--

Versions

docker environment

TensorRT Version: 10.1.0.27

NVIDIA GPU: NVIDIA RTX A2000 Laptop GPU

NVIDIA Driver Version: 555.42.06

CUDA Version: 12.5

CUDNN Version: cuda_12.2.r12.2

Operating System: Linux - Ubuntu 20.04 LTS

Python Version: 3.8

Possible causes

No response

Additional context

No response

amadeuszsz commented 1 month ago

Hi @annb3, thank you for your report. Could you please show me the output of dpkg -l | grep nvinfer? Please make sure you run this command in the same terminal window where you build your workspace.
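
A plain grep also lists packages that were removed but still have configuration files left behind (status rc), which can make an old TensorRT look installed. The sketch below keeps only fully installed packages (status ii) and prints the name and version columns. The helper name nvinfer_versions is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Print "<package> <version>" for every fully installed package whose name
# contains "nvinfer". Expects `dpkg -l` output on stdin.
# (nvinfer_versions is a hypothetical helper name.)
nvinfer_versions() {
  awk '$1 == "ii" && $2 ~ /nvinfer/ { print $2, $3 }'
}

# Typical use, in the same terminal that runs colcon build:
#   dpkg -l | nvinfer_versions
```

Running it in the build terminal matters because the host and the container can carry different TensorRT packages.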

annb3 commented 1 month ago

Screenshot from 2024-07-10 17-20-04 It's here :) sorry for the delay @amadeuszsz

amadeuszsz commented 1 month ago

@annb3 You are trying to build the workspace outside the Docker container. To use Docker, please follow the Docker installation instructions. If you would like to use Autoware without Docker images, please follow the source installation instructions. In that case, not only is your TensorRT outdated, but you will also need to upgrade to Ubuntu 22.04.

annb3 commented 1 month ago

No, it is inside :) This is my Docker environment. I already followed https://autowarefoundation.github.io/autoware-documentation/main/installation/autoware/docker-installation/. Everything worked for years until last month. I don't know exactly why.

amadeuszsz commented 1 month ago

@annb3 Ok, then:

  1. To make sure, please run docker container ps in the same terminal where you build the workspace, and show the output.
  2. If an error occurs, run docker container ps in a new terminal window (outside Docker) and show the output.
  3. Which command do you use to run the Docker image? Autoware recently dropped rocker support and now uses new Docker images. Please update your image and show the output of docker images | grep autoware.

annb3 commented 1 month ago

@amadeuszsz Here are the outputs you requested :) Thanks for the support.

  1. Screenshot from 2024-07-15 10-19-23

  2. Screenshot from 2024-07-15 10-19-32

  3. rocker --nvidia --x11 --network host --devices /dev/vid* --user --volume $HOME/techdemo-autoware_b/autoware --volume $HOME/autoware_map --volume $HOME/autoware_data --volume /dev/shm/:/dev/shm -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
     Screenshot from 2024-07-15 10-21-46

amadeuszsz commented 1 month ago

@annb3 You are updating autoware.universe while still using an old Autoware Docker image. If you want to use recent changes, you have to update both the Autoware repository and the Docker image. Please update your Autoware repository and then follow the Docker installation tutorial. FYI, the problem is that your current image contains TensorRT 8.4.2, while lidar_transfusion uses TensorRT 8.6.1, which is already included in the current Autoware Docker image.
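
The mismatch described above can be checked mechanically. This is a rough sketch that treats 8.6.1 as the minimum version, which is an assumption based on the comment above rather than a documented requirement. The helper names version_ge, upstream_version and check_tensorrt are hypothetical, and the comparison relies on the -V (version sort) option of GNU sort.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Uses GNU sort -V: if $2 sorts first (or both are equal), $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip a Debian revision suffix such as "-1+cuda11.6" from a package version.
upstream_version() {
  printf '%s\n' "${1%%-*}"
}

# Assumption: 8.6.1 (the version quoted in the thread) is used as the minimum.
REQUIRED="8.6.1"

# Report whether the given package version satisfies REQUIRED.
check_tensorrt() {
  local installed
  installed="$(upstream_version "$1")"
  if version_ge "$installed" "$REQUIRED"; then
    echo "TensorRT $installed: OK for lidar_transfusion"
  else
    echo "TensorRT $installed: older than $REQUIRED, update the Docker image"
    return 1
  fi
}
```

The argument to check_tensorrt would be the version column that dpkg -l prints for the libnvinfer package inside the container.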

vividf commented 1 month ago

@annb3 May I kindly inquire if @amadeuszsz's suggestion resolved the issue? Thanks.

annb3 commented 1 month ago

@vividf Unfortunately not. Upgrading the Docker container gives me lots of problems and errors. I also have problems when I try to download the new one from scratch. So it would probably solve the issue, but I am unable to try it and I don't know exactly why. Is the new Docker environment perhaps meant for 22.04 LTS? Or am I missing something?

annb3 commented 1 month ago

@vividf The new installation seems to have completed, but apparently nothing has changed: I still get the same problem. Screenshot from 2024-07-23 13-55-09

amadeuszsz commented 1 month ago

@annb3 What problem do you see after the image update? Please follow this tutorial and share the error here, along with the command that causes it.

amadeuszsz commented 1 month ago

> @amadeuszsz Exactly the same as before, nothing changes during the build: colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release Screenshot from 2024-07-23 16-04-00

@annb3 Which command did you use to run the Docker container? From now on you need to use ./docker/run.sh --devel. Then you can check the TensorRT version as before and run Autoware using the prebuilt workspace. Once you have confirmed that, you can proceed to building from source.

annb3 commented 1 month ago

@amadeuszsz Sorry, I posted output from the wrong terminal, which is why I deleted the comment. Give me a bit more time and I'll get back to you with the correct feedback :)

annb3 commented 1 month ago

@amadeuszsz Exactly the same as before, nothing changes during the build: colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release Screenshot from 2024-07-23 16-04-00

Screenshot from 2024-07-23 17-00-56 I have this problem launching ./docker/run.sh --devel

amadeuszsz commented 1 month ago

@annb3

Please check the command given in the previously linked instructions for setting up the development environment (the setup-dev-env.sh script), or pull the Docker image directly via docker pull ghcr.io/autowarefoundation/autoware:latest-devel-cuda

annb3 commented 3 weeks ago

With setup-dev-env.sh I get: fatal: [localhost]: FAILED! => {"changed": false, "msg": "Only Ubuntu 22.04 is supported for this branch. Please refer to https://autowarefoundation.github.io/autoware-documentation/main/installation/autoware/source-installation/."}

I have also already run docker pull ghcr.io/autowarefoundation/autoware:latest-devel-cuda. Honestly, I don't know what else to do. Can I try something else @amadeuszsz ?
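
The failure message above names Ubuntu 22.04 as the only supported release, so it can help to confirm which release the setup script actually sees before running it. A minimal sketch, assuming the standard os-release file format; the helper names os_id_and_version and is_supported_os are hypothetical and not part of the Autoware scripts.

```shell
#!/usr/bin/env bash
# Print "<ID> <VERSION_ID>" from an os-release file (default: /etc/os-release).
os_id_and_version() {
  local file="${1:-/etc/os-release}"
  (
    # Source in a subshell so the variables do not leak into the caller.
    . "$file"
    printf '%s %s\n' "$ID" "$VERSION_ID"
  )
}

# Succeeds only for Ubuntu 22.04, the release named in the error message.
is_supported_os() {
  [ "$(os_id_and_version "$1")" = "ubuntu 22.04" ]
}

# Example, run in the environment where setup-dev-env.sh is started:
#   is_supported_os && echo "supported" || echo "unsupported: $(os_id_and_version)"
```

On the reporter's Ubuntu 20.04 system this check would be expected to fail, which is consistent with the Ansible message.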

amadeuszsz commented 3 weeks ago

@annb3

> with setup-dev-env.sh I get: fatal: [localhost]: FAILED!

Did you follow the instructions I linked for you? Please share the exact command that triggers the environment setup.

> Instead I have already done docker pull ghcr.io/autowarefoundation/autoware:latest-devel-cuda.. I don't know honestly.

That command was a shortcut for you. If you are not sure whether the pull succeeded, you can always look at the sample prompt in the Docker documentation.

> Can I try something else @amadeuszsz ?

I don't think so. As you already tried before, you can:

If anything in the documentation is unclear, please let us know. We want to make our instructions as easy to follow for the community as we can!

annb3 commented 2 weeks ago

As I was unable to do anything more, I solved the issue by upgrading to Ubuntu 22.04 LTS and building from source.

Thanks a lot for your support @amadeuszsz !