Closed gcx2020 closed 9 months ago
(16:52:18) ERROR: /apollo/modules/perception/camera/common/BUILD:104:12: C++ compilation of rule '//modules/perception/camera/common:image_data_operations' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/modules/perception/camera/common/_objs/image_data_operations/image_data_operations.pic.d ... (remaining 227 argument(s) skipped)
nvcc fatal : Unsupported gpu architecture 'compute_89'
(16:52:18) INFO: Elapsed time: 24.864s, Critical Path: 24.39s
(16:52:18) INFO: 1383 processes: 244 internal, 1139 local.
(16:52:18) FAILED: Build did NOT complete successfully
I have the same problem.
I have the same problem.
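Have you solved the problem?
I ran into this problem too. I found that the CUDA version in the current Apollo container is 11.1, which cannot support the 4090. CUDA 12.0 supports the 4090, but CUDA 12.0 does not support TensorRT 7, and Apollo's perception module uses TensorRT major version 7, so there is no solution. I have given up on compiling the GPU version of Apollo on a 4090.
This problem can only be solved by waiting for the official Docker image to be upgraded.
@daohu527 When will this problem be fixed?
+1 +1, I would like to ask as well: when will the 40 series be officially supported? Is there a planned release date?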
This branch, https://github.com/ApolloAuto/apollo/tree/9.x_alpha, has already been updated to TensorRT 8.
Wow, thank you very much. I will try pulling the branch.
@daohu527 It failed, in the perception modules.
I will check and give feedback soon.
The reason for the problem is that the CUDA version is too old to support 'compute_89'. Can you check the CUDA version?
@daohu527 No, my CUDA version is the latest: 12.2.
Is the CUDA version inside the Apollo Docker container the same?
This is printed inside the Apollo Docker container, but it is the same for me outside it. Do I still need to install CUDA inside the Apollo Docker container?
I also encountered the same problem. I have CUDA Driver Version 535.86.05, CUDA Version: 12.2, and I get:
nvcc fatal : Unsupported gpu architecture 'compute_89'
Trying to fix it, I checked nvcc --version:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
According to https://docs.nvidia.com/cuda/cuda-runtime-api/driver-vs-runtime-api.html, I think I need to upgrade the NVIDIA CUDA toolkit. On https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_local I found the toolkit for Ubuntu 20.04. However, the containerized version of Apollo 9.x runs on Ubuntu 18.04 for x86 architecture.
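The mismatch described above (the driver reports CUDA 12.2 while the toolkit inside the container is 11.1) can be checked mechanically. The following is a minimal Python sketch, assuming the `nvcc --version` output format shown above and assuming that nvcc accepts `compute_89` only from CUDA 11.8 onward. The function names are illustrative and not part of Apollo.

```python
import re

# Assumption: nvcc first accepts compute_89 (Ada) in CUDA Toolkit 11.8.
MIN_TOOLKIT_FOR_COMPUTE_89 = (11, 8)


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the 'release X.Y' text of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_supports_compute_89(version_output):
    """True if the reported toolkit release is new enough for compute_89."""
    return parse_nvcc_release(version_output) >= MIN_TOOLKIT_FOR_COMPUTE_89


# Line copied from the nvcc output quoted in this thread.
sample = "Cuda compilation tools, release 11.1, V11.1.105"
print(parse_nvcc_release(sample), toolkit_supports_compute_89(sample))
```

The point of the check is that the "CUDA Version" printed by nvidia-smi describes the driver, whereas the build uses the toolkit that nvcc belongs to, so it is the nvcc release that decides whether `compute_89` is accepted.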
1. The best long-term solution is to build your own Docker image with a newer CUDA version: change the base image in the Apollo Dockerfile to whichever one you think fits best from https://gitlab.com/nvidia/container-images/cuda/-/tree/master/dist?ref_type=heads and build it on your hardware. Of course, many installation files will need to be adjusted, especially for packages that Apollo compiles itself. It is hard, but it is sensible.
Can you try the following method? It lowers the compute capability that the build targets, so that upgrading CUDA can be avoided for now.
Add the lines marked "# add" to the function at https://github.com/ApolloAuto/apollo/blob/a3c851fc5844e0684b9c5108231fcc2c15cebb8e/third_party/gpus/cuda_configure.bzl#L731
def _compute_cuda_extra_copts(repository_ctx, compute_capabilities):
    copts = []
    for capability in compute_capabilities:
        if capability > "compute_75":  # add
            capability = "compute_75"  # add
        if capability.startswith("compute_"):
            # (rest of the function unchanged)
CUDA applications built using CUDA Toolkit 11.0 through 11.7 are compatible with the NVIDIA Ada GPU architecture as long as they are built to include kernels in Ampere-native cubin
It seems that the Ada architecture is compatible with CUDA Toolkit 11.0 through 11.7, so I don't know why there is a compilation error.
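The clamp suggested above compares capability strings directly, which works for two-digit values but orders strings lexicographically. As a hedged illustration in plain Python (not drop-in Starlark, and not the code in cuda_configure.bzl), the sketch below parses the number before comparing. The helper name `clamp_capability` and its `max_supported` parameter are hypothetical.

```python
def clamp_capability(capability, max_supported=75):
    """Clamp a 'compute_XY' string to at most compute_<max_supported>.

    Entries that do not start with 'compute_' are returned unchanged.
    Assumes the suffix is purely numeric (e.g. 'compute_89').
    """
    prefix = "compute_"
    if not capability.startswith(prefix):
        return capability
    value = int(capability[len(prefix):])
    return prefix + str(min(value, max_supported))


print(clamp_capability("compute_89"))
print(clamp_capability("compute_61"))
# String comparison is lexicographic, so a three-digit value sorts low:
print("compute_100" > "compute_75")
```

With the string comparison from the thread, a hypothetical `compute_100` would not be clamped, because `"1"` sorts before `"7"`. For the 4090's `compute_89` both approaches give the same result.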
I encountered the same problem (Unsupported gpu architecture 'compute_89') when compiling Apollo. After trying this method the compilation succeeded, but the prediction module could not be started. The error is as follows:
Environment: ubuntu22.04+Nvidia4090
[ps@in-dev-docker:/apollo]$ mainboard -d /apollo/modules/prediction/dag/prediction.dag
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1029 10:56:38.102371 446394 module_argument.cc:81] []command: mainboard -d /apollo/modules/prediction/dag/prediction.dag
I1029 10:56:38.102763 446394 global_data.cc:153] []host ip: 192.168.3.90
I1029 10:56:38.104391 446394 module_argument.cc:57] []binaryname is mainboard, processgroup is mainboard_default, has 1 dag conf
I1029 10:56:38.104401 446394 module_argument.cc:60] []dag_conf: /apollo/modules/prediction/dag/prediction.dag
terminate called after throwing an instance of 'std::runtime_error'
what(): nvrtc: error: invalid value for --gpu-architecture (-arch)
nvrtc compilation failed:
template
template
extern "C" __global__
void func_1(float* t0, float* t1, float* aten_relu_flat) {
{
  if (512 * blockIdx.x + threadIdx.x<2 ? 1 : 0) {
    aten_relu_flat[512 * blockIdx.x + threadIdx.x] = (((__ldg(t0 + (512 * blockIdx.x + threadIdx.x) % 2)) + (__ldg(t1 + (512 * blockIdx.x + threadIdx.x) % 2))<0.f ? 1 : 0) ? 0.f : (__ldg(t0 + (512 * blockIdx.x + threadIdx.x) % 2)) + (__ldg(t1 + (512 * blockIdx.x + threadIdx.x) % 2)));
  }
}
}
Aborted (core dumped)
Then could you try the following? I'm not sure whether it's an architecture compatibility issue.
def _compute_cuda_extra_copts(repository_ctx, compute_capabilities):
    copts = []
    for capability in compute_capabilities:
        if capability > "compute_87":  # add
            capability = "compute_87"  # add
        if capability.startswith("compute_"):
            # (rest of the function unchanged)
Thanks for your reply. I tried "compute_87", but the error was "Unsupported gpu architecture 'compute_87'". I then changed it to "compute_86", and the same runtime problem occurred: terminate called after throwing an instance of 'std::runtime_error' what(): nvrtc: error: invalid value for --gpu-architecture (-arch)
OK, thanks for the feedback. Have you tried running the perception module, or does only the prediction module show this error?
I will keep looking into the issue, but it looks like it may be related to Torch.
I haven't run the perception module yet. I wanted to start the prediction module through Dreamview, but it would not work, so I tried launching the prediction module on its own with the following command, which produced the error above:
mainboard -d /apollo/modules/prediction/dag/prediction.dag
Updating Torch to 1.8 may solve the problem. Torch needs to compile CUDA code dynamically using nvrtc.
For compatibility with CUDA 11.1, the highest usable Torch version is 1.10.
I also think it's a Torch problem, since libtorch is compiled and provided by Apollo and we don't know which NVIDIA GPU it was compiled for. Maybe you can try building Torch from source inside the Apollo Docker container on your own machine. These are the steps I followed to compile and deploy libtorch in the Apollo Docker container for a Jetson TX2:
- Download the libtorch source: git clone --recursive --branch v1.11.0 http://github.com/pytorch/pytorch
- Install the Python dependencies: pip3 install --no-cache-dir PyYAML typing
- Set the environment: export TORCH_CUDA_ARCH_LIST="3.5;5.0;5.2;6.1;6.2" && export USE_QNNPACK=0 && export USE_PYTORCH_QNNPACK=0 && export PYTORCH_BUILD_NUMBER=1 && export BUILD_CAFFE2=1 && export USE_NCCL=0 && export PYTORCH_BUILD_VERSION=1.11.0 # without the leading 'v', e.g. 1.3.0 for PyTorch v1.3.0
- For a CPU build: export USE_CUDA=0 (for a GPU build: export USE_CUDA=1)
- Build and install: python3 setup.py install (the same command for the GPU build)
- For the CPU build: mkdir libtorch_cpu && cp -r include libtorch_cpu/ && cp -r lib libtorch_cpu/ && sudo mv libtorch_cpu /usr/local/ (for the GPU build: mkdir libtorch_gpu && cp -r include libtorch_gpu/ && cp -r lib libtorch_gpu/ && sudo mv libtorch_gpu /usr/local/). Check which python3 version you are using; here I use python3.7.
The important part is that you choose the Torch version you need and a TORCH_CUDA_ARCH_LIST that suits your card, maybe 86 (written as 8.6 in this variable's format).
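The steps above set TORCH_CUDA_ARCH_LIST as a semicolon-separated list of "major.minor" values. As a small sketch of how that string could be produced from integer compute capabilities such as 86, assuming two-digit capabilities only, the helper below does the formatting. The name `arch_list` is illustrative and not part of PyTorch.

```python
def arch_list(capabilities):
    """Format integer capabilities (e.g. 86) as 'major.minor;...'.

    Duplicates are dropped and the result is sorted ascending.
    """
    parts = []
    for cap in sorted(set(capabilities)):
        major, minor = divmod(cap, 10)
        parts.append(f"{major}.{minor}")
    return ";".join(parts)


# Prints the export line one might paste into the shell before building.
print('export TORCH_CUDA_ARCH_LIST="' + arch_list([86]) + '"')
```

Whether a given value is accepted still depends on the CUDA toolkit used for the build, so the list should only contain architectures that the installed nvcc supports.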
I tried the following method, but it seems that libtorch did not compile successfully. The details are as follows:
Downloading libtorch source: git clone --recursive --branch v1.10.0 http://github.com/pytorch/pytorch #It's hard to download this way in China. I downloaded it manually
install py deps: pip3 install --no-cache-dir PyYAML typing
set env : export TORCH_CUDA_ARCH_LIST="3.5;5.0;5.2;6.1;6.2" && export USE_QNNPACK=0 && export USE_PYTORCH_QNNPACK=0 && export PYTORCH_BUILD_NUMBER=1 && export BUILD_CAFFE2=1 && export USE_NCCL=0 && export PYTORCH_BUILD_VERSION=1.10.0 # without the leading 'v', e.g. 1.3.0 for PyTorch v1.3.0
set env for cpu support: export USE_CUDA=0
sudo python3 setup.py install #Without sudo will report an error
mkdir libtorch_cpu && cp -r include libtorch_cpu/ && cp -r lib libtorch_cpu/ && sudo mv libtorch_cpu /usr/local/
The specific error message is as follows:
[ps@in-dev-docker:/apollo/pytorch]$ sudo python3 setup.py install
Building wheel torch-1.10.0a0+git36449ea
-- Building version 1.10.0a0+git36449ea
cmake --build . --target install --config Release -- -j 64
[ 0%] Built target clog
[ 0%] Built target defs.bzl
[ 0%] Built target pthreadpool
......
[100%] Built target torch_python
[100%] Built target nnapi_backend
Install the project...
-- Install configuration: "Release"
running install
running build
running build_py
copying caffe2/proto/prof_dag_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
copying caffe2/proto/predictor_consts_pb2.py -> build/lib.linux-x86_64-3.6/caffe2/proto
.......
writing manifest file 'torch.egg-info/SOURCES.txt'
removing '/usr/local/lib/python3.6/dist-packages/torch-1.10.0a0+git36449ea-py3.6.egg-info' (and everything under it)
Copying torch.egg-info to /usr/local/lib/python3.6/dist-packages/torch-1.10.0a0+git36449ea-py3.6.egg-info
running install_scripts
Installing convert-caffe2-to-onnx script to /usr/local/bin
Installing convert-onnx-to-caffe2 script to /usr/local/bin
Installing torchrun script to /usr/local/bin
[ps@in-dev-docker:/apollo/pytorch]$ mkdir libtorch_cpu && cp -r include libtorch_cpu/ && cp -r lib libtorch_cpu/ && sudo mv libtorch_cpu /usr/local/
cp: cannot stat 'include': No such file or directory
[ps@in-dev-docker:/apollo/pytorch]$
Hi I think the build/install succeeded.
/usr/local/
Thank you for your help. I copied the bin and include folders found in /usr/local/ to the libtorch_gpu folder and recompiled Apollo. The compilation was successful, but the prediction module could not be started. The error information is as follows:
[ps@in-dev-docker:/apollo]$ mainboard -d /apollo/modules/prediction/dag/prediction.dag
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1031 21:15:28.325443 1690631 module_argument.cc:81] []command: mainboard -d /apollo/modules/prediction/dag/prediction.dag
I1031 21:15:28.325830 1690631 global_data.cc:153] []host ip: 192.168.3.90
I1031 21:15:28.327428 1690631 module_argument.cc:57] []binaryname is mainboard, processgroup is mainboard_default, has 1 dag conf
I1031 21:15:28.327440 1690631 module_argument.cc:60] []dag_conf: /apollo/modules/prediction/dag/prediction.dag
E1031 21:15:28.339660 1690631 class_loader_utility.cc:218] [mainboard]LibraryLoadException: libc10.so: cannot open shared object file: No such file or directory
E1031 21:15:28.339690 1690631 class_loader_utility.cc:234] [mainboard]shared library failed: /apollo/bazel-bin/modules/prediction/libprediction_component.so
E1031 21:15:28.339710 1690631 class_loader_manager.h:70] [mainboard]Invalid class name: PredictionComponent
E1031 21:15:28.339725 1690631 module_controller.cc:67] [mainboard]Failed to load module: /apollo/modules/prediction/dag/prediction.dag
E1031 21:15:28.339735 1690631 class_loader_utility.cc:256] [mainboard]Attempt to UnloadLibrary lib, but can't find lib: /apollo/bazel-bin/modules/prediction/libprediction_component.so
E1031 21:15:28.339745 1690631 mainboard.cc:39] [mainboard]module start error.
[ps@in-dev-docker:/apollo]$
See https://stackoverflow.com/questions/65710713/importerror-libc10-so-cannot-open-shared-object-file-no-such-file-or-director. But first, please make sure that you have deleted the original Apollo-provided libtorch installation in /usr/local/libtorch_gpu inside the Apollo Docker container, which was installed by https://github.com/ApolloAuto/apollo/blob/master/docker/build/installers/install_libtorch.sh. Then copy the Torch you compiled yourself into /usr/local/libtorch_gpu, and afterwards run "ldconfig". I'm not sure whether you did it this way in your last comment.
Yes, I deleted the original Apollo-provided libtorch installation in /usr/local/libtorch_gpu inside the Apollo Docker container, and then copied the Torch I compiled myself into /usr/local/libtorch_gpu. But I don't understand how to run "ldconfig". I just ran "./apollo.sh build".
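Before worrying about ldconfig, it can help to confirm that libc10.so is actually present where the loader is expected to look. The following Python sketch searches a list of candidate directories for the file. The `lib` subdirectories under /usr/local/libtorch_gpu and /usr/local/libtorch_cpu are assumptions based on the copy commands in this thread, and `find_library` is a hypothetical helper.

```python
from pathlib import Path


def find_library(name, search_dirs):
    """Return the directories in search_dirs that contain a file called name."""
    return [str(d) for d in map(Path, search_dirs) if (d / name).is_file()]


# Assumed install locations, taken from the copy commands quoted above.
candidates = ["/usr/local/libtorch_gpu/lib", "/usr/local/libtorch_cpu/lib"]
hits = find_library("libc10.so", candidates)
print(hits if hits else "libc10.so not found in: " + ", ".join(candidates))
```

If the file is found but the module still fails to load, the usual next step is to make the directory known to the dynamic linker, for example by running `sudo ldconfig` in a shell inside the container after the directory has been added to the linker configuration, or by adding it to LD_LIBRARY_PATH.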
Hi, I also encountered the same problem when running mainboard -d /apollo/modules/prediction/dag/prediction.dag on an RTX 4090.
The error is:
terminate called after throwing an instance of 'std::runtime_error'
what(): nvrtc: error: invalid value for --gpu-architecture (-arch)
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}
extern "C" __global__
void func_1(float* t0, float* t1, float* aten_relu_flat) {
{
  if (512 * blockIdx.x + threadIdx.x<2 ? 1 : 0) {
    aten_relu_flat[512 * blockIdx.x + threadIdx.x] = (((__ldg(t0 + (512 * blockIdx.x + threadIdx.x) % 2)) + (__ldg(t1 + (512 * blockIdx.x + threadIdx.x) % 2))<0.f ? 1 : 0) ? 0.f : (__ldg(t0 + (512 * blockIdx.x + threadIdx.x) % 2)) + (__ldg(t1 + (512 * blockIdx.x + threadIdx.x) % 2)));
  }
}
}
I also tried updating /usr/local/libtorch_cpu and /usr/local/libtorch_gpu to version 1.8, but it still doesn't work. Have you found a solution for this issue? Thanks!
@CesarLiu @lovelyzzc @WilliaJing @Azure-blog @gcx2020 We have released a new image that supports the 4090 card. You can try the steps below.
The NVIDIA driver version must be >= 520.61.05. If your driver is older than that, it needs to be upgraded.
Modify the VERSION_X86_64 image version in docker/scripts/dev_start.sh:
VERSION_X86_64="dev-x86_64-18.04-20231128_2222"
Start the Docker container and enter it:
bash docker/scripts/dev_start.sh
bash docker/scripts/dev_into.sh
Modify third_party/centerpoint_infer_op/workspace.bzl as below:
"""Loads the paddlelite library"""

# Sanitize a dependency so that it works correctly from code that includes
# Apollo as a submodule.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def clean_dep(dep):
    return str(Label(dep))

def repo():
    http_archive(
        name = "centerpoint_infer_op-x86_64",
        sha256 = "038470fc2e741ebc43aefe365fc23400bc162c1b4cbb74d8c8019f84f2498190",
        strip_prefix = "centerpoint_infer_op",
        urls = ["https://apollo-pkg-beta.bj.bcebos.com/archive/centerpoint_infer_op_cu118.tar.gz"],
    )
    http_archive(
        name = "centerpoint_infer_op-aarch64",
        sha256 = "e7c933db4237399980c5217fa6a81dff622b00e3a23f0a1deb859743f7977fc1",
        strip_prefix = "centerpoint_infer_op",
        urls = ["https://apollo-pkg-beta.bj.bcebos.com/archive/centerpoint_infer_op-linux-aarch64-1.0.0.tar.gz"],
    )
Modify third_party/paddleinference/workspace.bzl as below:
"""Loads the paddlelite library"""

# Sanitize a dependency so that it works correctly from code that includes
# Apollo as a submodule.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def clean_dep(dep):
    return str(Label(dep))

def repo():
    http_archive(
        name = "paddleinference-x86_64",
        sha256 = "7498df1f9bbaf5580c289a67920eea1a975311764c4b12a62c93b33a081e7520",
        strip_prefix = "paddleinference",
        urls = ["https://apollo-pkg-beta.cdn.bcebos.com/archive/paddleinference-cu118-x86.tar.gz"],
    )
    http_archive(
        name = "paddleinference-aarch64",
        sha256 = "048d1d7799ffdd7bd8876e33bc68f28c3af911ff923c10b362340bd83ded04b3",
        strip_prefix = "paddleinference",
        urls = ["https://apollo-pkg-beta.bj.bcebos.com/archive/paddleinference-linux-aarch64-1.0.0.tar.gz"],
    )
First check whether the .apollo.bazelrc file exists in the workspace. If it exists, delete it.
Disable the macro in modules/perception/common/inference/tensorrt/rt_legacy.h by commenting out these two lines:
// #ifdef __aarch64__
// #endif
Build the perception module:
./apollo.sh build_opt_gpu perception
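The driver requirement in the first step (>= 520.61.05) is a dotted version comparison, which is easy to get wrong when done on strings. A minimal Python sketch, assuming the driver version is available as a dotted string such as the one nvidia-smi prints; `parse_version` and `driver_ok` are illustrative names.

```python
def parse_version(text):
    """Turn a dotted version such as '520.61.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum driver version stated in the steps above.
MIN_DRIVER = parse_version("520.61.05")


def driver_ok(installed):
    """True if the installed driver version meets the stated minimum."""
    return parse_version(installed) >= MIN_DRIVER


print(driver_ok("535.86.05"))
print(driver_ok("515.65.01"))
```

Comparing integer tuples avoids the pitfall of comparing the raw strings, where the leading zero in "05" and differing digit counts could give misleading results.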
@daohu527
Thanks for your support! I successfully built all modules on my 4090 machine, but when I launch the perception module, some errors occur:
cyber_launch start modules/perception/launch/perception_lidar.launch
[mainboard]Failed to get model path of center_point please check if model has been installed or APOLLO_MODEL_PATH environment variable has been set correctly.
cyber_launch start modules/perception/launch/perception_trafficlight.launch
[mainboard]Failed to get model path of tl_detection_caffe please check if model has been installed or APOLLO_MODEL_PATH environment variable has been set correctly.
cyber_launch start modules/perception/launch/perception_camera_3d.launch
[mainboard]Failed to get model path of smoke_torch please check if model has been installed or APOLLO_MODEL_PATH environment variable has been set correctly.
cyber_launch start modules/perception/launch/perception_camera_2d.launch terminates with an error.
cyber_launch start modules/perception/launch/perception_lane.launch
[lane ] E1205 11:27:36.111732 3032414 file.cc:115] [mainboard]File [perception/lane_detection/data/lane.pb.txt] does not exist!
[lane ] E1205 11:27:36.111740 3032414 lane_detection_component.cc:62] [perception]Read config failed: perception/lane_detection/data/lane.pb.txt
[lane ] E1205 11:27:36.111743 3032414 util.h:147] [perception]InitCameraFrames failed.
[lane ] E1205 11:27:36.111747 3032414 component.h:155] [mainboard]Component Init() failed.
[lane ] E1205 11:27:36.111804 3032414 module_controller.cc:69] [mainboard]Failed to load module: /apollo/modules/perception/lane_detection/dag/lane_detection.dag
[lane ] E1205 11:27:36.111817 3032414 mainboard.cc:39] [mainboard]module start error.
Only perception_radar.launch works. Any suggestions, please?
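The "Failed to get model path" messages above point at either missing models or an unset APOLLO_MODEL_PATH. The Python sketch below checks both. It assumes, purely for illustration, that each model lives in a directory named after it directly under APOLLO_MODEL_PATH; the real layout used by Apollo may differ, and `missing_models` is a hypothetical helper.

```python
import os
from pathlib import Path


def missing_models(model_names, model_root):
    """Return the names that have no directory under model_root.

    Assumption: one directory per model directly under model_root.
    An empty model_root means every model is reported as missing.
    """
    if not model_root:
        return list(model_names)
    root = Path(model_root)
    return [name for name in model_names if not (root / name).is_dir()]


# Model names copied from the error messages in this thread.
names = ["center_point", "tl_detection_caffe", "smoke_torch"]
root = os.environ.get("APOLLO_MODEL_PATH", "")
print("APOLLO_MODEL_PATH =", root or "(unset)")
print("missing:", missing_models(names, root))
```

If the variable is unset, all three names are listed, which matches the first possibility named in the error text.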
Let's solve them one by one.
When I completed the modification and ran the build, the following error occurred. Could you help fix it?
(16:31:09) ERROR: /apollo/.cache/bazel/540135163923dd7d5820f3ee4b306b32/external/local_config_tensorrt/BUILD:43:8: Executing genrule @local_config_tensorrt//:tensorrt_lib failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
cp: cannot stat '/usr/lib/x86_64-linux-gnu/libnvinfer.so.7': No such file or directory
(16:31:09) INFO: Elapsed time: 21.648s, Critical Path: 11.17s
(16:31:09) INFO: 248 processes: 113 internal, 135 local.
(16:31:09) FAILED: Build did NOT complete successfully
The output of nvidia-smi inside the container shows:
| NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2 |
| 0 NVIDIA GeForce RTX 4070 Off | 00000000:01:00.0 On | N/A |
TF_TENSORRT_VERSION is not specified in the container, so the following code
config = find_cuda_config(repository_ctx, find_cuda_config_path, ["tensorrt"])
trt_version = config["tensorrt_version"]
yields trt_version = 7, although version 8 is installed in the container. You can fix it by running export TF_TENSORRT_VERSION="8.5.2" before building.
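To pick the value for TF_TENSORRT_VERSION, one can look at which libnvinfer files are actually installed. The Python sketch below extracts version suffixes from file names such as `libnvinfer.so.8.5.2`. The sample listing is made up for illustration, and `tensorrt_versions` is a hypothetical helper, not part of Apollo or TensorRT.

```python
import re


def tensorrt_versions(filenames):
    """Collect version strings from names like 'libnvinfer.so.8.5.2'.

    Only the core libnvinfer library is matched; plugin and parser
    libraries are ignored. Results are sorted numerically.
    """
    found = set()
    for name in filenames:
        match = re.fullmatch(r"libnvinfer\.so\.(\d+(?:\.\d+)*)", name)
        if match:
            found.add(match.group(1))
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


# Illustrative listing; in the container one might pass
# os.listdir("/usr/lib/x86_64-linux-gnu") instead.
listing = [
    "libnvinfer.so.8",
    "libnvinfer.so.8.5.2",
    "libnvinfer_plugin.so.8",
    "libcudnn.so.8",
]
print(tensorrt_versions(listing))
```

The longest version string found is the natural candidate for the environment variable, since the build error above came from looking for `libnvinfer.so.7` when only version 8 files were present.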
(13:26:04) ERROR: /apollo/modules/perception/lidar/lib/detector/point_pillars_detection/BUILD:108:13: C++ compilation of rule '//modules/perception/lidar/lib/detector/point_pillars_detection:postprocess_cuda' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF ... (remaining 48 argument(s) skipped)
nvcc fatal : Unsupported gpu architecture 'compute_89'
(13:26:04) INFO: Elapsed time: 0.354s, Critical Path: 0.12s
(13:26:04) INFO: 111 processes: 70 remote cache hit, 38 internal, 3 local.
(13:26:04) FAILED: Build did NOT complete successfully