ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

[PERCEPTION] Motion service immediately segfaults. Does camera-based perception work in current master branch? #14279

Closed: josh-wende closed this issue 2 years ago

josh-wende commented 2 years ago

Hi,

I have been working with the master branch of Apollo in the SVL simulator. Lidar-based perception works well, but when I launch camera-based perception with /apollo/modules/perception/production/launch/perception_camera.launch, it fails. When I launch each DAG file separately, mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception_camera.dag works fine, but mainboard -d /apollo/modules/perception/production/dag/dag_motion_service.dag produces a segmentation fault the moment I press play in the simulation.

I know that camera-based perception does not work in the r6.0.0 release, but from other issues I've read here, I understood that it now works on the master branch. Is that correct, or does it still not work out of the box? Is there a fix I could implement myself?

Thanks in advance for any help.

./apollo.sh config output:

[INFO] Apollo Environment Settings:
[INFO] APOLLO_ROOT_DIR: /apollo
[INFO] APOLLO_CACHE_DIR: /apollo/.cache
[INFO] APOLLO_IN_DOCKER: true
[INFO] APOLLO_VERSION: master-2021-12-28-463fb82f9e
[INFO] DOCKER_IMG: dev-x86_64-18.04-20210914_1336
[INFO] APOLLO_ENV: STAGE=dev USE_ESD_CAN=false
[INFO] USE_GPU: USE_GPU_HOST=1 USE_GPU_TARGET=1

daohu527 commented 2 years ago

Is there any error message or core dump file? Can you provide a more detailed error message?

josh-wende commented 2 years ago

Hi @daohu527, sorry for the long delay; I wasn't able to work on this problem for a while.

After some debugging with gdb, it looks like I am hitting the same issue you once had: builder_->buildCudaEngine(*network_) on line 783 of rt_net.cc returns nullptr. Did you ever solve this? The only other thing that looks like it might be a problem is that calibrator_ is still nullptr after the call to builder_->setInt8Calibrator(calibrator_), but since int8_mode = false, I think this is okay.

This segfault happens in the dag_streaming_perception_camera.dag process. The dag_motion_service.dag process also segfaults sometimes (inconsistently), but I haven't looked into that as closely yet.

josh-wende commented 2 years ago

The crashes are inconsistent and change when I add print statements, which makes it feel like a race condition.

daohu527 commented 2 years ago

You can refer to https://github.com/NVIDIA/TensorRT/issues/851 for details.

Print out TensorRT's error messages and then track down the problem from there.

josh-wende commented 2 years ago

@daohu527 It turned out to be two problems: Eigen objects stored as class members or in STL containers were misaligned in memory, and an iterator was being invalidated in Cipv::CollectDrops. I'm not sure why I hit this while you and others apparently don't, but it's resolved now.
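The exact change to Cipv::CollectDrops isn't shown in this thread, so the following is only a sketch of the general bug class. It assumes, for illustration, that the problem was erasing entries from a std::map keyed by object ID while iterating over that same map. The names TrajectoryMap and PruneLostTracks are hypothetical and are not Apollo's actual types or functions. Erasing the element a range-for loop is currently visiting invalidates the loop's iterator. That is undefined behavior, and it tends to crash only intermittently, which fits the "varies when I add print statements" symptom above.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for per-object trajectory history keyed by track ID.
// Not Apollo's actual data structure.
using TrajectoryMap = std::map<int, std::vector<double>>;

// BROKEN pattern (do not use): erasing the current element inside a range-for
// loop invalidates the iterator the loop increments next (undefined behavior).
//
//   for (const auto& entry : trajectories) {
//     if (!still_tracked(entry.first)) trajectories.erase(entry.first);
//   }

// Safe pattern: std::map::erase(iterator) returns an iterator to the next
// element, so the loop never dereferences or increments an invalidated one.
void PruneLostTracks(TrajectoryMap& trajectories,
                     const std::vector<int>& current_ids) {
  for (auto it = trajectories.begin(); it != trajectories.end();) {
    const bool still_tracked =
        std::find(current_ids.begin(), current_ids.end(), it->first) !=
        current_ids.end();
    if (still_tracked) {
      ++it;  // keep this entry and advance normally
    } else {
      it = trajectories.erase(it);  // drop entry; `it` now points to the next one
    }
  }
}
```

For the Eigen half of the fix, Eigen's documentation describes the usual remedies for fixed-size vectorizable types such as Eigen::Vector4d or Eigen::Matrix4d. Classes that hold them as members should declare EIGEN_MAKE_ALIGNED_OPERATOR_NEW, and STL containers of them should use Eigen::aligned_allocator. The same documentation notes that when compiling as C++17 with a sufficiently recent compiler, aligned operator new generally takes care of this automatically.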