ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

prediction:dlopen: cannot load any more object with static TLS #8282

Closed: gengqx closed this issue 5 years ago

gengqx commented 5 years ago

If we use only the CPU libtorch, by changing WORKSPACE.in, the prediction BUILD file, and scripts/apollo_base.sh to use libtorch instead of libtorch GPU, we get:

cidi@in_dev_docker:/apollo$ bash scripts/navigation_prediction.sh start_fe
[cyber_launch_13369] INFO Launch file [/apollo/modules/prediction/launch/prediction.launch]
[cyber_launch_13369] INFO ========================================================================================================================
[cyber_launch_13369] INFO Load module [prediction] library: [prediction] [CYBER_DEFAULT] conf: [/apollo/modules/prediction/dag/prediction.dag] exception_handler: []
[cyber_launch_13369] INFO Start process [prediction] successfully. pid: 13370
[cyber_launch_13369] INFO ------------------------------------------------------------------------------------------------------------------------
[prediction]  WARNING: Logging before InitGoogleLogging() is written to STDERR
[prediction]  I0511 18:09:10.030285 13370 module_argument.cc:81] [] command: mainboard -d /apollo/modules/prediction/dag/prediction.dag -p prediction -s CYBER_DEFAULT 
[prediction]  I0511 18:09:10.030764 13370 global_data.cc:150] [] host ip: 172.16.33.74
[prediction]  I0511 18:09:10.031289 13370 module_argument.cc:57] [] binary_name_ is mainboard, process_group_ is prediction, has 1 dag conf
[prediction]  I0511 18:09:10.031297 13370 module_argument.cc:60] [] dag_conf: /apollo/modules/prediction/dag/prediction.dag
[prediction]  E0511 18:09:10.567720 13370 class_loader_utility.cc:224] [mainboard] poco LibraryLoadException: dlopen: cannot load any more object with static TLS
[prediction]  E0511 18:09:10.567802 13370 class_loader_utility.cc:240] [mainboard] poco sharedlibrary failed:/apollo/bazel-bin/modules/prediction/libprediction_component.so
[prediction]  E0511 18:09:10.567844 13370 class_loader_manager.h:71] [mainboard] Invalid class name: PredictionComponent
[prediction]  E0511 18:09:10.567878 13370 module_controller.cc:59] [mainboard] Failed to load module: /apollo/modules/prediction/dag/prediction.dag
[prediction]  E0511 18:09:10.567885 13370 class_loader_utility.cc:262] [mainboard] attempt to UnloadLibrary lib,but can't find lib /apollo/bazel-bin/modules/prediction/libprediction_component.so
[prediction]  E0511 18:09:10.567893 13370 mainboard.cc:43] [mainboard] module start error.
[prediction]  
[cyber_launch_13369] ERROR Process [prediction] has finished. [pid 13370, cmd mainboard -d /apollo/modules/prediction/dag/prediction.dag -p prediction -s CYBER_DEFAULT].
gengqx commented 5 years ago
cidi@in_dev_docker:/apollo$ ldd /apollo/bazel-bin/modules/prediction/libprediction_component.so | grep libtorch
    libtorch.so.1 => /usr/local/apollo/libtorch/lib/libtorch.so.1 (0x00007f3d3d9b4000)
    libtorch_python.so => /apollo/bazel-bin/modules/prediction/../../_solib_k8/_U@pytorch_S_S_Cpytorch___Uexternal_Spytorch_Slib/libtorch_python.so (0x00007f3d3d138000)
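The ldd output above only shows which libtorch files get linked. A possible follow-up check is sketched below. It assumes glibc's `ldd` and GNU binutils' `readelf` are available in the dev container, and the script name is hypothetical. It lists every resolved dependency that has its own PT_TLS segment and marks the ones whose dynamic section carries the STATIC_TLS flag. Our reading of glibc's elf/dl-open.c from this period, which should be treated as an assumption, is that `dlopen` raises "cannot load any more object with static TLS" once the process already has threads and an object needing static TLS would get a TLS module ID beyond a small fixed reserve of DTV slots. If that is right, both the number of TLS-using libraries and which of them are STATIC_TLS matter.

```shell
#!/usr/bin/env bash
# static_tls_report.sh (hypothetical name): list dependencies of a shared object
# that have a TLS segment, marking those whose dynamic FLAGS include STATIC_TLS.
# Assumes glibc `ldd` and GNU binutils `readelf` are on PATH.

# Read ldd output on stdin; print the resolved path of each "name => /path (0x...)" line.
# Lines without "=>" (vdso, the loader itself) and "=> not found" lines are skipped.
ldd_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Print the MemSiz column of the PT_TLS program header, or nothing if there is none.
tls_memsz() {
  readelf -lW "$1" 2>/dev/null | awk '$1 == "TLS" { print $6; exit }'
}

# Succeed if the dynamic section's FLAGS entry contains STATIC_TLS.
has_static_tls() {
  readelf -dW "$1" 2>/dev/null | grep -q 'STATIC_TLS'
}

report_tls_deps() {
  local lib size flag count=0
  while read -r lib; do
    size=$(tls_memsz "$lib")
    [[ -z "$size" ]] && continue        # no TLS segment of its own: skip
    count=$((count + 1))
    flag="-"
    has_static_tls "$lib" && flag="STATIC_TLS"
    printf '%-10s %-10s %s\n' "$flag" "$size" "$lib"
  done < <(ldd "$1" | ldd_paths | sort -u)
  printf 'dependencies with a TLS segment: %d\n' "$count"
}

# Only run when given a path, so the functions can also be sourced.
if [[ $# -gt 0 ]]; then
  report_tls_deps "$1"
fi
```

Usage inside the container would be `bash static_tls_report.sh /apollo/bazel-bin/modules/prediction/libprediction_component.so`. mainboard has already loaded its own libraries before the component is opened, so this list covers only part of the process's TLS modules.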
pengpingliang commented 5 years ago

@gengqx With the newest source on master (no modifications), Prediction cannot start up either and fails with the same error. Is there any resolution? Thank you!

gengqx commented 5 years ago

Yes, the unmodified source also causes this problem. If you really need to start the prediction module, you can check out the prediction module at commit

ce694b5f7dce88a154b14f849316eafb249c9d62
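A sketch of that rollback follows. It assumes it runs from the root of the Apollo checkout, and the helper name is hypothetical. `git checkout <commit> -- <path>` restores only the files under that path from the given commit, while the rest of the working tree stays at the current HEAD. The module still has to be rebuilt before prediction is restarted.

```shell
#!/usr/bin/env bash
# pin_module COMMIT PATH (hypothetical helper): restore the files under PATH as they
# were at COMMIT into the index and working tree; other paths are left untouched.
pin_module() {
  local commit="$1" path="$2"
  # Refuse early if COMMIT does not name a commit in this repository.
  if ! git rev-parse --verify --quiet "${commit}^{commit}" >/dev/null; then
    echo "unknown commit: ${commit}" >&2
    return 1
  fi
  git checkout "$commit" -- "$path"
}

# Example with the commit from this thread:
# pin_module ce694b5f7dce88a154b14f849316eafb249c9d62 modules/prediction
# git status --short modules/prediction   # restored files show up as staged changes
```

In Git's default overlay mode, this does not delete files that were added under the path after that commit. Git 2.22 and later also accept `--no-overlay` to remove them.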

It may be a conflict between libtorch and OpenCV, but I still have no solution for this. The newest commit

c1c163253c4067b8aa9e2cdf86319418da0c9da6

may have solved this issue by using flags to disable semantic_map. I have not tested it yet.
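If the libtorch/OpenCV hypothesis is right, one workaround often used for this glibc error is to preload the libraries that need static TLS. Nothing in this thread confirms that it fixes this case. Preloading makes the dynamic loader map those libraries at process startup, before any threads exist, instead of through a later dlopen. The sketch assumes the offending libraries have already been identified, for example dependencies of libprediction_component.so whose `readelf -d` output shows STATIC_TLS. The helper name is hypothetical, and the mainboard command line is the one printed in the logs above.

```shell
#!/usr/bin/env bash
# preload_value PATH... (hypothetical helper): join library paths with ':' to form an
# LD_PRELOAD value; ld.so accepts colon- or space-separated lists.
preload_value() {
  local IFS=':'
  printf '%s\n' "$*"
}

# Example (untested assumption: libtorch.so.1 is the library that needs static TLS):
# LD_PRELOAD="$(preload_value /usr/local/apollo/libtorch/lib/libtorch.so.1)" \
#   mainboard -d /apollo/modules/prediction/dag/prediction.dag -p prediction -s CYBER_DEFAULT
```

Running mainboard directly keeps the preload out of the cyber_launch Python process. Setting LD_PRELOAD on `bash scripts/prediction.sh start_fe` would instead propagate it to every child process.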

natashadsouza commented 5 years ago

@gengqx We have not seen this issue internally and are currently working on reproducing and resolving it. Apologies for the delay, and thank you for your patience.

gengqx commented 5 years ago

I still have the problem.

gengqx@in_dev_docker:/apollo$ bash scripts/prediction.sh start_fe
[cyber_launch_2074] INFO Launch file [/apollo/modules/prediction/launch/prediction.launch]
[cyber_launch_2074] INFO ========================================================================================================================
[cyber_launch_2074] INFO Load module [prediction] library: [prediction] [CYBER_DEFAULT] conf: [/apollo/modules/prediction/dag/prediction.dag] exception_handler: []
[cyber_launch_2074] INFO Start process [prediction] successfully. pid: 2075
[cyber_launch_2074] INFO ------------------------------------------------------------------------------------------------------------------------
[prediction]  WARNING: Logging before InitGoogleLogging() is written to STDERR
[prediction]  I0521 17:12:41.420478  2075 module_argument.cc:81] [] command: mainboard -d /apollo/modules/prediction/dag/prediction.dag -p prediction -s CYBER_DEFAULT 
[prediction]  I0521 17:12:41.429838  2075 global_data.cc:150] [] host ip: 172.16.33.131
[prediction]  I0521 17:12:41.430306  2075 module_argument.cc:57] [] binary_name_ is mainboard, process_group_ is prediction, has 1 dag conf
[prediction]  I0521 17:12:41.430486  2075 module_argument.cc:60] [] dag_conf: /apollo/modules/prediction/dag/prediction.dag
[prediction]  E0521 17:12:43.015697  2075 class_loader_utility.cc:224] [mainboard] poco LibraryLoadException: dlopen: cannot load any more object with static TLS
[prediction]  E0521 17:12:43.016031  2075 class_loader_utility.cc:240] [mainboard] poco sharedlibrary failed:/apollo/bazel-bin/modules/prediction/libprediction_component.so
[prediction]  E0521 17:12:43.016427  2075 class_loader_manager.h:71] [mainboard] Invalid class name: PredictionComponent
[prediction]  E0521 17:12:43.016818  2075 module_controller.cc:59] [mainboard] Failed to load module: /apollo/modules/prediction/dag/prediction.dag
[prediction]  E0521 17:12:43.017138  2075 class_loader_utility.cc:262] [mainboard] attempt to UnloadLibrary lib,but can't find lib /apollo/bazel-bin/modules/prediction/libprediction_component.so
[prediction]  E0521 17:12:43.017462  2075 mainboard.cc:43] [mainboard] module start error.
[prediction]  
[cyber_launch_2074] ERROR Process [prediction] has finished. [pid 2075, cmd mainboard -d /apollo/modules/prediction/dag/prediction.dag -p prediction -s CYBER_DEFAULT].
[cyber_launch_2074] INFO All processes has died.
[cyber_launch_2074] INFO Cyber exit.
[cyber_launch_2074] INFO All processes have been stopped.
gengqx@in_dev_docker:/apollo$ git log
commit 5832b7eb7d1bb0e800b3390e066c7b1ca98a6ac5
Author: JasonZhou404 <zhoujiny404@gmail.com>
Date:   Mon May 20 18:54:45 2019 -0700

    Planning: fix duplicate conf in planning.conf

commit 1c4191143cc9785c0a16aa6544d4759350413f45
Author: panjiacheng <panjiacheng@yahoo.com>
Date:   Mon May 20 18:17:13 2019 -0700

    Prediction: reduce feature.proto generation time.

commit 61eae4091d0321bcf8130444327e3aadf73a07f0
Author: panjiacheng <panjiacheng@yahoo.com>
Date:   Mon May 20 16:42:12 2019 -0700
natashadsouza commented 5 years ago

@gengqx Are you building Apollo using our Docker environment? Also, do you have any tools installed that could be causing a conflict? We have not been able to reproduce this particular issue on our end.

gengqx commented 5 years ago

Exactly the same code and Docker images.

docker ps
CONTAINER ID        IMAGE                                                            COMMAND             CREATED             STATUS              PORTS               NAMES
f0f1451ab01e        apolloauto/apollo:dev-x86_64-20190413_1615                       "/bin/bash"         15 minutes ago      Up 15 minutes                           apollo_dev_gengqx
44e6db6da215        apolloauto/apollo:localization_volume-x86_64-latest              "/bin/sh"           15 minutes ago      Up 15 minutes                           apollo_localization_volume_gengqx
e452be2d428a        apolloauto/apollo:yolo3d_volume-x86_64-latest                    "/bin/sh"           15 minutes ago      Up 15 minutes                           apollo_yolo3d_volume_gengqx
ff5829ccafc4        apolloauto/apollo:map_volume-san_mateo-latest                    "/bin/sh"           15 minutes ago      Up 15 minutes                           apollo_map_volume-san_mateo_gengqx
a99bb6bc7a4b        apolloauto/apollo:map_volume-sunnyvale_with_two_offices-latest   "/bin/sh"           15 minutes ago      Up 15 minutes                           apollo_map_volume-sunnyvale_with_two_offices_gengqx
0f8b08956963        apolloauto/apollo:map_volume-sunnyvale_loop-latest               "/bin/bash"         15 minutes ago      Up 15 minutes                           apollo_map_volume-sunnyvale_loop_gengqx
764de3647f80        apolloauto/apollo:map_volume-sunnyvale_big_loop-latest           "/bin/sh"           15 minutes ago      Up 15 minutes                           apollo_map_volume-sunnyvale_big_loop_gengqx

git log

commit 15965e5bcd9ea51bc780879fbbacd652dc736fa6
Author: Yifei Jiang <jiangyifei@gmail.com>
Date:   Wed May 22 17:35:31 2019 -0700

    tools: updated planning metrics.

commit 2df2bacd5b9e90522328d69f070ec1c8134e6b87
Author: JasonZhou404 <zhoujiny404@gmail.com>
Date:   Wed May 22 18:13:43 2019 -0700

    Planning: implementations of iterative anchoring smoother

commit 367f858ef514726e7c3028b3ce786c6725f70069
Author: JasonZhou404 <zhoujiny404@gmail.com>
Date:   Wed May 22 15:42:51 2019 -0700

    Planning: implementations of path smooth in iterative path smoothing

commit 486243a401a98472d3cbc80ee8a10dae9df8c8a6
Author: JasonZhou404 <zhoujiny404@gmail.com>
Date:   Wed May 22 13:55:13 2019 -0700

    Planning: apply discrete point math to reference line and iterative anchoring smoother

commit 53d41687df4db34083c671f1093ce0dff9c2543e
Author: JasonZhou404 <zhoujiny404@gmail.com>
Date:   Wed May 22 13:07:56 2019 -0700

    Planning: add discrete point math in planning math

git status

On branch apollo_master
Your branch is up-to-date with 'upstream/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    log/

nvidia-docker version

NVIDIA Docker: 1.0.1

Client:
 Version:           18.06.2-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        6d37f41
 Built:             Sun Feb 10 03:48:04 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.2-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       6d37f41
  Built:            Sun Feb 10 03:46:30 2019
  OS/Arch:          linux/amd64
  Experimental:     false

result:

gengqx@in_dev_docker:/apollo$ bash scripts/prediction.sh start_fe
[cyber_launch_27932] INFO Launch file [/apollo/modules/prediction/launch/prediction.launch]
[cyber_launch_27932] INFO ========================================================================================================================
[cyber_launch_27932] INFO Load module [prediction] library: [prediction] [CYBER_DEFAULT] conf: [/apollo/modules/prediction/dag/prediction.dag] exception_handler: []
[cyber_launch_27932] INFO Start process [prediction] successfully. pid: 27933
[cyber_launch_27932] INFO ------------------------------------------------------------------------------------------------------------------------
[prediction]  WARNING: Logging before InitGoogleLogging() is written to STDERR
[prediction]  I0523 11:50:50.675398 27933 module_argument.cc:81] [] command: mainboard -d /apollo/modules/prediction/dag/prediction.dag -p prediction -s CYBER_DEFAULT 
[prediction]  I0523 11:50:50.675753 27933 global_data.cc:150] [] host ip: 172.16.33.131
[prediction]  I0523 11:50:50.675961 27933 module_argument.cc:57] [] binary_name_ is mainboard, process_group_ is prediction, has 1 dag conf
[prediction]  I0523 11:50:50.675969 27933 module_argument.cc:60] [] dag_conf: /apollo/modules/prediction/dag/prediction.dag
[prediction]  E0523 11:50:51.195686 27933 class_loader_utility.cc:224] [mainboard] poco LibraryLoadException: dlopen: cannot load any more object with static TLS
[prediction]  E0523 11:50:51.195765 27933 class_loader_utility.cc:240] [mainboard] poco sharedlibrary failed:/apollo/bazel-bin/modules/prediction/libprediction_component.so
[prediction]  E0523 11:50:51.195817 27933 class_loader_manager.h:70] [mainboard] Invalid class name: PredictionComponent
[prediction]  E0523 11:50:51.195832 27933 module_controller.cc:59] [mainboard] Failed to load module: /apollo/modules/prediction/dag/prediction.dag
[prediction]  E0523 11:50:51.195852 27933 class_loader_utility.cc:262] [mainboard] attempt to UnloadLibrary lib,but can't find lib /apollo/bazel-bin/modules/prediction/libprediction_component.so
[prediction]  E0523 11:50:51.195873 27933 mainboard.cc:43] [mainboard] module start error.
[prediction]  
[cyber_launch_27932] ERROR Process [prediction] has finished. [pid 27933, cmd mainboard -d /apollo/modules/prediction/dag/prediction.dag -p prediction -s CYBER_DEFAULT].
[cyber_launch_27932] INFO All processes has died.
[cyber_launch_27932] INFO Cyber exit.
[cyber_launch_27932] INFO All processes have been stopped.
ll /apollo/bazel-bin/modules/prediction/libprediction_component.so
-r-xr-xr-x 1 gengqx gengqx 28035 May 23 11:43 /apollo/bazel-bin/modules/prediction/libprediction_component.so*
kkhitsko commented 5 years ago

@gengqx, @natashadsouza I have the same error.

natashadsouza commented 5 years ago

@gengqx thank you for sending this over. Our engineers are working on it currently and will have a fix soon. Please follow the Ubuntu 18 branch for updates. Once fixed I will also follow up on this thread. Thanks!

HongyiSun commented 5 years ago

The static TLS issue is fixed now. Sorry for the inconvenience; please try again.

ljie-PI commented 4 years ago

@HongyiSun how did you fix it?

kechxu commented 4 years ago

Quoting: "@HongyiSun how did you fix it?"

This issue has been fixed along with the Ubuntu 18.04 upgrade.

daohu527 commented 3 years ago

Quoting @kechxu: "This issue has been fixed along with ubuntu 18.04 upgrade."

@HongyiSun @kechxu I am already using 18.04 but still have the same problem. Could you please point out how to fix it?