ethz-asl / ethzasl_icp_mapping

3D mapping tools for robotic applications

libgomp: Thread creation failed: Resource temporarily unavailable #61

Closed. Alexma3312 closed this issue 6 years ago

Alexma3312 commented 6 years ago

Hi,

The process died by itself (exit code 1) after I had been running kingfisher.launch for about 30 minutes. The cause is this thread creation failure. When I relaunched the launch file, it worked again, but after about 30 minutes the process died for the same reason.

I am on the reintegrate/master_into_indigo_devel branch, and the launch file path is ethzasl_icp_mapping/ethzasl_icp_mapper/launch/kingfisher/kingfisher.launch. How should I solve this problem? Thanks!

HannesSommer commented 6 years ago

I'd assume this is caused by libnabo. If you don't need maximal speed, you could try compiling libnabo without gomp and then try again. There is a CMake flag that makes this easy: https://github.com/ethz-asl/libnabo/blob/master/CMakeLists.txt#L89

Alexma3312 commented 6 years ago

Thank you for the response. However, I set the flag to false to turn off USE_OPEN_MP, and the process still died, this time with a new error:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >' what(): boost::thread_resource_error: Resource temporarily unavailable (with an exit code -6)

Can you give me some help on this?

HannesSommer commented 6 years ago

Hm, this sounds like too many threads are being created in your executable, and libnabo is not actually causing the issue. For debugging, though, I'd still keep OPEN_MP deactivated for libnabo.

Can you get a stack trace for that exception? Then we would have a new candidate for who is failing to reuse or join threads. I am starting to remember that there is a known bug in this library where threads are never joined. Apparently the fix is not in this branch. @kruesip, do you remember this (boost threads not cleaning up after themselves)? And where did you put the solution?

kruesip commented 6 years ago

I vaguely remember having had this problem, but unfortunately I forgot how and where we solved it. All I know is that I was using the branch indigo_devel (not reintegrate/master_into_indigo_devel). But I haven't used the library for quite a while, so I don't know if that is still working.

HannesSommer commented 6 years ago

Thanks @kruesip ! But I could not find any existing solution.

I believe I found the bug in these two lines: https://github.com/ethz-asl/ethzasl_icp_mapping/blob/877a67471ab0b0d91b6a40e923a11e25c5306259/ethzasl_icp_mapper/src/mapper.cpp#L480 https://github.com/ethz-asl/ethzasl_icp_mapping/blob/877a67471ab0b0d91b6a40e923a11e25c5306259/ethzasl_icp_mapper/src/dynamic_mapper.cpp#L538

It is quite possible that these lines create more and more detached threads till you hit the maximum.

Right now I don't have the resources to fix this. @Alexma3312, if you know how to fix this and have the time to do it, I would be happy to review a PR. You have to join each thread before forgetting about it by assigning a new thread to the same variable; alternatively, one could recycle a single worker thread over and over.
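The "join before assigning a new thread" idea can be sketched as follows. This is only an illustration under stated assumptions, not the code in mapper.cpp or the eventual fix: it uses std::thread as a stand-in for the boost::thread used in the mapper, and the names JobRunner and runJobs are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: owns at most one background thread at a time.
class JobRunner {
public:
    ~JobRunner() { joinIfRunning(); }

    // Start a new job, joining the previous thread first so that its
    // resources are released before the handle is overwritten.
    template <typename F>
    void start(F job) {
        joinIfRunning();
        thread_ = std::thread(job);
    }

private:
    void joinIfRunning() {
        if (thread_.joinable()) thread_.join();
    }

    std::thread thread_;
};

// Runs n short jobs one after another and returns how many completed.
int runJobs(int n) {
    std::atomic<int> done{0};
    {
        JobRunner runner;
        for (int i = 0; i < n; ++i) {
            runner.start([&done] { ++done; });
        }
    }  // the destructor joins the last thread here
    return done.load();
}
```

Note that joining blocks until the previous job has finished, so this variant serializes the jobs. With std::thread, assigning to a still-joinable thread object terminates the program, which makes a missing join visible immediately.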

Alexma3312 commented 6 years ago

Thank you, I hope to fix this problem. I understand your suggested solution; however, would it be easier to set the maximum number of threads to infinite?

On the other hand, would it help if I provided the stack trace? How can I provide it?

HannesSommer commented 6 years ago

I don't think the stack trace will provide new insights. Infinite threads are not possible. A thread consumes kernel resources (even if its entry function has returned). Of course, you might still be able to work around your issue by raising the limit enough (https://stackoverflow.com/a/344292).
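Since the thread limit cannot be made infinite, the other option mentioned earlier is to keep the thread count constant by recycling one worker. A minimal sketch of that pattern, assuming std::thread instead of the mapper's boost::thread and using the hypothetical names Worker, post and runOnWorker:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single worker: one long-lived thread executes all posted jobs.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}

    ~Worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();  // remaining jobs are drained before the thread exits
    }

    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and queue is empty
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so post() is never blocked by a job
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread thread_;  // declared last so the other members exist before it starts
};

// Posts n jobs to a single worker and returns how many were executed.
int runOnWorker(int n) {
    int done = 0;  // only touched by the worker until ~Worker has joined it
    {
        Worker worker;
        for (int i = 0; i < n; ++i) {
            worker.post([&done] { ++done; });
        }
    }
    return done;
}
```

However many jobs are posted, only one extra thread ever exists, so the process stays far away from any per-user or per-process thread limit.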

cedricpradalier commented 6 years ago

Linking with issue #64. I think GOMP is not responsible here; it just happens to be the one that fails because the number of non-joined threads created by the main program keeps growing without bound. I'd mark this as resolved once #64 is implemented in mapper and dynamic_mapper.

HannesSommer commented 6 years ago

This should be solved with PR #65. Please reopen if not.