envire / envire-envire_core

Core part for the Environment Representation library
BSD 2-Clause "Simplified" License
7 stars 13 forks source link

In debug mode an assertion in class loader fails when destoying the class loader #39

Closed Rauldg closed 5 years ago

Rauldg commented 5 years ago

Discovered trying to fix this other issue

The same minimal example, but compiled in debug mode creates this behavior:

At https://github.com/ros/class_loader/blob/0.3.7/src/class_loader_core.cpp#L497

It triggers:

install/include/boost/thread/pthread/recursive_mutex.hpp:113: void boost::recursive_mutex::lock(): Assertion `!pthread_mutex_lock(&m)' failed.
arneboe commented 5 years ago

I am unable to reproduce the bug. What boost version are you using? Can you upload a complete working example that I can use to test? Maybe there is something different between your minimal example implementation and mine.

arneboe commented 5 years ago

ok, now am able to reproduce it :) Mistake on my side

arneboe commented 5 years ago

This happens because pthread_mutex_lock returns EINVAL during tear down of the application. According to the man page there are two reasons why that might happen:

I don't think that the second reason is very likely. It seems like the mutex is accessed after it has been destroyed. I have no idea how/why though

arneboe commented 5 years ago

Ok, I poked at it some more:

all boost::recursive_locks are destroyed by the libc exit_handler. After that happens the class_loader tries to lock them. I do not understand why libc calls the recursive_lock destructor, though. It shouldn't....

arneboe commented 5 years ago

Found it:

This is in class_loader_core.cpp

boost::recursive_mutex & getLoadedLibraryVectorMutex()
/*****************************************************************************/
{
  static boost::recursive_mutex m;
  return m;
}

The mutex is static local while the class_loader is static. The class_loader is created before the mutex. Thus the mutex is destroyed before the class_loader. The class_loader tries to access the mutex during destruction and obviously fails. This is something that should be fixed upstream (might very well be, we are on a really old version of the class_loader).

Rauldg commented 5 years ago

I don't think they changed this in their newer versions.

See: https://github.com/ros/class_loader/blob/melodic-devel/src/class_loader_core.cpp#L48

Actually, there is some work going on trying to improve this but is an open issue: https://github.com/ros/class_loader/issues/33

arneboe commented 5 years ago

Ok this is more complex than I though.

This is not an upstream bug. The upstream class_loader works fine as long as it is not statically allocated.

What happens is:

(1) the class_loader is statically allocated (because we use base::Singleton) (2) upon first usage of the class_loader its internal mutexes are statically allocated. (3) after the main() ends the static memory is freed in the inverse order of creation, thus the mutexes are deleted first. (4) now the class_loader is deleted, however inside the destructor it tries to access the mutexes and crashes because they have already been deleted.

My solution would be to manually delete the class_loader pointer at the end of main. I can do it in a hacky way for now but ultimately this needs an extension to base::Singleton

arneboe commented 5 years ago

The bug has been fixed in branch https://github.com/envire/envire-envire_core/tree/fix-classLoader

I had to add a feature to base-logging thus for now this fork of base-logging is required: https://github.com/envire/base-logging/tree/feature-add-destroy

saarnold commented 5 years ago

Thanks for investigating this issue! :+1:

arneboe commented 5 years ago

We should document somewhere that one needs to manually destroy the ClassLoader before the program ends, I guess?