WebPlatformForEmbedded / WPEWebKit

WPE WebKit port (downstream)
210 stars 135 forks source link

Bmalloc crashes #1342

Open modeveci opened 1 month ago

modeveci commented 1 month ago

Sometimes BMScavenger crashes with a SIGSEGV in __pthread_cond_timedwait after WebKitBrowserPlugin implementation is unloaded, as it retains an invalidated condition variable.

The crashed thread callstack usually looks like:

5|0|libpthread-2.31.so|__condvar_dec_grefs|/usr/src/debug/glibc/2.31+gitAUTOINC+1094741224-r0/git/nptl/pthread_cond_wait.c|153|0x0
5|1|libpthread-2.31.so|__pthread_cond_timedwait|/usr/src/debug/glibc/2.31+gitAUTOINC+1094741224-r0/git/nptl/pthread_cond_wait.c|530|0x3
5|2|||||0xb21504e9
5|3|libstdc++.so.6.0.28|std::execute_native_thread_routine|/usr/src/debug/gcc-runtime/9.3.0-r0/arm-rdk-linux-gnueabi/libstdc++-v3/src/c++11/../../../../../../../../../../work-shared/gcc-9.3.0-r0/gcc-9.3.0/libstdc++-v3/src/c++11/thread.cc|80|0x3
5|4|libpthread-2.31.so|start_thread|/usr/src/debug/glibc/2.31+gitAUTOINC+1094741224-r0/git/nptl/pthread_create.c|477|0x1

Browser plugin is unloaded by Thunder arch. which is underneath triggering dlclose.

modeveci commented 1 month ago

Some more feedback:

When Thunder framework deactives plugin, what will happen?

This happens when there is a switch between lighting and html apps. WPEProcess receives SIGTERM from WPEFramework. This leads to unloading the module.

As the bmscavenger is not designed to work in such use case a race condition can happen. Usually it happens in case of a high system load.

Crash happens here: https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/nptl/pthread_cond_wait.c#L153 The crash can be reproduced by launching an html app and a lightning app one after another.

The condition is invalidated after the library is unloaded, but bmalloc's BMScavenger attempts to operate with it even though it is already invalid.

magomez commented 1 month ago

@modeveci could you point me to the place in the WPEFramework code that's performlng the dlopen/dlclose of the library?

modeveci commented 1 month ago

as stated in the comment, thunder framework "deinitialize" the browser plugin and WPEBrowser receives SIGTERM and for graceful termination. I am not sure if there are explicit callings of dl library. That was a comment I have found when searching this problem.

magomez commented 1 month ago

@modeveci could you give a try to the patch I'm attaching? I haven't been able to replicate the crash, but I think this patch may fix the problem by waiting for the proper termination of the scavenger thread.

1342.patch.txt

modeveci commented 1 month ago

@magomez @emutavchi shared this feedback for the patch, can you confirm whether there is an explicit calling of destruction or how it handles?

I don’t think above will work. Scavenger is constructed on demand in static storage memory with a placement new, see StaticPerProcess::getSlowCase. I don’t think Scavenger destructor will ever be called, unless something is doing it explicitly.

magomez commented 1 month ago

@magomez @emutavchi shared this feedback for the patch, can you confirm whether there is an explicit calling of destruction or how it handles?

I don’t think above will work. Scavenger is constructed on demand in static storage memory with a placement new, see StaticPerProcess::getSlowCase. I don’t think Scavenger destructor will ever be called, unless something is doing it explicitly.

I'm not sure whether the destructor is called or not, I was working under the assumption that it was. If it's not called, I'll have to modify the patch so someone else stops the thread.

magomez commented 3 weeks ago

As indeed the destructor is not being called in this situation due to the complex way used to create the instance, I've crafted a new patch that uses std::atexit to request the termination of the Scavenger thread when the application exits. On my environment I can see the thread properly terminated in all the processes. Could you give it a try, please, and tell me whether it fixes the problem for you?

1342-atexit.patch.txt