WebPlatformForEmbedded / WPEWebKit

WPE WebKit port (downstream)

[wpe-2.38] UI Process gets into deadlock when WPEWebProcess was unresponsive for extended period #1345

Open varumugam123 opened 1 month ago

varumugam123 commented 1 month ago

This issue was observed with wpe-2.38 (6448608056) + the WebKitBrowser implementation. If WPEWebProcess is made to hang by running a busy loop from the RWI console (or by any equivalent means), SIGFPE is issued to WPEWebProcess after the configured unresponsive timeout (i.e. watchdoghangthresholdtinseconds).

Below are the stack traces of the main() thread and of the Core::Thread that runs g_main_loop_run() for WebKit, before the crash:

Thread 1 (Thread 8293.8293 "HtmlApp-0"):
#0  __libc_do_syscall () at libc-do-syscall.S:48
#1  0xf7baa37e in futex_wait_cancelable (private=0, expected=0, futex_word=0x3271d4) at ../sysdeps/nptl/futex-internal.h:183
#2  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x32718c, cond=0x3271a8) at pthread_cond_wait.c:508
#3  __pthread_cond_wait (cond=cond@entry=0x3271a8, mutex=mutex@entry=0x32718c) at pthread_cond_wait.c:638
#4  0xf7b4c36e in WPEFramework::Core::Event::Lock (this=this@entry=0x327188) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/Sync.cpp:860
#5  0xf7b4c39e in WPEFramework::Core::Event::Lock (this=this@entry=0x327188, nTime=nTime@entry=4294967295) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/Sync.cpp:901
#6  0x00022376 in WPEFramework::Core::StateTrigger<WPEFramework::Core::QueueType<WPEFramework::Core::ProxyType<WPEFramework::Core::IDispatch> >::enumQueueState>::WaitState (this=this@entry=0x327168, a_State=a_State@entry=14, a_Time=a_Time@entry=4294967295) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/../core/StateTrigger.h:118
#7  0x00022562 in WPEFramework::Core::QueueType<WPEFramework::Core::ProxyType<WPEFramework::Core::IDispatch> >::Extract (a_WaitTime=4294967295, a_Result=..., this=0x327158) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/../core/Queue.h:204
#8  WPEFramework::Core::ThreadPool::Minion::Process (this=this@entry=0x327218) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/../core/ThreadPool.h:426
#9  0x00022730 in WPEFramework::Core::WorkerPool::Join (this=this@entry=0x327148) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/../core/WorkerPool.h:380
#10 0x0002291a in WPEFramework::Process::WorkerPoolImplementation::Run (this=0x327140) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/WPEProcess/Process.cpp:138
#11 WPEFramework::Process::ProcessFlow::Run (this=this@entry=0xff9720e0, pathName="/usr/lib/wpeframework/proxystubs/", interfaceId=<optimized out>, base=base@entry=0x348ab4, sequenceId=36) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/WPEProcess/Process.cpp:508
#12 0x0001d9da in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/WPEProcess/Process.cpp:676

Thread 5 (Thread 8293.8304 "HtmlApp-0"):
#0  __libc_do_syscall () at libc-do-syscall.S:48
#1  0xf799b6a6 in __GI___poll (timeout=5872, nfds=10, fds=0xf040d4d0) at ../sysdeps/unix/sysv/linux/poll.c:29
#2  __GI___poll (fds=0xf040d4d0, nfds=10, timeout=5872) at ../sysdeps/unix/sysv/linux/poll.c:26
#3  0xf3cf6992 in g_main_context_poll (priority=<optimized out>, n_fds=10, fds=0xf040d4d0, timeout=5872, context=0xf0401258) at ../glib-2.62.4/glib/gmain.c:4257
#4  g_main_context_iterate (context=0xf0401258, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib-2.62.4/glib/gmain.c:3953
#5  0xf3cf6d08 in g_main_loop_run (loop=0xf0403118) at ../glib-2.62.4/glib/gmain.c:4152
#6  0xf668b804 in WPEFramework::Plugin::WebKitImplementation::Worker (this=0x3489c0) at /usr/src/debug/lib32-webkitbrowser-plugin/3.0+gitAUTOINC+a4c646757b-r1/git/WebKitBrowser/WebKitImplementation.cpp:3053
#7  0xf7b4dc76 in WPEFramework::Core::Thread::StartThread (cClassPointer=0x3489c0) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/Thread.cpp:194
#8  0xf7ba579e in start_thread (arg=0xcce9a418) at pthread_create.c:477
#9  0xf79a11ac in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from ./lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

After following the steps outlined above to reproduce the issue, these are the states of the same two threads:

Thread 1 (Thread 8293.8293 "HtmlApp-0"):
#0  __libc_do_syscall () at libc-do-syscall.S:48
#1  0xf7ba65fe in __pthread_clockjoin_ex (threadid=4053709776, thread_return=thread_return@entry=0xff971bf4, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:145
#2  0xf7ba64ce in __pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0xff971bf4) at pthread_join.c:24
#3  0xf7b4dd06 in WPEFramework::Core::Thread::Terminate (this=0x3489c0) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/Thread.cpp:266
#4  WPEFramework::Core::Thread::Terminate (this=0x3489c0) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/Thread.cpp:240
#5  0xf7b4dd22 in WPEFramework::Core::Thread::~Thread (this=0x3489c0, __in_chrg=<optimized out>) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/Thread.cpp:117
#6  0xf6688ab8 in WPEFramework::Plugin::WebKitImplementation::~WebKitImplementation (this=this@entry=0x3489c0, __in_chrg=__in_chrg@entry=0, __vtt_parm=<optimized out>) at /usr/src/debug/lib32-webkitbrowser-plugin/3.0+gitAUTOINC+a4c646757b-r1/git/WebKitBrowser/WebKitImplementation.cpp:919
#7  0xf6688b1c in WPEFramework::Plugin::WebKitImplementation::~WebKitImplementation (this=this@entry=0x3489c0, __vtt_parm=<optimized out>) at /usr/src/debug/lib32-webkitbrowser-plugin/3.0+gitAUTOINC+a4c646757b-r1/git/WebKitBrowser/WebKitImplementation.cpp:937
#8  0xf668cc9e in WPEFramework::Core::Service<WPEFramework::Plugin::WebKitImplementation>::~Service (this=this@entry=0x3489c0, __in_chrg=__in_chrg@entry=0, __vtt_parm=0xf66a59c8 <VTT for WPEFramework::Core::ProxyObject<WPEFramework::Core::ServiceMetadata<WPEFramework::Plugin::WebKitImplementation>::ServiceImplementation<WPEFramework::Plugin::WebKitImplementation> >+8>) at /usr/include/WPEFramework/core/Services.h:181
#9  0xf668cd00 in WPEFramework::Core::Service<WPEFramework::Plugin::WebKitImplementation>::~Service (this=this@entry=0x3489c0, __vtt_parm=<optimized out>) at /usr/include/WPEFramework/core/Services.h:181
#10 0xf668cd70 in WPEFramework::Core::ServiceMetadata<WPEFramework::Plugin::WebKitImplementation>::ServiceImplementation<WPEFramework::Plugin::WebKitImplementation>::~ServiceImplementation (this=this@entry=0x3489c0, __in_chrg=__in_chrg@entry=0, __vtt_parm=0xf66a59c4 <VTT for WPEFramework::Core::ProxyObject<WPEFramework::Core::ServiceMetadata<WPEFramework::Plugin::WebKitImplementation>::ServiceImplementation<WPEFramework::Plugin::WebKitImplementation> >+4>) at /usr/include/WPEFramework/core/Services.h:41
#11 0xf668cdc0 in WPEFramework::Core::ServiceMetadata<WPEFramework::Plugin::WebKitImplementation>::ServiceImplementation<WPEFramework::Plugin::WebKitImplementation>::~ServiceImplementation (this=this@entry=0x3489c0, __vtt_parm=<optimized out>) at /usr/include/WPEFramework/core/Services.h:261
#12 0xf668ce22 in WPEFramework::Core::ProxyObject<WPEFramework::Core::ServiceMetadata<WPEFramework::Plugin::WebKitImplementation>::ServiceImplementation<WPEFramework::Plugin::WebKitImplementation> >::~ProxyObject (this=this@entry=0x3489c0, __in_chrg=__in_chrg@entry=2, __vtt_parm=__vtt_parm@entry=0x0) at /usr/include/WPEFramework/core/Proxy.h:246
#13 0xf668ce6c in WPEFramework::Core::ProxyObject<WPEFramework::Core::ServiceMetadata<WPEFramework::Plugin::WebKitImplementation>::ServiceImplementation<WPEFramework::Plugin::WebKitImplementation> >::~ProxyObject (this=this@entry=0x3489c0) at /usr/include/WPEFramework/core/Proxy.h:107
#14 0xf668cea8 in WPEFramework::Core::ProxyObject<WPEFramework::Core::ServiceMetadata<WPEFramework::Plugin::WebKitImplementation>::ServiceImplementation<WPEFramework::Plugin::WebKitImplementation> >::~ProxyObject (this=0x3489c0) at /usr/include/WPEFramework/core/Proxy.h:118
#15 0xf667dbec in WPEFramework::Plugin::CloseDown () at /usr/src/debug/lib32-webkitbrowser-plugin/3.0+gitAUTOINC+a4c646757b-r1/git/WebKitBrowser/WebKitImplementation.cpp:367
#16 0xf7945260 in __run_exit_handlers (status=status@entry=1, listp=0xf79f132c <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:91
#17 0xf79452fe in __GI_exit (status=status@entry=1) at exit.c:139
#18 0xf667e7da in WPEFramework::Plugin::WebKitImplementation::postExitJob()::ExitJob::Dispatch() (this=<optimized out>) at /usr/src/debug/lib32-webkitbrowser-plugin/3.0+gitAUTOINC+a4c646757b-r1/git/WebKitBrowser/WebKitImplementation.cpp:2691
#19 0x000225fe in WPEFramework::Core::ThreadPool::Minion::Process (this=0xf79f4450 <__exit_funcs_lock>, this@entry=0x327218) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/../core/ThreadPool.h:451
#20 0x00022730 in WPEFramework::Core::WorkerPool::Join (this=this@entry=0x327148) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/core/../core/WorkerPool.h:380
#21 0x0002291a in WPEFramework::Process::WorkerPoolImplementation::Run (this=0x327140) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/WPEProcess/Process.cpp:138
#22 WPEFramework::Process::ProcessFlow::Run (this=this@entry=0xff9720e0, pathName="/usr/lib/wpeframework/proxystubs/", interfaceId=<optimized out>, base=base@entry=0x348ab4, sequenceId=36) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/WPEProcess/Process.cpp:508
#23 0x0001d9da in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/lib32-wpeframework/4.2+gitAUTOINC+8fa0abb34e-r0/git/Source/WPEProcess/Process.cpp:676

Thread 5 (Thread 8293.8304 "HtmlApp-0"):
#0  __libc_do_syscall () at libc-do-syscall.S:48
#1  0xf7baa37e in futex_wait_cancelable (private=0, expected=0, futex_word=0xefaea048) at ../sysdeps/nptl/futex-internal.h:183
#2  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xefaea008, cond=0xefaea020) at pthread_cond_wait.c:508
#3  __pthread_cond_wait (cond=0xefaea020, mutex=0xefaea008) at pthread_cond_wait.c:638
#4  0xf4c84910 in WTF::ThreadCondition::timedWait () at ../git/Source/WTF/wtf/posix/ThreadingPOSIX.cpp:613
#5  0xf4c493c2 in WTF::ParkingLot::parkConditionallyImpl(void const*, WTF::ScopedLambda<bool ()> const&, WTF::ScopedLambda<void ()> const&, WTF::TimeWithDynamicClockType const&) () at ../git/Source/WTF/wtf/ParkingLot.cpp:595
#6  0xf4c41d7a in WTF::ParkingLot::parkConditionally<WTF::ParkingLot::compareAndPark<unsigned char, unsigned char>(WTF::Atomic<unsigned char> const*, unsigned char)::{lambda()#1}, WTF::ParkingLot::compareAndPark<unsigned char, unsigned char>(WTF::Atomic<unsigned char> const*, unsigned char)::{lambda()#2}>(void const*, WTF::ParkingLot::compareAndPark<unsigned char, unsigned char>(WTF::Atomic<unsigned char> const*, unsigned char)::{lambda()#1} const&, WTF::ParkingLot::compareAndPark<unsigned char, unsigned char>(WTF::Atomic<unsigned char> const*, unsigned char)::{lambda()#2} const&, WTF::TimeWithDynamicClockType const&) () at ../git/Source/WTF/wtf/ParkingLot.h:82
#7  WTF::ParkingLot::compareAndPark<unsigned char, unsigned char> () at ../git/Source/WTF/wtf/ParkingLot.h:94
#8  WTF::LockAlgorithm<unsigned char, (unsigned char)1, (unsigned char)2, WTF::EmptyLockHooks<unsigned char> >::lockSlow () at ../git/Source/WTF/wtf/LockAlgorithmInlines.h:84
#9  0xf4c4b198 in WTF::Lock::lock () at ../git/Source/WTF/wtf/Lock.h:66
#10 WTF::Locker<WTF::Lock>::Locker () at ../git/Source/WTF/wtf/Lock.h:158
#11 WTF::RunLoop::dispatch(WTF::Function<void ()>&&) () at ../git/Source/WTF/wtf/RunLoop.cpp:151
#12 0xf4294002 in operator() () at ../git/Source/WebKit/UIProcess/AuxiliaryProcessProxy.cpp:428
#13 call () at WTF/Headers/wtf/Function.h:53
#14 0xf40dd37c in WTF::Function<void ()>::operator()() const () at WTF/Headers/wtf/Function.h:82
#15 WTF::CompletionHandler<void ()>::operator()() () at WTF/Headers/wtf/CompletionHandler.h:72
#16 Messages::AuxiliaryProcess::MainThreadPing::cancelReply(WTF::CompletionHandler<void ()>&&) () at DerivedSources/WebKit/AuxiliaryProcessMessageReceiver.cpp:47
#17 0xf428edb8 in operator() () at ../git/Source/WebKit/UIProcess/AuxiliaryProcessProxy.h:240
#18 call () at WTF/Headers/wtf/Function.h:53
#19 0xf428e9c8 in WTF::Function<void (IPC::Decoder*)>::operator()(IPC::Decoder*) const () at WTF/Headers/wtf/Function.h:82
#20 WTF::CompletionHandler<void (IPC::Decoder*)>::operator()(IPC::Decoder*) () at WTF/Headers/wtf/CompletionHandler.h:72
#21 operator() () at ../git/Source/WebKit/UIProcess/AuxiliaryProcessProxy.cpp:219
#22 call () at WTF/Headers/wtf/Function.h:53
#23 0xf424d6e8 in WTF::Function<void (IPC::Decoder*)>::operator()(IPC::Decoder*) const () at WTF/Headers/wtf/Function.h:82
#24 WTF::CompletionHandler<void (IPC::Decoder*)>::operator()(IPC::Decoder*) () at WTF/Headers/wtf/CompletionHandler.h:72
#25 clearAsyncReplyHandlers () at ../git/Source/WebKit/Platform/IPC/Connection.cpp:1293
#26 0xf424d786 in IPC::Connection::invalidate () at ../git/Source/WebKit/Platform/IPC/Connection.cpp:443
#27 0xf4293ef4 in WebKit::AuxiliaryProcessProxy::shutDownProcess () at ../git/Source/WebKit/UIProcess/AuxiliaryProcessProxy.cpp:341
#28 0xf42e87d6 in WebKit::WebProcessProxy::shutDown () at ../git/Source/WebKit/UIProcess/WebProcessProxy.cpp:501
#29 0xf42ef1da in WebKit::WebProcessProxy::maybeShutDown () at ../git/Source/WebKit/UIProcess/WebProcessProxy.cpp:1173
#30 WebKit::WebProcessProxy::maybeShutDown () at ../git/Source/WebKit/UIProcess/WebProcessProxy.cpp:1159
#31 0xf42bf316 in WTF::Function<void (WTF::RefCounterEvent)>::operator()(WTF::RefCounterEvent) const () at WTF/Headers/wtf/Function.h:82
#32 WTF::RefCounter<WebKit::WebProcessProxy::ShutdownPreventingScopeType>::Count::deref () at WTF/Headers/wtf/RefCounter.h:105
#33 WTF::DefaultRefDerefTraits<WTF::RefCounter<WebKit::WebProcessProxy::ShutdownPreventingScopeType>::Count>::derefIfNotNull () at WTF/Headers/wtf/RefPtr.h:42
#34 WTF::RefPtr<WTF::RefCounter<WebKit::WebProcessProxy::ShutdownPreventingScopeType>::Count, WTF::RawPtrTraits<WTF::RefCounter<WebKit::WebProcessProxy::ShutdownPreventingScopeType>::Count>, WTF::DefaultRefDerefTraits<WTF::RefCounter<WebKit::WebProcessProxy::ShutdownPreventingScopeType>::Count> >::~RefPtr () at WTF/Headers/wtf/RefPtr.h:74
#35 ~<lambda> () at ../git/Source/WebKit/UIProcess/WebPageProxy.cpp:1240
#36 ~CallableWrapper () at WTF/Headers/wtf/Function.h:47
#37 ~CallableWrapper () at WTF/Headers/wtf/Function.h:47
#38 0xf4253ab6 in std::default_delete<WTF::Detail::CallableWrapperBase<void> >::operator() () at ../lib32-recipe-sysroot/usr/include/c++/9.3.0/bits/unique_ptr.h:81
#39 std::unique_ptr<WTF::Detail::CallableWrapperBase<void>, std::default_delete<WTF::Detail::CallableWrapperBase<void> > >::~unique_ptr () at ../lib32-recipe-sysroot/usr/include/c++/9.3.0/bits/unique_ptr.h:292
#40 WTF::Function<void ()>::~Function() () at ../git/Source/WTF/wtf/Function.h:63
#41 WTF::VectorDestructor<true, WTF::Function<void ()> >::destruct(WTF::Function<void ()>*, WTF::Function<void ()>*) () at ../git/Source/WTF/wtf/Vector.h:69
#42 WTF::VectorTypeOperations<WTF::Function<void ()> >::destruct(WTF::Function<void ()>*, WTF::Function<void ()>*) () at ../git/Source/WTF/wtf/Vector.h:252
#43 WTF::Deque<WTF::Function<void ()>, 0u>::destroyAll() () at ../git/Source/WTF/wtf/Deque.h:359
#44 0xf4c4b0ac in WTF::Deque<WTF::Function<void ()>, 0u>::clear() () at ../git/Source/WTF/wtf/Deque.h:392
#45 WTF::RunLoop::threadWillExit () at ../git/Source/WTF/wtf/RunLoop.cpp:189
#46 0xf4c4b382 in WTF::RunLoop::Holder::~Holder () at ../git/Source/WTF/wtf/RunLoop.cpp:51
#47 WTF::ThreadSpecific<WTF::RunLoop::Holder, (WTF::CanBeGCThread)0>::Data::~Data () at ../git/Source/WTF/wtf/ThreadSpecific.h:99
#48 WTF::ThreadSpecific<WTF::RunLoop::Holder, (WTF::CanBeGCThread)0>::destroy () at ../git/Source/WTF/wtf/ThreadSpecific.h:187
#49 0xf7ba5620 in __nptl_deallocate_tsd () at pthread_create.c:301
#50 0xf7ba57ac in start_thread (arg=0xcce9a418) at pthread_create.c:488
#51 0xf79a11ac in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from ./lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

varumugam123 commented 1 month ago

Another reliable way to reproduce this issue is to send SIGSTOP to WPEWebProcess and resume it with SIGCONT after (2 * watchdoghangthresholdtinseconds) seconds.

pgorszkowski-igalia commented 1 week ago

@varumugam123 : SIGFPE is sent by https://github.com/WebPlatformForEmbedded/ThunderNanoServicesRDK/blob/rdkservices/WebKitBrowser/WebKitImplementation.cpp#L3461, and it is sent after _config.WatchDogCheckTimeoutInSeconds.Value() * _config.WatchDogHangThresholdInSeconds.Value() / _config.WatchDogCheckTimeoutInSeconds.Value() = _config.WatchDogHangThresholdInSeconds.Value() seconds.

So it is sent before DeactivateBrowser(PluginHost::IShell::WATCHDOG_EXPIRED); is called: https://github.com/WebPlatformForEmbedded/ThunderNanoServicesRDK/blob/rdkservices/WebKitBrowser/WebKitImplementation.cpp#L3465

pgorszkowski-igalia commented 6 days ago

I am not able to reproduce the deadlock. Each time I stop WPEWebProcess (SIGSTOP) for more than 2 * watchdoghangthresholdtinseconds and then resume it (SIGCONT), WPEWebProcess receives SIGFPE (crash) and the WebKitBrowser plugin is deactivated. No WPEProcess (UI process) is left hanging: it is destroyed (deactivated) properly, without a crash.

Here are some standard logs plus some of my own additional ones:

[13:23:21]:[SysLog]:[Fatal]: CRASH: WebProcess crashed: exiting ...
[PGPG] ~WebKitImplementation() START
[PGPG] ~WebKitImplementation() END
[13:23:22]:[SysLog]:[Fatal]: FORCED Shutdown: WebKitBrowser by reason: Failure.
[13:23:22]:[SysLog]:[Crash]: -== /proc/meminfo ==-
...
[13:23:22]:[SysLog]:[Shutdown]: Deactivated plugin [WebKitBrowser]:[WebKitBrowser]

I tested it on RPi (WPE 2.38).

@varumugam123 : is the reproduction rate 100%?

magomez commented 6 days ago

One question: when WPEWebProcess hangs for some reason, the Browser plugin is supposed to kill WPEWebProcess, which causes WPE to launch a new web process when a new request (or a reload) is made. AFAIK this is how it has always worked. But the Browser plugin wasn't meant to be killed in that situation. Is this a change in how it should work? Or is the problem that, after WPEWebProcess is killed and before the creation of a new WPEWebProcess is triggered, the Browser plugin needs to be manually deactivated in order to reproduce this situation?

Is it the case now that the Browser plugin gets killed as well? From the trace, it seems that the Browser plugin deadlocks because it is trying to terminate, and in the process it tries to kill a WPEWebProcess that was already killed, so the call to kill it doesn't seem to return.

pgorszkowski-igalia commented 5 days ago

It seems that the deadlock is caused by taking a Locker on the same lock (m_nextIterationLock) twice in the same call stack.

This is the first function in the call stack of Thread 5 where the Locker is used:

#45 WTF::RunLoop::threadWillExit () at ../git/Source/WTF/wtf/RunLoop.cpp:189

https://github.com/WebPlatformForEmbedded/WPEWebKit/blob/wpe-2.38/Source/WTF/wtf/RunLoop.cpp#L188

void RunLoop::threadWillExit()
{
    m_currentIteration.clear();
    {
        Locker locker { m_nextIterationLock };
        m_nextIteration.clear();
    }
}

and this is the second one which causes the observed deadlock:

#11 WTF::RunLoop::dispatch(WTF::Function<void ()>&&) () at ../git/Source/WTF/wtf/RunLoop.cpp:151

https://github.com/WebPlatformForEmbedded/WPEWebKit/blob/wpe-2.38/Source/WTF/wtf/RunLoop.cpp#L151

void RunLoop::dispatch(Function<void()>&& function)
{
    RELEASE_ASSERT(function);
    bool needsWakeup = false;

    {
        Locker locker { m_nextIterationLock };
        needsWakeup = m_nextIteration.isEmpty();
        m_nextIteration.append(WTFMove(function));
    }

    if (needsWakeup)
        wakeUp();
}
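To make the interaction between these two functions concrete, here is a minimal standalone sketch of the pattern as I understand it from the trace. It is not WebKit code: `TinyLock`, `MiniRunLoop` and `destructorReentersDispatch` are hypothetical names, the shared_ptr token only plays the role of the reference released in frames #32 to #35, and `tryLock()` is used so the sketch reports the re-entrance instead of hanging the way `Lock::lock()` does.

```cpp
#include <atomic>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical non-recursive lock standing in for WTF::Lock.
class TinyLock {
public:
    bool tryLock() { return !m_held.exchange(true, std::memory_order_acquire); }
    void unlock() { m_held.store(false, std::memory_order_release); }
private:
    std::atomic<bool> m_held { false };
};

// Hypothetical stand-in for WTF::RunLoop, reduced to the two functions quoted above.
class MiniRunLoop {
public:
    // Like RunLoop::dispatch(): needs the queue lock to append a task.
    // Returns false in the situation where the real code would block forever.
    bool dispatch(std::function<void()> task)
    {
        if (!m_nextIterationLock.tryLock())
            return false;
        m_nextIteration.push_back(std::move(task));
        m_nextIterationLock.unlock();
        return true;
    }

    // Like RunLoop::threadWillExit(): destroys queued tasks while holding the lock.
    void threadWillExit()
    {
        while (!m_nextIterationLock.tryLock()) { }
        m_nextIteration.clear();
        m_nextIterationLock.unlock();
    }

private:
    TinyLock m_nextIterationLock;
    std::deque<std::function<void()>> m_nextIteration;
};

// Returns true if destroying a queued task re-entered dispatch() while
// threadWillExit() was still holding the lock.
bool destructorReentersDispatch()
{
    MiniRunLoop loop;
    bool reentered = false;

    // Releasing the last reference to this token runs code that calls dispatch(),
    // like the shutdown path triggered from the lambda destructor in the trace.
    std::shared_ptr<int> token(new int(0), [&loop, &reentered](int* p) {
        delete p;
        if (!loop.dispatch([] { }))
            reentered = true;
    });

    loop.dispatch([token] { });
    token.reset(); // the queued task now owns the only reference
    loop.threadWillExit();
    return reentered;
}
```

In this model `destructorReentersDispatch()` reports the re-entrance because the task's captured state is destroyed inside `clear()`, i.e. while the lock is still held.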

The problem can probably be reproduced when there is a pending sendWithAsyncReply(Messages::AuxiliaryProcess::MainThreadPing(), ...) (which is started in AuxiliaryProcessProxy::checkForResponsiveness) and the destruction of the UI process begins.

The main thread (Thread 1) is waiting for the other threads to stop (one of them is Thread 5). Stopping Thread 5 cancels all pending async replies, which means their callbacks are invoked. In our case the callback calls RunLoop::main().dispatch, and that is where the deadlock occurs:

void AuxiliaryProcessProxy::checkForResponsiveness(CompletionHandler<void()>&& responsivenessHandler, UseLazyStop useLazyStop)
{
    startResponsivenessTimer(useLazyStop);
    sendWithAsyncReply(Messages::AuxiliaryProcess::MainThreadPing(), [weakThis = WeakPtr { *this }, responsivenessHandler = WTFMove(responsivenessHandler)]() mutable {
        // Schedule an asynchronous task because our completion handler may have been called as a result of the AuxiliaryProcessProxy
        // being in the middle of destruction.
        RunLoop::main().dispatch([weakThis = WTFMove(weakThis), responsivenessHandler = WTFMove(responsivenessHandler)]() mutable {
            if (weakThis)
                weakThis->stopResponsivenessTimer();

            if (responsivenessHandler)
                responsivenessHandler();
        });
    });
}

We should probably add a condition so that RunLoop::main().dispatch is not called when we are in the middle of destruction.
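One possible shape for such a condition, as a standalone sketch only: `FakeProxy`, `FakeMainLoop`, `makePingReplyHandler` and `tasksDispatchedOnReply` are hypothetical stand-ins and not WebKit API. The reply handler checks whether the proxy is still alive and its connection still valid before it touches the run loop.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for AuxiliaryProcessProxy; it only tracks connection validity.
struct FakeProxy {
    bool connectionValid { true };
};

// Hypothetical stand-in for RunLoop::main(); it only records dispatched tasks.
struct FakeMainLoop {
    std::vector<std::function<void()>> tasks;
    void dispatch(std::function<void()> task) { tasks.push_back(std::move(task)); }
};

// Builds a ping-reply handler that skips the dispatch during teardown.
std::function<void()> makePingReplyHandler(std::weak_ptr<FakeProxy> weakProxy, FakeMainLoop& loop)
{
    return [weakProxy, &loop] {
        auto proxy = weakProxy.lock();
        if (!proxy || !proxy->connectionValid)
            return; // tearing down: do not touch the run loop
        loop.dispatch([] { /* stop the responsiveness timer, run the handler */ });
    };
}

// Returns how many tasks were dispatched when the reply handler ran.
std::size_t tasksDispatchedOnReply(bool connectionValid)
{
    FakeMainLoop loop;
    auto proxy = std::make_shared<FakeProxy>();
    proxy->connectionValid = connectionValid;
    auto handler = makePingReplyHandler(proxy, loop);
    handler();
    return loop.tasks.size();
}
```

With a valid connection the handler dispatches one task; with an invalidated connection it dispatches nothing, so it never needs the run loop's queue lock.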

pgorszkowski-igalia commented 5 days ago

My fix proposal:

diff --git a/Source/WebKit/UIProcess/AuxiliaryProcessProxy.cpp b/Source/WebKit/UIProcess/AuxiliaryProcessProxy.cpp
index e660f7ef5f3a..d5fb2ea5ac6d 100644
--- a/Source/WebKit/UIProcess/AuxiliaryProcessProxy.cpp
+++ b/Source/WebKit/UIProcess/AuxiliaryProcessProxy.cpp
@@ -423,6 +423,9 @@ void AuxiliaryProcessProxy::checkForResponsiveness(CompletionHandler<void()>&& r
 {
     startResponsivenessTimer(useLazyStop);
     sendWithAsyncReply(Messages::AuxiliaryProcess::MainThreadPing(), [weakThis = WeakPtr { *this }, responsivenessHandler = WTFMove(responsivenessHandler)]() mutable {
+        if (!weakThis || !weakThis->connection()->isValid())
+            return;
+
         // Schedule an asynchronous task because our completion handler may have been called as a result of the AuxiliaryProcessProxy
         // being in the middle of destruction.
         RunLoop::main().dispatch([weakThis = WTFMove(weakThis), responsivenessHandler = WTFMove(responsivenessHandler)]() mutable {

pgorszkowski-igalia commented 3 days ago

@varumugam123, @modeveci: can you test my proposed fix?