dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.25k stars 4.73k forks source link

[browser][MT] Mono shutdown hang when a thread calls System.Environment.Exit #98190

Open kg opened 8 months ago

kg commented 8 months ago

As of somewhat recently, mono_wasm_exit now calls mono_jit_cleanup. I believe this is correct.

However, mono_jit_cleanup calls mono_thread_manage_internal which (despite its name) appears to be the 'shutdown this thread' API. mono_thread_manage_internal calls mono_runtime_try_shutdown and critically, if mono_runtime_try_shutdown fails, the current thread then blocks waiting for runtime shutdown to finish. The assumption here is that if mono_runtime_try_shutdown failed, that means another thread is currently shutting down the runtime.

The problem is that the current thread could be the one shutting down the runtime. If you call System.Environment.Exit, it also calls mono_runtime_try_shutdown in order to begin shutting down the runtime. This also seems correct. Then it attempts to call exit, which seems to flow through to mono_wasm_exit, triggering shutdown again and hanging. I don't know of any easy way to fix this.

My proposed fix is that the 'are we shutting down the runtime' flag needs to become a 'which thread is shutting down the runtime' variable, so that if a thread attempts to start shutdown a second time, it won't hang. The implementations of S.E.E and mono_wasm_exit both seem right so I think the internals just need to be fixed.

As a result of this S.E.E. is broken for wasm apps on main right now, at least in my local testing. It was previously broken and then fixed in 8.0 by me and zoltan, so we probably should add a WBT test for it this time when we fix it again. It feels likely that it is at risk of future regressions.

ghost commented 8 months ago

Tagging subscribers to 'arch-wasm': @lewing See info in area-owners.md if you want to be subscribed.

Issue Details
As of somewhat recently, `mono_wasm_exit` now calls `mono_jit_cleanup`. I believe this is correct. However, `mono_jit_cleanup` calls `mono_thread_manage_internal` which (despite its name) appears to be the 'shutdown this thread' API. `mono_thread_manage_internal` calls `mono_runtime_try_shutdown` and critically, if `mono_runtime_try_shutdown` fails, the current thread then blocks waiting for runtime shutdown to finish. The assumption here is that if `mono_runtime_try_shutdown` failed, that means another thread is currently shutting down the runtime. The problem is that the current thread could be the one shutting down the runtime. If you call `System.Environment.Exit`, it also calls `mono_runtime_try_shutdown` in order to begin shutting down the runtime. This also seems correct. Then it attempts to call `exit`, which seems to flow through to `mono_wasm_exit`, triggering shutdown again and hanging. I don't know of any easy way to fix this. My proposed fix is that the 'are we shutting down the runtime' flag needs to become a 'which thread is shutting down the runtime' variable, so that if a thread attempts to start shutdown a second time, it won't hang. The implementations of S.E.E and `mono_wasm_exit` both seem right so I think the internals just need to be fixed. As a result of this S.E.E. is broken for wasm apps on main right now, at least in my local testing. It was previously broken and then fixed in 8.0 by me and zoltan, so we probably should add a WBT test for it this time when we fix it again. It feels likely that it is at risk of future regressions.
Author: kg
Assignees: -
Labels: `arch-wasm`, `area-VM-threading-mono`
Milestone: -
lewing commented 8 months ago

cc @steveisok

lewing commented 7 months ago

@pavelsavara @kg Ilets figure out what the right behavior here is and make sure it happens

pavelsavara commented 7 months ago

I believe that mono_thread_manage_internal is waiting for all threads to shutdown. And it has infinite wait time. https://github.com/dotnet/runtime/blob/330b70cfacc7751ab5ac546ed7138cb8a09c3097/src/mono/mono/metadata/threads.c#L2907-L2911

Or do I read it wrong ?

I would prefer if there was "give up" timeout. Similar to MONO_SLEEP_ABORT_LIMIT of the GC's stop-the-world.

kg commented 7 months ago

I believe that mono_thread_manage_internal is waiting for all threads to shutdown. And it has infinite wait time.

https://github.com/dotnet/runtime/blob/330b70cfacc7751ab5ac546ed7138cb8a09c3097/src/mono/mono/metadata/threads.c#L2907-L2911

Or do I read it wrong ?

I would prefer if there was "give up" timeout. Similar to MONO_SLEEP_ABORT_LIMIT of the GC's stop-the-world.

An abort timeout doesn't make a ton of sense to me, we're already tearing down and a timeout means we failed to tear down correctly. I suppose it would mean that this would potentially "work", but I think it would be papering over real issues with shutdown.

pavelsavara commented 7 months ago

I mean abort (fast with non-zero exit code) is better than deadlock/timeout with no exit code at all and no message.

pavelsavara commented 7 months ago

and we should fix the managed Exit from thread too. I didn't mean to discard that.