dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

[browser][mt] Awaiting a `Task` returned by a `[JSImport]` method sometimes causes a deadlock #106788


MackinnonBuck commented 3 weeks ago

Description

When `<WasmEnableThreads>` is set to `true`, awaiting a `Task` returned by a `[JSImport]` method may cause a deadlock, even if the associated JS `Promise` is just a `Promise.resolve()`.

This is likely the cause of the following test failures we've been seeing in dotnet/aspnetcore:

When I repro'd the failures locally, the test app was deadlocking in an interop call made by the framework here.

However, a much simpler scenario can reproduce the same issue. Here's some of the code from my minimal repro project:

```csharp
using System.Runtime.InteropServices.JavaScript;
using System.Threading.Tasks;

JS.Log("Started...");
await JS.GetEmptyPromiseAsync();
JS.Log("Complete, reloading...");
JS.Reload();

partial class JS
{
    [JSImport("log", "main.js")]
    internal static partial void Log(string message);

    [JSImport("getEmptyPromise", "main.js")]
    internal static partial Task GetEmptyPromiseAsync();

    [JSImport("globalThis.location.reload")]
    internal static partial void Reload();
}
```

This repeatedly reloads the page until the deadlock occurs. On my machine, the deadlock happens within ~10 refreshes.
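The `[JSImport]` attributes above refer to a "main.js" module whose JavaScript side isn't shown here. As a rough sketch of the missing wiring (assumed, not taken from the repro repository, which remains the source of truth; the wasm-experimental browser template typically registers such a module from JavaScript via `setModuleImports("main.js", ...)` instead), the module could also be imported from C# before the first interop call:

```csharp
// Sketch only, not code from the repro. Registers the ES module that the
// [JSImport("...", "main.js")] attributes refer to, before the first interop call.
// The module name must match the second argument of the [JSImport] attributes;
// the URL is resolved relative to the host page.
using System.Runtime.InteropServices.JavaScript;

await JSHost.ImportAsync("main.js", "./main.js");

// The JS side is assumed to export something like:
//   export function log(message) { console.log(message); }
//   export function getEmptyPromise() { return Promise.resolve(); }
// and the project file is assumed to enable <WasmEnableThreads>true</WasmEnableThreads>.
```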

Reproduction Steps

  1. Download the latest .NET 9 Preview 7 build from https://github.com/dotnet/sdk/blob/main/documentation/package-table.md
  2. Clone https://github.com/MackinnonBuck/dotnet-wasm-repro
  3. From the root of the repro project, run the following:
    • `dotnet workload install wasm-tools --include-previews`
    • `dotnet workload install wasm-experimental --include-previews`
    • `dotnet tool install dotnet-serve`
    • `dotnet publish`
  4. From the wwwroot folder in the published output, run:
    • `dotnet serve -h "Cross-Origin-Embedder-Policy:require-corp" -h "Cross-Origin-Opener-Policy:same-origin"`
  5. Open the URL that `dotnet serve` prints in any web browser
  6. Observe the deadlock:
    • The page will repeatedly refresh until the deadlock occurs
    • When the deadlock happens, the page will look something like this: [screenshot]

Expected behavior

A deadlock does not happen, and the webpage continues to refresh indefinitely.

Actual behavior

The page freezes within ~10 page reloads.

Regression?

Unsure; unfortunately, I wasn't able to get the repro to run on a .NET 8 TFM and SDK.

Known Workarounds

No response

Configuration

.NET SDK:

OS

Windows 11 Enterprise 23H2

Architecture

x64

Do you know whether it is specific to that configuration?

No

Which web browser(s) do you see this issue in?

I was able to repro this in:

Other information

No response

dotnet-policy-service[bot] commented 3 weeks ago

Tagging subscribers to 'arch-wasm': @lewing. See info in area-owners.md if you want to be subscribed.

ilonatommy commented 3 weeks ago

Most probably connected with https://github.com/dotnet/runtime/issues/104772. I'm going to check whether backporting https://github.com/dotnet/runtime/pull/105464 fixes it.

MackinnonBuck commented 3 weeks ago

@ilonatommy, I realized my original comment mentioned ".NET 8 Preview 7", but I actually meant ".NET 9 Preview 7". Sorry if that created any confusion!

I just tried the repro again from the latest main installer in this table, which should contain the fix from #105464, and the problem still reproduces for me.

ilonatommy commented 2 weeks ago

@pavelsavara, the problem is real and can be reproduced using the threading sample on main. The minimal requirements are an async `[JSImport]` call and reloading the page. Triggering threads without interop does not deadlock; for example, exchanging `await JS.GetEmptyPromiseAsync();` for `await Task.Run(async () => { JS.Log($"ID: {Environment.CurrentManagedThreadId}"); });` works fine. Calling the async `[JSImport]` function without re-initializing the threads (no page reload) does not deadlock either.
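For readability, here are the two variants described above restated as a sketch. They are drop-in alternatives for the body of Program.cs in the minimal repro earlier in this issue; the `JS` class is the one defined there:

```csharp
// Sketch of the two variants compared in the comment above.
using System;
using System.Threading.Tasks;

// Variant 1 - deadlocks intermittently after a page reload:
// the awaited Task comes from an async [JSImport] method.
await JS.GetEmptyPromiseAsync();

// Variant 2 - reported to work fine: threads are exercised,
// but nothing returned by JS interop is awaited.
await Task.Run(async () =>
{
    JS.Log($"ID: {Environment.CurrentManagedThreadId}");
});
```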

ilonatommy commented 2 weeks ago

[Stack-trace screenshot]

This one is an updated version of the truncated one; I caught the same place again. [Stack-trace screenshot]

[Stack-trace screenshot]

pavelsavara commented 2 weeks ago

my theory: all 3 screenshots are showing

@kg

pavelsavara commented 2 weeks ago

It could also be a problem in the emscripten code.

Allocation happens inside `do_proxy()` -> `get_or_add_tasks_for_thread()`, inside of `pthread_mutex_lock(&q->mutex);`.

But it seems to me that in our case, the target queue is the queue of the IO thread, not the UI thread.
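Not emscripten's actual code, but a self-contained C# sketch of the general hazard this points at: blocking (here, waiting on another thread) while holding the very lock that the other thread needs in order to make progress. In the real issue the potentially blocking step would be hidden inside an allocation made under the proxy queue's mutex; in the sketch it is made explicit, and a timeout stands in for the hang:

```csharp
// Generic illustration only; names like QueueMutex are hypothetical stand-ins
// (QueueMutex plays the role of q->mutex from the comment above).
using System;
using System.Threading;

class ProxyQueueHazardSketch
{
    static readonly object QueueMutex = new();
    static readonly SemaphoreSlim TargetMadeProgress = new(0);

    static void Main()
    {
        // "Target" thread: needs the queue mutex to drain its queue.
        var target = new Thread(() =>
        {
            lock (QueueMutex)
            {
                TargetMadeProgress.Release();
            }
        });

        // "Posting" thread: takes the queue mutex to enqueue work, then (wrongly)
        // waits for the target to make progress while still holding the mutex.
        lock (QueueMutex)
        {
            target.Start();
            bool progressed = TargetMadeProgress.Wait(TimeSpan.FromSeconds(1));
            Console.WriteLine(progressed
                ? "No deadlock"
                : "Deadlock pattern: the target cannot take the mutex we are holding");
        }

        target.Join();
    }
}
```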

ilonatommy commented 2 weeks ago

More stacks (the same stack was produced across several different runs; the only difference is that the last thread dump is not truncated, so I stopped collecting):

main [0] — [stack screenshot]

deputy [1] — [stack screenshot]

IO [2] — [stack screenshot]

norm [3] — [stack screenshot]

pool [4] — won't break

gate [5] — [stack screenshot]

emscripten pool [6] — won't break; the only one that is not running (last log of the thread dump while the app was still running) — [stack screenshot]

After the app froze, a dump of the threads: [screenshot]