Open aran opened 2 weeks ago
Hmm, I quickly searched it and see https://github.com/WebAssembly/threads/issues/106.
Quick guess (need experiments to verify!): Is it because that, rust_arc_increment_strong_count
(calls Arc
to increment count) once in a while needs to do Atomics.wait
(because of concurrency?), and if that is done on main thread, webassembly is unhappy about that.
Maybe we can firstly reproduce this by creating a minimal Rust code that calls Atomics.wait
deliberately, and tries to call it on Dart main thread and see what is going on.
I confirmed that if I convert my data structure that is involved in this from RustAutoOpaque to translatable, the issue no longer occurs (or maybe occurs much more rarely). When it is RustAutoOpaque, possibly because of nondeterminism in my app's behavior, it shows up every few refreshes.
From reading that thread and a few others around it, it's not safe for any FRB code to call any Atomics.wait-using code, unless there's a way to mark some FRB code as "only off main thread" and enforce that. Other ways would be to explicitly drop multithreading and +atomics, or take out '&var' support and insist on only pure owned data transfers, &mut, or pure copies, but no locking until the Rust atomics story is cleared up.
it's not safe for any FRB code to call any Atomics.wait-using code, unless there's a way to mark some FRB code as "only off main thread" and enforce that
Yes I think so, due to limitations of web assembly. That limitations holds only for Web, and non-Web platforms are always happy though.
unless there's a way to mark some FRB code as "only off main thread" and enforce that
Firstly, for user code, as long as you do not use sync mode, i.e. use the default mode, I guess you are safe. This is because the default mode uses a thread pool, thus no main thread is blocked.
However, for this specific bug report, it is because of rust_arc_increment_strong_count
(which calls Arc::increment_strong_count
, which seems to be called in main thread. Then maybe we have some choices to fix it:
fn rust_arc_increment_strong_count() { execute_in_thread_pool(|| Arc::increment_strong_count()) }
.Arc
that does not use Atomics.wait in Web? I think there can be one, since e.g. it can choose to spin instead of calling Atomics.wait. I guess we can ask this on Rust forum etc.Feel free to PR, alternatively I will try to squeeze out time working on it!
A couple other notes, without solving anything:
Atomics.waitAsync
, supported everywhere except Firefox, and my understanding is it can be supported in Firefox at lower performance with a polyfill.sync
has different correctness criteria due to this issue.Dismissible
widget requires that a data source reflects a dismiss immediately, so if the data source lives in Rust, I need modify the data with a #[frb(sync)] method. I'll try to find a workaround.After some more thought, it's clearer now: When targeting wasm there is no safe way to use standard concurrency primitives that ultimately depend on Atomics.wait
.
In particular any frb(sync) function cannot wait on any lock/mutex because all lock/mutex code will ultimately need either Atomics.wait
or an async call out. Any async function has to cope with being async but not having access to concurrency control primitives. The current behavior of automatically wrapping in an RwLock doesn't work for several cases.
Assuming we can eliminate the automatic RwLock wrapper, you can safely use frb(sync) with:
It can never be made safe to mix frb(sync) with async methods accessing the same data from the Rust thread pool.
Now for async with &mut
or owned mut
- the story there is even harder. To write safe async code that uses &mut
references to data, you have to guarantee at the Flutter layer that it's a globally unique reference, no matter which thread. That's very hard to guarantee at the Flutter layer without Rust's type system to help out. There might some hacks to help like a runtime ownership tracking API at the Flutter level to catch issues, or a main-thread-only busywait implementation, because it's hard to stop flutter code from calling asyncMutableRustAPI(data); asyncMutableRustAPI(data);.
Finally, auto-wrapping in an RwLock only works for data where all accesses are async and sent to the web worker thread pool.
I could see a few different ways of supporting this from the FRB layer. One way would be allow users finer-grained control of the RustAutoOpaque wrapping to allow avoiding the automatic RwLock. Another helpful feature would be allow running an async call on the main thread event loop rather than the thread pool. Then for some data that had a frb(sync)
call with an &mut
on some data, you could allow other async methods that take &
references to it to run on the main thread event loop, which would still be safe.
With no further changes to FRB, the simplest way to make Rust mutable data safe might be to choose either async or sync globally. Either run everything in the thread pool where RwLock works, or run everything sync where Atomic.wait will never be needed.
Another helpful feature would be allow running an async call on the main thread event loop rather than the thread pool.
Do not remember very clearly, but in web it seems we use the main thread and web primitives for async
, so it is not in a thread pool. It is the non-sync non-async fn
that uses thread pool IMHO.
@fzyzcjy When in web, make the real increment-strong-count happen in non-main thread. For example, fn rust_arc_increment_strong_count() { execute_in_thread_pool(|| Arc::increment_strong_count()) }.
Elaborate this a little bit: We can convert arbitrary main-thread code into non-main-thread, by letting main thread runs function body in non-main-thread, and main thread just do busy wait (if other waiting primitive is not allowed). This may not be harmful (except for overhead of changing thread), because even without this change, the main thread will be blocked by exactly the same amount of time.
Apologies for the lack of clarity - in all of the above, when talking about async functions, I only meant non-async
Rust functions that are not labeled frb(sync)
and are compiled to async
Dart functions. For these functions, it would be useful to have options to instruct FRB to avoid wrapping data in locks and to send these to the main thread event loop rather than thread pool.
I'm not sure of the right tradeoffs between main thread busy-waits and other approaches. At first glance it would seem unfortunate to be forced into main thread busy-waits because it could result in surprising performance behavior.
More information why increment_strong_count utilizes Atomics.wait: It seems that you are using the quickstart mode. Try to set full_dep: true
and it will use the real std arc which should not use Atomics IMHO.
Btw, as is mentioned in your another post, another simple way to avoid this issue may be to use async fn
(i.e. Rust async functions) and #[frb(sync)] fn
(i.e. Dart and Rust sync functions), which in Web only uses the main thread (while uses multi thread in native.
Up to you how you want to track full solutions. For my case I am working around issues by exclusively using frb(sync) and I've turned on full_dep
although I'm not sure I understand the full implications of that or why there are different modes with different correctness properties.
Describe the bug
I have a rust crate "other" and a type within it "LocalEditor". In my flutter_rust_bridge target crate, I have
use other::LocalEditor
. In the flutter_rust_bridge crate, I have a methodpub fn uses_local_editor(editor: &mut<LocalEditor>) -> Option<Vec<u8>>{...}
In dart I have an async function 'doesStuff' and I call doesStuff() with no await so it runs independently in the main event loop. It parses some data in Rust and eventually calls notifyListeners() if applicable.
doesStuff has:
final value = await usesLocalEditor(editor: anEditor);
I sometimes get, but it's not reliablie or frequent, and I can't repro:
This is on up-to-date Chrome with up-to-date FRB. No reliable repro, it doesn't happen every time, but I wanted to report in case the stack trace / error message are something you're familiar with.
Steps to reproduce
N/A
Logs
Expected behavior
No response
Generated binding code
No response
OS
No response
Version of
flutter_rust_bridge_codegen
No response
Flutter info
No response
Version of
clang++
No response
Additional context
No response