Closed jlongster closed 2 years ago
The code example is correct, that is, you need to await the createSyncAccessHandle() method in the implementation that's currently landed in Chrome Canary. The "SyncAccessHandle" is not fully synchronous, only read/write are synchronous in order get better performance in some Wasm use cases. Other operations, such as close(), truncate() and getSize() on a SyncAccessHandle are asynchronous, see the IDL definition for details. This is different in StorageFoundation, where all methods of the sync API are synchronous.
Oh, the naming then is misleading, especially for folks coming from Node.js, where, e.g., file system operations exist as foo()
and fooSync()
. Will close my PR then.
That's really too bad. Why is it async? If we are in a worker we have no need for it to be async.
It makes it very hard to use the API from WASM projects compiled with C, where everything is sync. It will require the whole mess of spawning a worker and using Atomics.wait
to convert everything async to sync, and even reads/writes with have to go through that layer because the work controls the files, introducing overhead everywhere.
Pinging @fivedots for more context on this decision.
Thank you for your feedback on this and other issues!
There’s a few reasons the SyncAccessHandle has a mix of sync and async methods. Overall, sync methods that deal with the host’s file system are discouraged (although, of course, possible in a worker context), and this was mentioned a few times by people that have more experience with spec’ing new APIs (it is also mentioned in the Web Platform Design Principles guide). There’s also tooling that helps with dealing with async calls from Wasm, like Asyncify and our PThreadFS prototype, which should simplify porting to Wasm on top of AccessHandles. The last reason is that there are active efforts on the Wasm side to natively support async calls with little overhead, hopefully meaning that async calls will not be problematic in the medium term.
Considering that, we decided on direct symmetry between the async and sync surface except on the critical read/write methods, where the overhead of something like Asyncify would be too much. The hope is that this leads to a balance between supporting new use cases now and future-proofing the API.
Thanks for explaining!
I still hope you reconsider for the following reasons:
await
, and I believe a lot of people are going to get tripped up on that at first.Atomics.wait
to block on it, turning an async call into a sync one. That is still introducing quite a bit of overhead!await stmt.run()
. SQLite being sync is a major feature and it's really unfortunate to lose that, only because of internal details being leaked.When we are in a worker, please give us fully sync APIs! It simplifies things SO much.
Here's an example of how much is simplifies things. I wrote a webkitFileSystem
backend for absurd-sql, which provides fully sync APIs. Here's all the code (this backend is not usable unfortunately because it has no locking): https://gist.github.com/jlongster/c92ac81c402de59d8556b69a4168d4e1
Compare that with the IndexedDB backend that needs to do all the work of using Atomics.wait
to turn an async API into a sync one: https://github.com/jlongster/absurd-sql/tree/master/src/indexeddb
The PThreadFS work might hide away that complexity, but that overhead is still there. Even though you made read/write sync, because we opened the files on a separate thread all the reads/writes must now be proxied over to that thread and back. Nothing is going to be as fast as purely sync APIs, and unless there's a strong technical reason to not provide them, I suggest making all of them sync.
@fivedots : I'm a contributor to Sql.js and am following @jlongster's work with intense interest. I just gave him a 👍 for his comments above, but that doesn't capture how strongly I agree with his statements. :)
I am of course very interested in the future of both WebAssembly and Sql.js in particular, and after reading your proposal, I was thrilled to see that this is a primary use case for your work! As @jlongster mentioned, its synchronous interface and performant nature are both very appreciated features of the library. I see that you've obviously considered the performance implications. To quote from the proposal:
The reason for offering a Worker-only synchronous interface, is that consuming asynchronous APIs from Wasm has severe performance implications (more details here). Since this overhead is most impactful on methods that are called often, we've only made read() and write() synchronous. This allows us to keep a simpler mental model (where the sync and async handle are identical, except reading and writing) and reduce the number of new sync methods, while avoiding the most important perfomance penalties.
I think you bring up good points about performance in that reading and writing calls will benefit from the performance more than the "less-often-called" handle creation. However, the fact that a call to create a handle cannot be synchronous means that the primary WebAssembly use case for this proposal will now have significant complexity added along with the significant consequences to the entire user-level API that @jlongster mentions. I also don't know (@jlongster can likely correct me) if the assumption about the handle creation being such a rare occurrence is a good one when it comes to simulating an underlying filesystem for SQLite. At the very least, making such a fundamental operation async guarantees that almost all users of these file APIs will likely need to be asynchronous themselves.
I also read the reasoning in the doc you linked: https://docs.google.com/document/d/1lsQhTsfcVIeOW80dr467Auud_VCeAUv2ZOkC63oSyKo/edit#heading=h.tb2xzhzdt9qr
This section in particular:
Our goal is to deprecate the synchronous OPFS interface at this point. In order to be able to do so, we will
- Explicitly communicate to web developers that the synchronous interface is to be avoided outside of very specific WebAssembly use cases.
- Commit to providing a polyfill for the sync APIs using the async APIs and the Wasm support.
- Communicate a concrete deprecation plan as soon as possible Alternatives considered Offering an asynchronous interface only The main argument against a synchronous interface is that synchronous APIs will be made obsolete through improvements in WebAssembly and native code. Our counterpoint is that these improvements will be done over a multi-year time span of uncertain length. A synchronous interface could unlock important use cases much faster and be deprecated later on.
This feels like the sort of "very specific WebAssembly" use case they had in mind, is it not? At the very least, I think a strong case could be made.
I know that these APIs need to be designed to be safe/intuitive for the consumer around for the long haul, and I am so glad that all of the authors involved are designing around the WebAssembly and Sql.js use case in particular! But as the authors of the doc mentioned, the time-frames involved that will make these asynchronous APIs a "non-issue" are multi-year and unknown. I am concerned that this decision will have significant negative consequences in one of the primary use cases of this proposal.
I also would like to see the sync apis explicitly supported without threat of future deprecation, as well as sync apis for every call (including handle creation) for the new handles. Making people use Asyncify for a few async operations forces the application to depend on a pretty complex and messy work-around for no apparently good reason.
Let the WASM folks have the sync methods in the web workers.
@fivedots I would like to point out, because it is often not talked about, that the WASM filesystem relies on a fully memory-backed synchronous file system on the main UI thread (at least with pthreads). Data persistence is done via a call to syncfs where async stuff can happen.
We are looking at ways to pitch in and help here... but I have to say, the churn involved with the SF apis being merged and losing access to sync apis gives me great pause to say the least...
Thank you for the detailed replies.
For clarification, are we talking about a fully synchronous surface for access handles or generally for the Origin Private File System (OPFS)? I.e. fully sync access handles would give you methods like sync createSyncAccessHandle(), sync truncate(), sync close(), etc., while a fully sync OPFS would include the previous list plus methods like sync getFileHandle(), sync getDirectoryHandle(), sync removeEntry(), etc.
My understanding based on your concerns, is that you’d prefer a fully sync OPFS. Just providing a fully sync AccessHandle would not be that useful, since you’d still need to pay the overhead of something like Asyncify or PThreadFS for filesystem-like operations e.g. opening/creating a file, moving it, deleting it.
Having more clarity on this would help us evaluate who the right stakeholders are and what the right approach is!
I suppose it would require a fully sync OPFS surface area. Our short-term goal is to be able to use the OPFS apis in WASM worker threads directly until the FS abstractions are reliable. One day they will be and it will be awesome, but for now there's too much churn and uncertainty.
@fivedots any more info on this? Am I going to be able to count on a sync surface area for OPFS?
A fully sync surface is a larger project, so I’ll add @mkruisselbrink to get his point of view and make him aware of the feature request.
In my opinion, If we had a fully sync OPFS, it would make sense that it would also include a fully sync Access Handle. That being said, I’d consider that a separate effort from our current focus on introducing Access Handles. We’ve gotten positive feedback during our Origin Trial on the current interface, and it would make sense to work on shipping that while also discussing a future fully sync surface. That way we can quickly cover an initial set of use cases and continue iterating to provide better support to you.
@fivedots I understand... I might ask, though, that you continue the SF origin trial longer to give the people who have been using the SF sync interface from WASM a path for migration. Right now I'm going to be completely broken when the origin trial ends.
@fivedots @mkruisselbrink any news? do I have a bridge until the OPFS stuff has a full sync surface area?
Sadly, given the infrastructure of Origin Trials, that won’t be possible. Origin Trials are used to experiment and gather feedback on possible new features. Each time they are extended, an explicit experimental goal and approvals from Chrome API Owners are required. It is not possible to indefinitely leave the trial running until a particular feature (i.e. sync OPFS in this case) lands.
@fivedots well then I will need a sync OPFS layer quickly after Jan 4, as after that, the next chrome build will remove the feature we've been working with. Can we help make that happen? where can we pitch in? A sync OPFS surface area is really needed by us and clearly by others in this thread as well.
I understand there's a real use case for a sync surface, but we can't commit to implementing the feature request by the time the Storage Foundation Origin Trial expires. My recommendation would be to look into other ways of addressing your use case. One option is to look into PthreadFS as an intermediate step. Feedback or direct contributions there would be very welcome. Another one would be to rely on Asyncify to wrap the async methods.
We have looked at Asyncify and it blows up our build. We have also looked at PthreadFS but we can't have the FS in memory, it's too big.
Another bump here... we are seeing that the async nature of OPFS and the entry navigation are a significant impact https://github.com/emscripten-core/emscripten/issues/15041#issuecomment-970447006
I would greatly appreciate any effort that could be made to make an all sync surface area for OPFS. WASM could be significantly faster...
Closing this in favor of https://github.com/whatwg/fs/issues/7, as that will be the new home of access handles.
I was reading the
AccessHandle
proposal, and looked at the code example here: https://github.com/WICG/file-system-access/blob/main/AccessHandle.md#proposed-apiThe sync version available in a worker context looks like this:
You don't need to
await
that right? The sync APIs should never need to be awaited.