WICG / storage-foundation-api-explainer

Explainer showcasing a new web storage API, NativeIO
Apache License 2.0

getAll() should use async iterator or have some other means of dealing with a large number of files #9

Closed asutherland closed 3 years ago

asutherland commented 3 years ago

https://wicg.github.io/kv-storage/ ended up specifying use of an async iterator for enumeration because of real-world experience from https://github.com/w3c/ServiceWorker/issues/1066 where it was reported that Chrome would break once too many separate requests/responses were stored. It seems like something similar is called for here.

hugo306 commented 3 years ago

It's very telling that the first thing the Emscripten example does is implement listByPrefix(prefix) on top of getAll(). That's indicative that it should just have been listByPrefix all along.

getAll() is the weak link in this otherwise good API, since churning through megabytes of filenames to access a few bytes of a file won’t be acceptable for a high-performance storage interface. Right now it's not even obviously better than the Cache API, even for its prime use cases.

Examples:

  1. Checking if a file exists in O(1) space/time
  2. Searching through files in log-linear time
  3. Iterating a date range of prefixed files lazily with early abort, and without allocating for O(n) names.

The iterator can do all of these and much more. You can even trivially build getAll() on top of it (but not the other way around without the performance hit).

const getAll = () => [...listByPrefix('')];
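
Note that the spread form above only works if listByPrefix returns a synchronous iterator. If it were an async iterator, as the issue title suggests, the same ideas could be sketched roughly as follows. Everything here is an assumption for illustration: the in-memory `fileNames` array stands in for the real storage backend, and `listByPrefix`, `exists`, `namesInRange`, and this `getAll` are hypothetical helpers, not part of NativeIO. The stand-in scans a plain array, so it does not demonstrate the O(1) behaviour a real backend might offer; it only shows the calling pattern.

```javascript
// Hypothetical in-memory stand-in for the storage backend (not a NativeIO API).
// Names are kept in lexicographic order, as a sorted index might return them.
const fileNames = ['log-2021-01-01', 'log-2021-01-02', 'log-2021-02-01', 'readme'];

// Hypothetical async-generator listByPrefix: yields matching names one at a time,
// so the caller never has to hold the full list in memory.
async function* listByPrefix(prefix) {
  for (const name of fileNames) {
    if (name.startsWith(prefix)) yield name;
  }
}

// Existence check: stops at the first exact match instead of listing everything.
async function exists(name) {
  for await (const candidate of listByPrefix(name)) {
    if (candidate === name) return true;
  }
  return false;
}

// Lazy range scan with early abort. Assumes names under the prefix arrive in
// lexicographic order, so iteration can stop once a name sorts after `to`.
async function namesInRange(prefix, from, to) {
  const result = [];
  for await (const name of listByPrefix(prefix)) {
    if (name > to) break; // early abort: nothing later can be in range
    if (name >= from) result.push(name);
  }
  return result;
}

// getAll() rebuilt on top of the iterator, for callers who do want every name.
async function getAll() {
  const names = [];
  for await (const name of listByPrefix('')) names.push(name);
  return names;
}
```

With this shape, callers who want the whole list pay for it explicitly through getAll(), while everyone else can break out of the `for await` loop as soon as they have what they need.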

fivedots commented 3 years ago

Hello, thank you for your feedback!

I agree that having an async iterator to go over filenames is a good idea. Right now we are considering a merger with the Origin Private File System (more details here), where an async iterator is already available. I'll close the issue for now, since we intend to have iterators and the surface is defined. I'll reopen it if this doesn't work out for some reason!