Performance concerns - Githubissues

WICG / kv-storage

[On hold] A proposal for an async key/value storage API for the web

Other

550 stars 18 forks source link

Performance concerns #57

Open annevk opened 5 years ago

annevk commented 5 years ago

From Elliott Sprehn at https://twitter.com/ElliottZ/status/1105342069449474048.

domenic commented 5 years ago

I'd love to get storage implementers' thoughts on this, /cc @asutherland and @pwnall. I was always under the impression that we were encouraging folks to move away from synchronous storage APIs and toward async ones.

One particular concern expressed is about the scheduling overhead involved in enumerating all keys. I'm not sure how often that's done, and would love to hear more about why it's important to avoid the round trip time, or indeed if that round trip time is a problem in practice (e.g. with benchmarks). We made that design decision on urging from both Mozilla and Google storage folks in #6; perhaps we should revisit it?

Another route we can take is to build a even higher-level abstraction on top of the disk-backed KV storage, which is a memory-backed wrapper that only periodically syncs to disk (or maybe only syncs on writes). That seems like the right way to expose such an abstraction, instead of continuing to encourage the localStorage API. It's unclear to me whether that's needed yet though, given that we haven't seen a lot of those in user-space, whereas we've seen a lot of disk-backed KV storage-style libraries.

asutherland commented 5 years ago

re: LocalStorage is fast

localStorage is the most heavily optimized storage subsystem in Firefox out of necessity because synchronously blocking the main-thread of a process on I/O is so tremendously bad. We're able to get away with the optimizations because storage usage is hard-capped at 5MB (length(keys) + length(values)) and most sites don't use up their full allocation. We also are able to make a myriad of performance optimizations because the spec provides zero durability guarantees and zero coherency and consistency guarantees in the face of multi-process browsers.

In terms of specific performance advantages under the current implementation:

LocalStorage in Firefox is given a massive preload hint, triggering a full load of all data stored for the origin.
LocalStorage writes in Firefox are buffered for up to 5 seconds before they are written to disk and any apparent write delay is due to IPC serialization overhead and same-process "event" dispatching.

In Firefox, there is extensive opportunity for optimization of our IndexedDB implementation, especially as it relates to prefetching and preloading of data for cursors and iteration. We are beginning to allocate engineers to tasks in this area. I expect the newly introduced commit() logic in IndexedDB to help too.

re: memory-backed KV Storage

Yeah, this seems like a useful library to have, as it ends up covering the heart of localStorage implementations' optimizations and likely covers most consumer needs. (For now, I think a Firefox-supporting implementation would want a BroadcastChannel mechanism to help with consistency until Firefox gets IDB observers?)

Things Standards Can Do

From a standards perspective, I think the most useful effort to support optimization would be standardizing multiple storage buckets. This would potentially enable:

explicit preloading of buckets. Right now if the browser wants to preload IndexedDB databases / Cache API, it basically needs to act like an old-school disk defragmenter that logs accesses per client URL and I don't know that that is something we want to or can standardize. But if ServiceWorker registrations indicate what named buckets they own and the list of buckets to preload in priority order, that we can do a better job of standardizing.
relaxed durability guarantees in favor of improved performance for buckets. An offline-supporting mail client web-app may want super high levels of durability for messages the user composed offline and have not yet been sent, but be less concerned about cached offline messages that are canonically stored on the server. Right now, if Firefox initializes and finds some of the data in an origin is corrupted, by spec we need to wipe the whole origin. While IndexedDB could define its own optional durability configuration, there's still the issue of what to do when the relaxed settings result in corruption that should technically wipe the origin.
improved data clearing to improve overall performance. A big issue for LocalStorage is that if a smart page on your origin only stores 4k of data in LocalStorage and accesses it early in your page load, and a foolish page on your origin stores 4M of data for logs (we've seen this), your smart page still has to wait for 4M of disk reads to complete. (At least, under Firefox's legacy implementation.) Explicit buckets would help limit the amount of data there is to preload, as there is obviously an opportunity cost to speculatively-performed I/O. (This is both from pages scoping themselves, as well as browsers being able to more proactively evict unused buckets since least-recently-used algorithms are no longer operating at an origin granularity.)

pwnall commented 5 years ago

Memory Cache

I agree with @domenic that an in-memory cache is complex enough that it deserves its own API layer. localStorage doesn't give any cross-tab synchronization guarantees. The price paid for fast reads is the possibility of seeing old data. A memory cache based on IndexedDB observers would have similar issues.

I see kv-store as a great entry-level API that works well for most cases. Given this framing, I expect that developers will assume full consistency. If kv-store had built-in memory caching, some developers would never realize that kv-store provides eventual consistency.

An in-memory cache introduces the following API-level complexity:

Cache size - we could ask the developer to pass a size to a method that switches kv-store into a caching mode, and implement an LRU cache with that many entries.
The option of avoiding cache pollution -- table scans should skip the cache, as they'd be most likely polluting it. We can treat all iteration methods like table scans, or we can have them take in optional arguments.

From a very different angle, we could provide an always-consistent in-memory cache, using a cache coherence protocol. I think we could build a message-based protocol in JS, on top of the Web Locks API plus a few straightforward extensions. Alternatively, we could look into standardizing a coherent cache primitive that admits a faster implementation in C++ using shared memory.

Performance Problem

I'd like some clarification on the performance issue we're discussing. "Performance concerns" isn't a very actionable issue summary.

The tweet referenced in the issue recommends localStorage because reads are fast, but then mentions TTI. In Chrome, DOMStorage blocks the main thread until we've loaded the database from disk. Also, DOMStorage uses the same backend as IndexedDB (LevelDB). So, while localStorage can get much better read throughput (at the cost of inconsistency), the first read should not be much faster than in IndexedDB.

I agree with @asutherland that the answer to better TTI is storage buckets and preload hints. I expect that an in-memory cache will only improve read throughput, not TTI.

esprehn commented 5 years ago

The issue here is that IndexedDB needs to open the database first, which then posts a task, and then each read is also posting a task. If you imagine an app that contains a dozen React components that each have some cached data required to get to TTI then your app is now:

1. [Page load]
2. [IPC to open()]
... Huge delay while other stuff runs like async script tags, setTimeout, XHRs responses, etc.
6. [getItem() #1 promise resolves]
... more delay from event loop stuff like onload events, etc.
8. [getItem() #2 promise resolves]
... eventually TTI

Had you used LocalStorage your app looks like

1. [Page load]
2. Block and load the db, synchronous React render.
3. Paint
... now the flood of setTimeout, XHRs, onload, etc.

In our app using localStorage would make TTI noticeably faster (hundreds of ms @ P95) than IndexedDB because it avoids the large delays in the event loop. An easy way to simulate this is to construct a page that is scheduling hundreds of setTimeout, Image#onload, and async scripts where each task takes ~50ms.

This is what our app like on my $3000 Macbook Pro: IndexedDB TaskDelay

This isn't about storage performance where preloads etc. matter, this is about the semantics of the API itself and how task oriented APIs work. I've spent time optimizing this issue in both React Native apps around AsyncStorage and now around web apps with IndexedDB. As a product engineer I think eventually consistent and fast reads is a better default.

As for iterating the keys, yes we did it for a number of use cases like storing outgoing analytics events in our metrics system. setItem/transactions() of an array requires reading in the data to append to it, instead we'd append to the store itself with setItem and prefixed keys and then read them all.

LocalForage also has an API to get all the keys: https://localforage.github.io/localForage/#data-api-keys Swapping in the KVStorage API would make any code using that significantly slower since now what used to be reading all the keys at once is posting tasks into the queue mixed with all the other tasks an app is running. ex. Developers will think they can swap: for (const key of storage.keys()) with for await (const key of storage) but that has much worse perf.

"I see kv-store as a great entry-level API" I think this is what I disagree with. It's an API that's slow by default. A great entry level API should be fast by default so that web apps are fast and competitive. LocalStorage is a pretty decent API for most folks use cases but we've scared them away from using it with talk of how it's synchronous, which is what my tweet is about. I did it while working on the web platform, but in reality I've realized while working as a product engineer and while optimizing a number of apps that the semantics of localStorage are pretty decent and the fear we spread was not accurate. If every getItem in localStorage hit disk that'd be bad, and sure enough document.cookie is a perf issue in every app I've worked on while localStorage reads during apps startup have not been a perf issue. (Though to be fair document.cookie time in our app leading up to TTI is 14ms which is still faster than the task delay from an API like IndexedDB).

I think adding setAsync() to localStorage with a durability guarantee and bringing localStorage to Workers with eventual consistency just like multiple tabs would have a really positive impact on the web, and keeps the fast read semantics that are good for most uses and leaves IndexedDB available for low level situations where you need more control. I'd personally prefer that over a whole new Storage API. :)

pwnall commented 5 years ago

@esprehn Thank you very much for the detailed explanation here!

Risking over-simplifying, I claim that you're advocating for a local optimum. I read your comment to say you'd prefer a synchronous API that blocks the main thread to an async API, because the latter allows too many low-priority tasks to get queued up before some high-priority task. In other words, you'd prefer that we block the renderer thread as opposed to giving your app control over it, because you'll use this control incorrectly.

In other words, the principle behind async APIs is that we give your app the opportunity to do other work while a long task is carried out. At a superficial glance, the fact that you're worse off when given this opportunity suggests a problem in the app's architecture. Taking a deeper look, I think the truth is that the Web platform makes it really easy to grow problematic architectures, and really difficult to develop high-performing apps. I think @domenic can share some initiatives around scheduling that we think will tilt the scales back in the developers' favor.

Apologies in advance if I misunderstood or if I'm mis-representing what you're saying.

esprehn commented 5 years ago

I don't think this is an issue in our app's architecture (and I'd like to think I understand both Chrome's architecture and the web platform reasonably well :)). Storage blocking the main thread one time and then reading from an in memory map is not a perf issue. It's not worse than when you use ICU the first time as it pages in a bunch of stuff which also blocks the main thread in apps, but we still shipped synchronous DateFormatter APIs. :) There's lots of examples of this stuff, like compiling shaders the first time you open a tab also blocks the compositor.

Reads are high volume, but writes are low volume for typical apps. So having reads be from an in memory map that's fast and writes go through setItemAsync() is the optimal semantics for the "simple storage API" to get the best TTI.

Also a new scheduling API is nice (and I'm collaborating with the Chrome folks on that as well), but that doesn't meet developers where they are, nor does the current scheduling API proposal fix the issue I'm describing here around browser scheduled work. Our app also contains a huge number of third party libraries which won't be updated to use a new scheduling API for years plus lots of the work that delays the event queue comes from the browser itself (ex. firing onload on script tags, pumping frames, doing DOM GC, etc.). A new storage API should be designed with semantics and API surface that results in good performance.

For an example of what a real app has, first simulate a busy but still RAIL acceptable main thread:

// This makes a lot of 50ms tasks on my MBP.
setInterval(() => { let a = []; Array.from("a".repeat(1000000)).forEach(c => a.push(c)) }, 5)

Then do an IndexedDB read (assuming you created the object store and handled upgrade etc.):

t = Date.now();
r = indexedDB.open("test");
r.onsuccess = (e) => {
  let r1 = r.result.transaction("test").objectStore("test").get("value");
  r1.onsuccess = () => { console.log(Date.now() - t); }
}

That takes 200ms, localStorage.getItem is ~1ms even though it had to go through the browser process.

asutherland commented 5 years ago

I understand this issue to now be a proposal to:

expose LocalStorage to Workers.
add a setAsync() method which also adds durability guarantees to LocalStorage where there currently are none.

I understand this proposal is made in the context of performance concerns about IndexedDB and kv-storage which is built on top of IndexedDB. I think these are very valid and legitimate concerns and agree that, at least in Firefox, after any initial hard content process jank during bulk load of LocalStorage, it's hard to hit a slow-path in LocalStorage but easy to hit one in IndexedDB.

I believe Firefox is currently a hard no to both proposals as I understand them. As we've landed our new LocalStorage NextGen implementation, we've been reminded of how there are more than just jank issues from synchronous APIs. We also have to deal with a variety of edge-cases that potentially risk deadlock thanks to existing in a multi-process world where accessibility APIs as used by screen-readers and IME's (at least on Windows) require synchronous parent-to-child IPC that can collide with synchronous child-to-parent IPC as necessitated by LocalStorage. There are also a number of other complicating factors from existing APIs (sync XHR, sync importScripts, worker.terminate()), plus implementation-specific complications (debugger!).

Our current plan in Firefox will be to optimize IndexedDB and the specific use-cases of kv-storage. Our timeline is such that we should be able to take advantage of Firefox's spectre-motivated "project fission" which is roughly analogous to chromium's process-per-origin approach. This should enable us to maintain an authoritative in-memory state of kv-storage values (or any small IDB databases' in-line contents) in the process where they'll be used. In this case, we could synchronously resolve kv-storage calls so they're processed in the current task's micro-task queue, although I expect we may need to do some spec-work to make that okay. Hopefully that can be accomplished in IndexedDB as part of promise-ifying it.

domenic commented 5 years ago

Thanks all for the productive discussion here. My takeaways here are:

It would be good to dig into airbnb's performance issues with IndexedDB a bit more, even if @esprehn believes that an async API is not the optimal architecture. I believe @pwnall has reached out for traces. Such traces would be useful for benefiting the whole web platform and all users of IndexedDB, even if Airbnb prefers to stick to sync localStorage until/unless we can make IndexedDB work for their use cases.
The scheduling API has more work to do to meet developers where they are and enable async tasks (such as IndexedDB) to be high priority and thus usable during app startup in the way Airbnb envisions. For example, ways to forcibly schedule third-party libraries without requiring waiting years for them to update. Since @esprehn says he already has ongoing collaboration with Chrome folks working on that proposal, I will trust that process to continue.
We should keep an open feature request on this repo for a memory-backed, periodically-synced-to-disk API. If a customer with a real app (perhaps Airbnb??) would be interested in using that, it sounds like both Firefox and Chrome are interested in exploring ways to make that work. (Whether it be a new API, or something like @asutherland's proposal of tweaking the spec to decrease event loop round trips and making it work transparently with the KV storage API.)

Did I miss anything?

jimmywarting commented 5 years ago

While a nice modern promise based - none blocking - object storage would be nice
...this is the reality

Add that to Evercookies bucket list

I agree a with @asutherland. Improve what already exist. add async promise methods to existing storages (kinda like Node has both a sync and async methods for most things). Make local/session storage able to store objects too and expose it to web workers. Existing storages has StorageEvent so you can observe changes which this spec dosen't.

both a async and sync method are useful (eg on unload event)

esprehn commented 5 years ago

@jimmywarting I think @asutherland is saying no to that proposal and is instead saying we should have this entirely new storage API that's worse for performance and missing events out the gate.

@domenic Instead of providing more traces I'd encourage implementors to run the code snippet I shared above. It shows that kv-storage is 200x slower than localStorage even when hitting disk in an app that's otherwise RAIL compliant.

Who's the customer asking for kv-storage btw?

At the very least please consider adding Promise<Array> getAllKeys() to match localForage and RN's AsyncStorage and not the async iterator.

domenic commented 5 years ago

@esprehn as I'm sure you're aware, while microbenchmarks are valuable, real-world perf data (i.e. traces) is even more so. It's easy to create microbenchmarks that make one API or another look bad in finely-tuned circumstances, but working with customers on their real sites (as we're doing with kv-storage on Docs, ads, and many smaller sites looking to remove their IDB wrappers) is more likely to lead to long-term healthy API design.

As I said, we could definitely consider API or spec tweaks if a customer with a real app would be willing to work with us on identifying the bottlenecks with the current solution under such real-world circumstances. If this is something Airbnb would be interested in deploying and testing various implementations and optimizations of, please let us know!!

tophf commented 5 years ago

TL;DR We need the fast bulk read/write operations.

As I said, we could definitely consider API or spec tweaks if a customer with a real app would be willing to work with us on identifying the bottlenecks with the current solution under such real-world circumstances

I just couldn't believe my eyes. This is so weird. You sound like you're treating kv-storage spec as your own product, but it's not. Unlike a product you develop, specifications rarely change so they must be designed to account for potential use cases beforehand. Try to look at the API objectively and you'll see it's arbitrarily limited based on an opinion and a limited set of use cases, not on a really vast research.

Just because there are some sites or apps that can use the current API of kv-storage without a heavy hit to their performance doesn't mean the API is healthy or good for the web in general. Currently the API is worse than IndexedDB in terms of flexibility (and speed - in case of bulk operations) and worse than HTML5 localStorage in terms of maximum raw speed, latency/overhead and simplicity of use if we disregard the initial load where the browser may do the potentially unneeded work of reading localStorage, but that's also an edge case. Let's face it, kv-storage got the attention only because it was championed by Chromium devs, and there are better wrappers for IndexedDB already, or better ones could be written. Facing it, let's humbly perform the real research, share the real data, and improve on the API.

domenic commented 5 years ago

Hey @tophf! Specifications change pretty frequently, actually, and changes can get shipped to users in less than 6 weeks. It's important to avoid waterfall-style development-without-testing; we shouldn't spec something without any evidence that it helps in the real world (not just synthetic benchmarks).

Are you using a site that would benefit from fast bulk read/write operations? Would you be willing to work with us on testing evolutions of the API in that direction?

The API is definitely more limited than IndexedDB, but that is by design. You can always fall back to IndexedDB, operating on the same underlying store, by using the backingStore property. Indeed, you could "prolyfill" the proposed API additions, use them on your site, and report back how much of a difference they made---that would be invaluable to advancing the standards process. Are you willing to do that?

We would indeed love to perform the research, share the data, and improve the API. I think we're on the same page there. Would you, or any other customer with a real app wanting to use KV storage, be up for helping with that?

tophf commented 5 years ago

My point here was it's weird that you make guesses at such an early stage and imply the simplified API is enough based on a few success stories ~and an ancient article from 2012 that berates localStorage because of a few edge cases which aren't as bad on a modern hardware~ ^{(er, that's not the weird part I was referring to)}. AFAIK it's always the proponents of a proposal who should investigate the real world usage proactively and present the findings. Otherwise it's unclear why we need yet another IndexedDB wrapper when we already have quite a few. A random idea: maybe you could showcase the benefits (before -> after) in a simulation like https://browserbench.org/Speedometer2.0/

domenic commented 5 years ago

Well, this may just be a difference in working style, but we have to start somewhere, and starting minimal is generally a good strategy when trying to move a large ecosystem like the web. We then expand onward from there in collaboration with customers.

We've indeed shown some benefits with some customers so far (real world sites, not synthetic benchmarks like speedometer). They are all based on decreasing the bundle size and startup time by a small-to-moderate amount by removing the need to ship the IndexedDB wrapper code with the app, and parse/compile/optimize it on startup.

tophf commented 5 years ago

I'm arguing you're starting unreasonably small without proving it's enough. My primary concern is that it'll impede further improvements due to the unwillingness to introduce potentially breaking changes. Iterating and experimenting is good for an in-house project or for a draft of a spec, but not for a candidate recommendation to be implemented by browser vendors and used web-wide. BTW the case you've just mentioned - tiny apps don't need to include a relatively big IndexedDB wrapper - is a good one to mention explicitly in the explainer, preferably with the real measurements of various wrappers.

domenic commented 5 years ago

I don't see how any of the suggested additions would be breaking changes.

We tried to touch on this in the explainer at https://github.com/WICG/kv-storage#impact.

tophf commented 5 years ago

My suggestion (linked above) to extend get and set the way it's done in WebExtensions storage API may be a breaking change. It's not necessarily a good solution, but it'll be too late to even consider it once the API is widely used. Also, I'm not prescient so I can't agree that none of the suggestions, especially those that weren't yet mentioned in this repo, aren't likely to introduce a breaking change, provided the vendors agree on the current limited API.

tophf commented 5 years ago

I'd like to reiterate to emphasize the fact that current API promotes a very bad practice - storing one big state object instead of lots of small objects (more info: Best Practices for Using IndexedDB) - in the absence of the bulk read/write methods for a specified set of keys and/or getAllKeys/getAll to get everything in one fell swoop that would have allowed to process the data in just a few milliseconds instead of ridiculous durations like 100ms caused by running separate event loop task for each IDB request.

The fact that the few proponents of the API proposal didn't encounter such a case personally doesn't mean it's a negligibly rare case. One would have to be clairvoyant to claim that without a proper research. I think it is an important use case that's either already used sufficiently in the wild to be unconditionally included in the API or should be used and even promoted in the new API by explaining in the documentation why and when a bulk operation is needed.

dmurph commented 5 years ago

I want to voice my appreciation for your analysis @esprehn. Thanks for showing the problem so well.