Bulk read/write API - Githubissues

tophf commented 5 years ago

The current kv-storage API doesn't provide a way to read/write multiple values in one transaction so it's ultraslow when working with a lot of values. This is a known problem of IndexedDB design. We have to perform all the individual operations inside one IndexedDB transaction's IDBObjectStore (or IDBIndex) and resolve the outer Promise in the last IDBRequest. For example, in Chrome the difference could be 2000ms of separate transactions for each put vs 200ms of a single transaction with many put inside when reading/enumerating/writing a few thousand of simple objects. HTML5 localStorage would be even faster, close to 20ms.

Even the intentionally primitive storage API of WebExtensions/chrome-extensions has set({key1: obj1, key2: obj2}) to create a separate record for each key:value pair and get(['key1', 'key2']) or get({key1: default1, key2: default2}) to read separate keys in one fast internal operation.

It should be noted that enumerating of keys/values based on IndexedDB cursor is also 10 times slower at least in Chrome due to running each internal onsuccess callback in a separate task of the event loop. So in addition to keys(), values(), and entries() it would make sense to expose allKeys(), allValues(), that correspond to IndexedDB's getAllKeys, getAll. As for allEntries() it doesn't have a single-op implementation but it might make sense to perform getAllKeys + getAll and combine the result.

While the simplicity of the API is important, but my point is, without these optimized methods kv-storage is somewhat of a toy for really simple cases that can't replace neither IndexedDB nor even HTML5 localStorage where low-latency high-efficiency processing of many entries is required.

kv-storage with 1k records: 750ms + 250ms

```js import { storage } from 'std:kv-storage'; // specifier prefix not final (async () => { console.time(1); for (let i = 0; i < 1000; i++) await storage.set('mycat' + i, 'Tom'); console.timeEnd(1); console.time(2); for await (const [key, value] of storage.entries()) {} console.timeEnd(2); await storage.clear(); })(); ```

IndexedDB bulk ops with 1k records: 55ms + 80ms (3-13 times faster) and 10-40ms for getAllKeys+getAll

```js (async () => { const objects = Object.assign({}, ...Array(1000).fill(0).map((_, i) => ({['foo' + i]: 'bar'}))); console.time('putMany'); const idb = await openDb(); await putMany(objects); console.timeEnd('putMany'); console.time('getMany'); let store = openStore(); const data = await getMany(Object.keys(objects)); console.timeEnd('getMany'); console.time('all'); store = openStore(); const [keys, values] = await Promise.all([ new Promise(resolve => { store.getAllKeys().onsuccess = e => resolve(e.target.result); }), new Promise(resolve => { store.getAll().onsuccess = e => resolve(e.target.result); }) ]); console.timeEnd('all'); await clear(); /** @return Promise */ function openDb() { return new Promise(resolve => Object.assign(indexedDB.open('db'), { onsuccess: e => resolve(e.target.result), onupgradeneeded: e => e.target.result.createObjectStore('store'), })); } /** @return IDBObjectStore */ function openStore(mode = 'readonly') { return idb.transaction('store', mode).objectStore('store'); } function putMany(data) { const store = openStore('readwrite'); let op; for (const [k, v] of Object.entries(data)) op = store.put(v, k); return new Promise(resolve => { op.onsuccess = resolve; }); } function getMany(keys) { return new Promise(resolve => { const store = openStore(); const data = {}; let op; const last = keys.length - 1; for (let i = 0; i <= last; i++) { const key = keys[i]; store.get(key).onsuccess = e => { data[key] = e.target.result; if (i === last) resolve(data); } } }); } function clear() { return new Promise(resolve => { openStore('readwrite').clear().onsuccess = resolve; }); } })(); ```

HTML localStorage with 1k records: 10ms + 5ms (50-75 times faster)

```js (() => { console.time(5); for (let i = 0; i < 1000; i++) localStorage['mycat' + i] = 'Tom'; console.timeEnd(5); console.time(6); for (const key in localStorage) { if (Object.hasOwnProperty.call(localStorage, key)) { let value = localStorage[key]; } } console.timeEnd(6); //localStorage.clear(); })(); ```

domenic commented 5 years ago

This discussion is a bit split across here and https://github.com/WICG/kv-storage/issues/57#issuecomment-486079837, but I'll repeat from there that the main tradeoff for adding new features is ensuring that they benefit real-world apps.

It's very clear that KV storage, being only a subset of IndexedDB by design, is less capable and doesn't have bulk operations at this time. The question is, how much of a problem is that for real-world apps? Are there real-world apps that want to perform simple key-value storage operations, but also perform them so often and in such volume, that the current API causes noticeable user-facing slowdowns? Our conjecture was that apps that need that kind of transaction volume would rather use IndexedDB directly, or would face that transaction volume in rare-enough instances that dropping down and using the backingStore getter would be sufficient. We thought this was pretty clear from the evidence of the various IDB wrapper libraries that are popular today, and don't support batched read/write in this way.

But we might be wrong! Do you have a site that's using, e.g., a different IndexedDB wrapper library today, and wants to move to KV storage but can't because that'd cause user-noticeable performance problems from de-batching? That would be invaluable information for evolving the API, and judging whether increasing the complexity and implementation cost is worth it because of how it makes the web better.

I'll also note that it's important to focus on the problem statement, and the solution might not look exactly like changing the web-developer-facing API. For example, in https://github.com/WICG/kv-storage/issues/57#issuecomment-478083338 Firefox developers suggest that one path forward might be took the same web-dev-facing API, but optimize it so that it operates "as fast as" batch operations. But, all that talk is premature without having some examples of actual customers (not synthetic benchmarks) who would benefit from such an upgrade.

tophf commented 5 years ago

The synthetic benchmark above is just an extract of what an app is doing on update, actually reduced since the original could take more than 2 seconds. As the experience shows (Best Practices for Using IndexedDB), instead of storing a huge state object as a single entry apps should break it down into small atomic things. While one could argue such activities are rare, but it doesn't mean the users should wait for seconds instead of milliseconds. Even the much more primitive WebExtensions/chrome-extensions API provides a way to perform bulk operations.

domenic commented 5 years ago

Ah, great! Could you point me to the app?

tophf commented 5 years ago

It's a chrome extension I'm playing with - a 4k array of rule objects (built from a json on update) should be loaded each time the background script resumes after being unloaded - that is almost on every tab navigation. The array is inside IndexedDB, each rule a separate entry. I'm using getAllKeys to quickly get 4000 urls in just 15ms, find the matching ones, then read the full objects for each one. There must be lots of similar use cases - in advanced [web] apps like editors or games that need to load/save the state.

domenic commented 5 years ago

Ah, hmm, I see. I'm not sure the web standards process is the best way to add APIs for Chrome extensions; asking Gecko and WebKit to implement something, which would only be used in Chrome, is not great form. Generally, it's best to add Chrome extension APIs through the Chrome extension API process, not the web platform standards process.

Still, I believe you that it's possible there's web apps that might benefit from this. Let's keep this open to track the feature request, in the hopes of finding a web app to work with on this sort of feature addition with us.

Maybe one useful thing to ask, is are there other parts of the IndexedDB API you are using (e.g. indices, transactions, multiple separate object stores, ...), or are you just using it as a key/value store?

tophf commented 5 years ago

I'm not sure the web standards process is the best way to add APIs for Chrome extensions

The features of IndexedDB I'm using aren't specific to extensions API so it's not pertinent to the message I was trying to convey: I believe there are lots of web apps that [will] need to perform bulk operations. I've mentioned the set/get methods of extensions API just to show a possible way to expose the putMany/getMany functionality.

other parts of the IndexedDB API

I'm using an additional index indeed and a second store, although it's not crucial - just a minor touch.

varna commented 5 years ago

We made a monstrosity which could be a great example of a "large scale application" in real life:

It works offline;
We only save "updates" in our database;
It's controlled by state manager;
After update we put whole database state object into localStorage;
When user returns, we take database state object from localStorage and pop it into state manager;
It worked;
We reached 5MB size limit for some clients;
We didn't have time to invest into indexedDB;
I found out about kv:storage, used it as a drop in replacement, fixed a couple of bugs (added few awaits);
It works.

Averegish~ how long it takes. But we don't even care, because we don't even wait for it to finish. State manager handles it. No visible lag. Everyone is happy.

I probably sound like an asshole (I probably am), but this is how stuff works. We need to deliver good results and these results are good enough because UI handles it.

I'm not sure what you are trying to achieve with kv:storage, but it looks like another library that wraps indexedDB with API that looks like localStorage. If you are creating a standard, wouldn't it be more logical to create something like NoSQL database with offline support? Other libraries could extend it, but they can't create low level APIs like those, they must use them to achieve something.

WICG / kv-storage

Bulk read/write API #63