WICG / kv-storage

[On hold] A proposal for an async key/value storage API for the web
Other
550 stars 18 forks source link

Restrict allowed key types #39

Open pwnall opened 5 years ago

pwnall commented 5 years ago

IndexedDB supports a fairly large set of key types, and comparing between them isn't very intuitive. How about supporting an explicit subset?

Proposal:

The ordering has to remain the same as in IndexedDB, if we want the option of async iterators over keys / values.

Dates can be serialized to numbers when passed in -- I think this would be consistent with WebIDL.

The main missing element here is Array objects. This does preclude some nice use cases, but I claim those are more advanced. If I'm wrong, we can always expand the set of supported types, whereas it's harder / impossible to narrow it.

domenic commented 5 years ago

Previous discussion in #2. To be clear, you are only talking about restricting keys, not values?

One thing I'm missing is, what are the advantages of restricting?

pwnall commented 5 years ago

Sorry, I have no idea how I missed #2. Would you prefer that I close this and comment there?

I haven't opened an issue for values yet because I'm still thinking through that. I think we can mostly discuss keys and values separately.

I'd like to discuss this for two reasons:

1) Databases (especially Chrome's backing store) perform better with small (<= thousands of bytes) keys. Arrays invite larger keys. 2) I'd prefer a simpler conceptual model to a more complicated one, and fewer key / value types help there, IMHO

If we adopt all types right away, I'd prefer that we have good use cases. I can see reasons for number / string keys right away. I can see raw buffers used to squeeze more storage out of the DB.

pwnall commented 5 years ago

Having written the above, I'm starting to think that I'd rather that we limit key sizes instead of types. What are your thoughts on the two axes?

domenic commented 5 years ago

No worries, a new fresh issue seems like a good idea in this case.

I'm not a heavy IDB user, but array keys seem really convenient for "tuple"-keyed storage. I'd love to get more feedback, either from users or from implementers who have worked with IDB users, to either substantiate or refute my intuition here. I agree that we could start out small though and expand upon encountering demand.

Restricting sizes seems pretty reasonable, but is there an easy way to determine key size? Maybe what we want here is a new option for IDB databases which allows them to reject all keys over a certain size? Is key size computation interoperable?

pwnall commented 5 years ago

Answering the last question: key serialization is implementation-dependent. To be consistent across browsers, we'd have to define some cost model and use that.

We can have a simple and efficient model as long as we leave Arrays out -- numbers would be in, strings and native arrays would have caps on their lengths / buffer sizes. Once we add Arrays, we'd have to come up with a full cost model, so the cost of an Array would be the sum of its elements + fixed overhead.

This is additional complexity for the implementation and spec. As a high-level API, I think it's worth taking up the extra complexity in order to keep developers on a "happy" path.

vweevers commented 5 years ago

I'm not sure how relevant this is, but array keys could be useful for compound indexes. For example [author, publisher] when storing books. Then, through key ranges you can answer queries like "give me all books by Roald Dahl", e.g. greater than ["Roald Dahl", ""].

I haven't used IndexedDB exactly like this, because my background is in Level (and level-js which wraps IndexedDB) where things like this are usually handled on a higher level - IndexedDB only sees string or binary keys. That said, compound keys are very powerful conceptually.

I do agree with:

IndexedDB supports a fairly large set of key types, and comparing between them isn't very intuitive

The approach we took in level-js is: you can use any key type you want. If you want to normalize types (e.g. serialize numbers to strings) or avoid the cost of the structured clone algorithm, then we recommend to wrap level-js with encoding-down. Array keys can then be encoded into binary keys with bytewise for example, which has a more intuitive sorting order. This encoding has a performance cost too, which might be an acceptable trade-off for a given app. So TLDR; we leave the choice of key type to the user.

asutherland commented 5 years ago

I think any sizing logic would want to go in the HTML standard with the StructuredSerialize algorithm and friends. This could be quite useful in other specs like the Notifiations API which uses the algorithm for Notification.data but does not impose any size limits. As APIs continue to add more spaces to stash user data, it could be useful to bound the data size for implementation sanity and to limit the impacts of broken or malicious sites.

If we did this, perhaps it would make sense to also spec the key size limit for IndexedDB as an immutable limitation from when a database is first opened. This would avoid edge-cases related to direct manipulation of the backing store, and potentially let IndexedDB implementations do some optimizations. Without this, we'd want to spec whether "illegal" key/value pairs are hidden from the async-local-storage API, etc.

Edit: Note that I had made the above comments previously as two separate comments that it looked like Github had eaten due to bad hotel wi-fi. They just appeared when I submitted this comment, and so I've deleted them since their thoughts have been superseded.

tophf commented 5 years ago

A use case for Uint8Array keys: encoding string keys in UTF8 manually via TextEncoder/TextDecoder to reduce the disk size of the db because at least in Chrome the string keys are always stored in UCS2 (two bytes per character), unlike values which get automatically compressed into one byte ASCII when possible.

asutherland commented 5 years ago

While it's unfortunately not unusual to optimize to a specific implementation, altering API design due to a single browser's implementation decisions runs counter to the point of web standards and requiring multiple implementations.

tophf commented 5 years ago

I wasn't suggesting anything, nothing needs to be changed AFAICT, it was just an informational side note confirming the proposed inclusion of Uint8Array.