WebAssembly / wasi-keyvalue

37 stars 13 forks source link

Semantics of `atomic.increment` with other methods; value type system #51

Open pchickey opened 1 week ago

pchickey commented 1 week ago

dIn all operations on a keyvalue store, the type of a value is given by list<u8>. atomic.increment is the one exception where the value is passed in as an s64 and returned as an s64. With increment I can create a new s64 value for a key not in the store, modify by adding an s64, and retrieve the current s64 if I pass in a delta of 0.

The spec for atomic.increment needs to describe how the value associated with this key interacts with all of the other operations on the store. This is tricky because many underlying implementations of a store have a more complicated type system than representing every value as just a binary blob list<u8>, and only make operations like atomic addition available on types that were stored as numbers, rather than as blobs.

One possible path for the spec is, if the value in bucket.set to a byte sequence which, as ascii, parses to a base 10 integer value (optional leading -, then up to some limit of digits in 0-9) and contains no trailing symbols, the value is considered an numeric value instead of a binary blob. A bucket.get of the value gives the bytes of the ascii base-10 integer. (If you dont like ascii base 10, you could pick some other encoding of integers here, be it a sequence of 8 bytes with a little-endian 2s complement integer, or EB128 used in the wasm binary format, or whatever else you can dream up - its a big open ended problem). And now atomic.increment works on that value, because its a number value, right? You probably want an error variant now for when you use increment on a non-number value too.

This general path for solving the problem has introduced a phantom datatype system for values - it exists in the semantics of the interface, but it doesn't show up in the syntax.

So then say we introduce some common denominator datatype system that many of these underlying store implementations already have at the keyvalue interface. For example, we could define variant value { string(string), blob(list<u8>), number(s64) } and use that in place of all the list<u8>s in the set/get methods. This has a lot of advantages - now its very obvious what values are numbers versus which are binary blobs, and it means numbers get to be passed across the interface in an encoding defined by the component model, rather than being solved specifically for this proposal. It also puts this spec in a position where we have to choose our datatype system very carefully to have consistent semantics on the wide range of possible underlying common stores - we already have to do that for at least numbers in order to support increment properly, but now we have the opportunity to support storing other types as well, strings being the most obvious.

I think this is a pretty major design problem in the current spec of keyvalue. I'd like to hear what other implementors have done in order to implement atomic.increment so far, and what sort of discussion, if any, there has been to date about introducing a value datatype system.

Mossaka commented 1 week ago

I like the idea of a value type system you brought up in this issue, which obviously adds more complexity to the implementation. I am also curious to hear what implementors thoughts on this.

CC @thomastaylor312 , @devigned

devigned commented 1 week ago

Recently, I have been implementing atomic.increment for Spin. I'm not sure we can safely implement increment in its current shape without bleeding internal implementation details out to the consumer of the interface.

For example, we do not separate key space between Vec<u8> and numeric key / values, so there is possibility of trying to increment a key that was a bin blob. This might be ok if the consumer were to choose the same bin format for that integer, but it's likely it will not be the case.

I'm also not sure we need to use variant value { string(string), blob(list<u8>), number(u64) }. There are aspects of this I do like, but I think we might be able to safely use the current interface if we were to implement K/V stores by wrapping the incoming value with some additional metadata.

For example, if we were to wrap all values with a metadata wrapper like the following { value: Vec<u8>, origin: Origin(integer | blob) } prior to storing in the backend store. This way we could operate on increment with a clear understanding that they stored value is going to be an integer stored in format we expect. The down side to this is that we would not be able to use https://redis.io/docs/latest/commands/incr/ since the value would be stored differently than how the backend store would expect it for increment.

I agree there is a problem. Let's figure out the best way to solve it.