WebAssembly / component-model

Repository for design and specification of the Component Model
Other
978 stars 82 forks source link

Dictionaries #125

Open guybedford opened 2 years ago

guybedford commented 2 years ago

Are dictionaries on the roadmap for the component model? At the moment it seems this requires representation via list<tuple<string, ...>>, where a dictionary could be a relatively straightforward host-level representation over such a datatype.

lukewagner commented 2 years ago

That's a good question and indeed this is one of those types on the edge. Some reasons why it seems like we should hold off for now:

Thus, as with #56, while I can see the use case, I'd like to see if we can get by without these in an MVP release.

guybedford commented 2 years ago

I would of course tend towards JS semantics here, so records having unique strings and ordering supported (ie exactly a list<tuple<string, ...>> representation underneath). In theory uniqueness could be managed by the host, in the same way valid USV strings are managed by the host without requiring a check everytime, since it's a strong property of the nature of the data it returns.

I can certainly get by with a list, but my concern is that it's a wide enough pattern that there's a natural temptation to want to decorate the interface with a JS object to give a nice representation - which then forces me back outside of the component model to a space of component model + sprinkles of JS, which seems non-ideal.

guybedford commented 2 years ago

I suppose records & tuples alignment in JS would imply unordered record objects these days, so perhaps sorting might be preferable, it's hard to know.

Either way, agreed as with #56 that this is nice-to-have but not necessarily MVP.

lukewagner commented 2 years ago

Agreed. And just explaining a bit more about the design challenges:

It makes sense that when passing in a host value, the host can already have guarantees of uniqueness; the complexity comes when the Canonical ABI lifts a dictionary from linear memory since the host can't trust anything and thus must verify things dynamically every time.

For ordering: requiring ordered keys (in the list<tuple<string, ...>>) is great for lifting performance (each string can be compared with its predecessor, fused with the copy loop) but JS objects and maps (and, in general, any hash-table-based dictionary) aren't lexicographically ordered, so this would add a non-trivial sorting operation when lowering these into wasm (negating any win from knowing uniqueness). But if we drop the ordering requirement, detecting dupes when lifting linear memory is rather more expensive.

Lastly, we could have dictionary<K,V> simply mean list<tuple<K,V>> and not impose any uniqueness/ordering requirements, which would achieve your goal of being able to derive a nice idiomatic JS interface automatically. But then the worry is that dupes end up leading to corner case bugs.

I'm not sure what the right answer is here, just that it'd be nice to punt on this in the MVP.

yordis commented 2 months ago

Could we take inspiration from Protobuf, while also Cap'n Proto seems to do list<tuple<string, ...>>

dzmitry-lahoda commented 19 hours ago

I did not made this text pretty with ai. Main idea is how to find types which are maybe_goodtoaddtowit.

There are https://en.wikipedia.org/wiki/Refinement_type and https://en.wikipedia.org/wiki/Dependent_type which ultimately allow to encode something like JSON Shema or XSD and prove things about composability. I have not seen further more typed way to define types. And exiting example https://www.jolie-lang.org/index.html .

What types/properties for these most modern languages can do up to some degree? It is ordered and unique.

type maybe_map = list<tuple<string, ...>> is not ordered no unique, nor there is a way to verify that. why? because parsing maybe_map is O(size), while verifying unique is O(size * log size) which is dos attack. That is https://protobuf.dev/programming-guides/proto3/#maps which is barely usable. What about O(const) of comparison?

Some people tend to reject idea of ordered to be any relevance to unique so https://www.ietf.org/archive/id/draft-devault-bare-11.html#section-2.2 , does not feels good. Specifically he asks to raise error if non unique, but that is dos and cannot be part of general standard.

So from this, may be need to think about ordered first, which allows unique.

We can have key ordered maps, which are not binding to list of tuples, because there is specific key value, and just ordered lists, basis for sets.

So what to do with set<tuple<MyType>>, which would imply that any primitive type as defined has canonical order, and each record has canonical order too. In general can define something. But not sure if I know all alignments. And is order ascending or descending? What order of fields in record used as key?

So having ordered can be hard, hence unique too.

So what can be done? I can document that my ``;

/// ordered
/// unique
type ordered_set<T> = list<T>

or I can make docs structured:

/// refinements:
///  - ordered
///  - unique
type ordered_set<T> = list<T>

or I can think of attributes

#[refinements(ordered, unique)]
type ordered_set<T> = list<T>

Is safe type subset possible? Possible safe subset which can be done is ascending ordered set where element is fixed primitive type or fixed size list or tuple of such types or when key is like element in set.

How to handle not fixed elements like strings? Some trustful(by "admin"/"sudo") operation can be initialized with something to be validate by O(n log n) to be set/map and stored. And after that set or map of index elements(index of stored element) to be send as key of map/set. So

So may having structured tags or custom attributes is way to have dictionary?

https://github.com/WebAssembly/component-model/issues/58