jgravelle-google opened this issue 5 years ago
One idea I've been pondering is to have a single, fairly general memory-to-sequence lifting instruction that takes a function immediate with signature [i32] → [T, i32, bool], where: the incoming i32 is the pointer to the element which is to be turned into a T element (for a seq<T>), the bool result indicates whether there is a next element and, if so, the i32 result is a pointer to it.
For a std::vector<uint8_t> this would look like (using Nick's wasm idea, and pulling in let from function-references) [Edit: take out explicit wasm...end]:
(@interface func (export ...) (param $vec i32) (result (seq u8))
  (i32.load offset=4 (local.get $vec))                  ;; vec.end()
  let (local $end i32)
    (i32.load offset=0 (local.get $vec))                ;; vec.begin()
    memory-to-sequence (func (param $p i32) (result u8 i32 bool)
      (lift-int i32 u8 (i32.load8_u (local.get $p)))    ;; *p
      (i32.add (local.get $p) (i32.const 1))            ;; p = p + sizeof(uint8_t)
      (lift-bool (i32.eq (dup) (local.get $end)))       ;; (p == end)
    )
  end
)
Inlining the lambda function into the memory-to-sequence instruction's implied for loop, this adapter could be compiled fairly easily into a pretty-efficient loop in machine code. Optimizing this further into a memcpy would involve some fairly simple analysis, but it's possible that this would feel arbitrary enough to motivate a more-specialized instruction that was guaranteed to map to a memcpy.
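To make "compiled into a pretty-efficient loop" concrete, here is a rough C++ rendering of what an engine could end up emitting for the vector adapter above; the function name, the `memory` parameter, and the host-side std::vector are illustrative assumptions, not part of the proposal.

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fused loop for the std::vector<uint8_t> adapter above.
// `memory` is the module's linear memory; `vec` is the i32 passed in.
std::vector<uint8_t> lift_vector_u8(const uint8_t* memory, uint32_t vec) {
  uint32_t begin, end;
  std::memcpy(&begin, memory + vec + 0, 4);   // vec.begin()
  std::memcpy(&end,   memory + vec + 4, 4);   // vec.end()

  std::vector<uint8_t> out;
  out.reserve(end - begin);
  // The adapter tests (p == end) after lifting each element; testing up
  // front here also covers the empty-vector case.
  for (uint32_t p = begin; p != end; p += 1)  // p = p + sizeof(uint8_t)
    out.push_back(memory[p]);                 // lift-int of *p

  // The "fairly simple analysis" mentioned above could recognize this loop as
  //   out.assign(memory + begin, memory + end);
  // i.e. a single memcpy.
  return out;
}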
The neat thing though is that the same memory-to-sequence + lambda design would allow mapping a std::list<uint8_t> to a sequence just as easily [Edit: take out explicit wasm...end]:
(@interface func (export ...) (param $list i32) (result (seq u8))
  (i32.load offset=0 (local.get $list))                 ;; list.begin()
  memory-to-sequence (func (param $p i32) (result u8 i32 bool)
    (lift-int i32 u8 (i32.load8_u (local.get $p)))      ;; *p
    (i32.load offset=4 (local.get $p))                  ;; p->next
    (lift-bool (i32.eq (dup) (i32.const 0)))            ;; (p == nullptr)
  )
)
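For reference, a minimal C++ sketch of the node layout this list adapter assumes (value at offset 0, next pointer at offset 4, with 0 marking the end of the list); this layout is an assumption for illustration, not what any particular std::list implementation actually uses.

#include <cstdint>

// Node layout assumed by the adapter above: the value is read by
// (i32.load8_u (local.get $p)) at offset 0, and the next pointer by
// (i32.load offset=4 (local.get $p)); a next pointer of 0 ends the list.
struct ListNode {
  uint8_t  value;   // offset 0 (padded to 4-byte alignment)
  uint32_t next;    // offset 4; 0 == nullptr == end of list
};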
Lowering is a separate topic, but WDYT about this lifting?
All those wasm .. end sequences are pretty clumsy. As an alternative to an embedded lambda, it would perhaps be more in keeping with wasm to have special looping blocks; this is the approach I have been toying with in the scenarios. That way you can make invoking a helper its own thing. The lifting and lowering constructs need to be designed in tandem -- because they have to be rewritten in tandem. It does look like we are all converging on something though...
Francis
Yeah, the wasm ... end blocks aren't especially beautiful; the alternative is simply to allow the wasm instructions inline without the surrounding block. The high-order bit, I think, is not redefining core wasm instructions to be similar-but-slightly-different.
So for the looping block idea: is the idea that the memory-to-sequence instruction itself defines the block? That could certainly work, I think. I liked reusing the lambdas because I think we'd need them for independent reasons, but perhaps the essential difference here is how they are treated by the rewrite rules: lifting/lowering bodies get rewritten/fused; allocator functions don't.
Agreed that lifting/lowering have to be designed in tandem, but I was trying to keep the post at least somewhat byte-sized and digestible.
What I wound up doing for the wasm instructions in my polyfill was just to reimplement them as add and load. Not elegant either, but it was the quickest thing to do.
We could also potentially have the wasm ... end blocks be an encoding detail, where for any instruction that's part of the subset we support, we duplicate the text format and then encode it in the binary as some start_wasm sentinel.
Ok that's compelling enough that I'll take it to that thread.
it's possible that this would feel arbitrary enough to motivate a more-specialized instruction that was guaranteed to map to a memcpy
Yeah, in the same way that string-to-mem + mem-to-string is very clearly optimizable to memcpy (ish), seq-to-mem + mem-to-seq would be a way of making it clear that that works. We just also need a more general primitive. Which order we ship them in is less important so long as we design both to the point where we're confident.
I was trying to keep the post at least somewhat byte-sized and digestible.
This sounds like a great document for the working notes :D I'd thought about writing this up as one, but figured we should get something like consensus first. It sounds like we have some vaguely converging ideas though. But yeah, I predict we will wind up with a notes document with something like 5-10 examples here, and that should help.
A question I have is, are there other data structures that should map to sequences?
Stacks and queues should, but their internal memory layout is (probably) similar to a vector. (Hash)Maps are a collection, but need to store a key associated with each element.
Do we want to pass sets across an interface? Does it make sense to convert from set to (say) vector?
If yes, the same principles apply as for a linked list, where we may not expect things to be fast and can always fall back to constructing a value by repeatedly calling an add_elem export. A sample implementation of a set could model the data as a tree. If the tree is stored as explicitly-modeled nodes with pointers to their children, the data is spread out and we can't expect any magic. If the tree is stored as implicitly-modeled nodes, then we may be able to read the data out (in some order) with a contiguous memcpy, but in order to maintain the tree invariants if we write data back in, we need to call exports.
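To make the explicit-vs-implicit distinction concrete, here is a rough C++ sketch; both layouts are invented for illustration rather than taken from any particular set implementation.

#include <cstdint>

// Explicitly-modeled nodes: elements are scattered through linear memory and
// linked by pointers, so lifting has to chase them one node at a time.
struct ExplicitTreeNode {
  int32_t  key;
  uint32_t left;    // pointer into linear memory, 0 if absent
  uint32_t right;   // pointer into linear memory, 0 if absent
};

// Implicitly-modeled nodes (heap-style): node i's children live at indices
// 2*i+1 and 2*i+2 of one contiguous array, so the keys can be read out with
// a single memcpy -- but writing back in still has to go through exports so
// the module can re-establish the tree invariants.
struct ImplicitTree {
  uint32_t count;
  int32_t  keys[15];  // capacity chosen arbitrarily for the sketch
};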
Writing this out, I think the question of "should we let people publish an API that takes collections and call it with the wrong kind of thing?" is answered with "how could we even stop them in theory?"
Are there other ways of laying out data in memory aside from contiguously, or spread out with pointers?
We kind of don't/should not care about non-sequence sequences.
Francis
Right, my point there is not that we need to add anything for non-sequential collections, but that the primitives we have are already enough for people to sequence their non-sequential data. We can (and should) continue to not care about them, but people will use them, and they will mostly work.
A related question is what are the essential properties of a sequence. Well-ordered is probably one. Isomorphic across representations could be one? i.e. for all X,Y where X is a type of sequence and Y is a type of sequence, for all n where n is a collection of values, then X(Y(X(n))) == X(n). For example let n = [1, 2, 2, 3], X = Vector, Y = List, then Vector(List(Vector([1, 2, 2, 3]))) == Vector([1, 2, 2, 3]), whereas if Y = Set then we would drop the duplicated 2 so the isomorphism fails to hold. I don't think we have any mechanism to disallow a user from mapping a set to a sequence. But we don't have to care about what happens in that case.
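A quick C++ illustration of that round-trip property, with X = vector and Y = list or set:

#include <cassert>
#include <list>
#include <set>
#include <vector>

int main() {
  const std::vector<int> n = {1, 2, 2, 3};

  // Y = List: Vector(List(Vector(n))) == Vector(n), so the isomorphism holds.
  std::list<int> asList(n.begin(), n.end());
  std::vector<int> backFromList(asList.begin(), asList.end());
  assert(backFromList == n);

  // Y = Set: the duplicated 2 is dropped on the way through, so it fails.
  std::set<int> asSet(n.begin(), n.end());
  std::vector<int> backFromSet(asSet.begin(), asSet.end());
  assert(backFromSet != n);
  return 0;
}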
There is a related question though: what about supporting arbitrary generic collection types? IMO this is doable simply by an elaboration of our current thinking for record types, with the addition of being able to plug in adapters.
E.g., if we had a type:
CCTransaction<Item>
where CCTransaction is some type that models a credit card transaction, and Item is something that one might purchase with a credit card, then
a. any actual transaction will have a type that grounds the Item type variable (this does rule out some higher-order patterns), and
b. where Item shows up in the representation of a CCTransaction value, instead of a fixed lifting/lowering there would have to be code that lifted/lowered the actual transaction information.
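As a loose illustration of that shape (not proposal syntax, and with every name and field invented here): a record adapter whose own fields have a fixed lifting, but whose Item field delegates to a plugged-in adapter.

#include <cstdint>
#include <functional>

// Hypothetical in-memory layout of a CCTransaction<Item> on the wasm side.
struct CCTransactionHeader {
  uint64_t card_hash;
  uint32_t amount_cents;
  uint32_t item_ptr;   // points at the Item payload in linear memory
};

// The "pluggable adapter": how to lift the grounded Item type from memory.
template <typename Item>
using ItemLifter = std::function<Item(const uint8_t* memory, uint32_t ptr)>;

template <typename Item>
struct LiftedTransaction {
  uint64_t card_hash;
  uint32_t amount_cents;
  Item     item;
};

template <typename Item>
LiftedTransaction<Item> lift_transaction(const uint8_t* memory,
                                         const CCTransactionHeader& hdr,
                                         ItemLifter<Item> lift_item) {
  // Fixed lifting for the record's own fields; the Item field is lifted by
  // whichever adapter the instantiation supplies.
  return {hdr.card_hash, hdr.amount_cents, lift_item(memory, hdr.item_ptr)};
}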
On Wed, Sep 25, 2019 at 10:24 AM Jacob Gravelle notifications@github.com wrote:
Right, my point there is not that we need to add anything for non-sequential collections, but that the primitives we have are already enough for people to sequence their non-sequential data. We can (and should) continue to not care about them, but people will use them, and they will mostly work.
A related question is what are the essential properties of a sequence. Well-ordered is probably one. Isomorphic across representations could be one? i.e. for all X,Y where X is a type of sequence and Y is a type of sequence, for all n where n is a collection of values, then X(Y(X(n))) == X(n). For example let n = [1, 2, 2, 3], X = Vector, Y = List, then Vector(List(Vector([1, 2, 2, 3]))) == Vector([1, 2, 2, 3]), whereas if Y = Set then we would drop the duplicated 2 so the isomorphism fails to hold. I don't think we have any mechanism to disallow a user from mapping a set to a sequence. But we don't have to care about what happens in that case.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/WebAssembly/interface-types/issues/72?email_source=notifications&email_token=AAQAXUHD53YFMF3VBQ5IEUDQLONFPA5CNFSM4IZWL2YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7SV7KQ#issuecomment-535125930, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQAXUGFMJVAWHOXE6LVLULQLONFPANCNFSM4IZWL2YA .
-- Francis McCabe SWE
Here's a dump of what I've been thinking about sequences so far. This is informed generally by the experimenting I've been doing in my polyfill/interpreter. This isn't so much an "I have answers" as it is a "here's why I'm thinking this is tricky."
Core use cases (in order of importance)
- Copying a buffer of bytes to and from a seq<u8>. To support a maximally-efficient copy, we need to have primitives that can be optimized into a single memcpy. For the C-style simple buffer, we can do something similar to memory-to-string/string-to-memory, which takes ptr + length and produces a seq, and vice-versa (open question: does this handle only u8, or is it parameterized, and what type is it parameterized on, interface types or wasm types?).
- Converting a linked list to and from a seq. Here the memcpy optimization is fundamentally impossible (and that's ok). We do need to be able to build a sequence from a linked list, so we will at minimum need another pair of primitives to do so. In my polyfill I tried fold-seq and repeat-until (using helper functions) to go from a seq to a list and back.
- For simplicity of specification it may be nice to just have a single kind of general primitive, some form of fold or loop, and have clear rules for where it is optimized away into a memcpy. It is also probably better to start with the more general primitive, and add alternative more-optimal ones later on.

Sample sequences
C++ alone has three immediate options:
- a T* buffer (plus a length passed separately)
- std::vector<T>
- std::list<T>

Assume vector is implemented as a struct with a pointer to a buffer, a length, and a current capacity. Assume list is implemented as a doubly-linked list, with pointers to the front and back.
I'm using C++ here because it can represent this in a variety of ways. For any given language I expect we only need to idiomatically support a single style. So for C++ we can have compiler support to autogenerate buffer adapters, but not linked lists, whereas in Scheme we could natively support linked lists but not necessarily anything else.

Complications
Assume we model a buffer of arbitrary bytes as a seq<u8>, converting to a list and vice-versa. The vector case is surprisingly tricky to handle. On the one hand we can reuse the buffer-copying logic to marshal the actual data over. On the other hand, we need to maintain the vector invariants as we go (length, capacity). We either need to use a fold primitive and call in to the C++ code to push_back every element (and let C++ maintain the invariants), use the mem-to-seq instructions and fix up the invariants at the end, or move the invariant-maintaining logic into the body of the folds. The only one of those that is amenable to optimization is to defer fixing up the internal state. I can't quite imagine how to express that without hand-writing the adapter, however (although for std::vector, as part of the standard library, we are technically able to do so).
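As a very rough sketch of the "fix up the invariants at the end" option; the header layout, the offsets, and the up-front allocation step are all assumptions for illustration, not the proposal.

#include <cstdint>
#include <cstring>

// Hypothetical layout of a std::vector<uint8_t> living in the module's
// linear memory: begin/end/capacity stored as 32-bit offsets.
struct VecHeader {
  uint32_t begin;    // offset of the element buffer
  uint32_t end;      // begin + size
  uint32_t cap_end;  // begin + capacity
};

// Lower a host-side byte sequence into that vector: bulk-copy the payload
// (the part a seq-to-mem/mem-to-seq pair could fuse into one memcpy), then
// patch the length invariant once at the end instead of per element.
// Assumes an allocator export has already reserved enough capacity.
void lower_bytes_into_vector(uint8_t* memory, VecHeader* vec,
                             const uint8_t* seq, uint32_t len) {
  std::memcpy(memory + vec->begin, seq, len);
  vec->end = vec->begin + len;   // deferred invariant fix-up
}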