WebAssembly / interface-types

Other
640 stars 57 forks source link

Sequences #72

Open jgravelle-google opened 5 years ago

jgravelle-google commented 5 years ago

Here's a dump of what I've been thinking about sequences so far. This is informed generally by the experimenting I've been doing in my polyfill/interpreter. This isn't so much an "I have answers" so much as it is a "here's why I'm thinking this is tricky"

Core use cases (in order of importance)

  1. Passing buffers of bytes efficiently
    1. buffers of larger data elements should be efficient too
  2. Supporting collections of arbitrary data
    1. including structured data
    2. including non-scalar data (strings, and sequences)
  3. Translating data between different representations of an ordered list

Sample sequences

C++ alone has three immediate options:

Assume vector is implemented as a struct with a pointer to a buffer, a length, and a current capacity. Assume list is implemented as a doubly-linked list, with pointers to the front and back.

I'm using C++ here because it can represent this in a variety of ways. For any given language I expect we only need to idiomatically support a single style. So for C++ we can have compiler support to autogenerate buffer adapters, but not linked lists, whereas in Scheme we could natively support linked lists but not necessarily anything else.

Complications

Assume we model a buffer of arbitray bytes as seq<u8>. To support a maximally-efficient copy, we need to have primitives that can be optimized into a single memcpy. For the C-style simple buffer, we can do something similar to memory-to-string/string-to-memory, which takes ptr + length and produces a seq, and vice-versa (open question: does this handle only u8, or is it parameterized, and what type is it parameterized on, interface types or wasm types?). For the linked list case, this is fundamentally impossible (and that's ok). We do need to be able to build a sequence from a linked list, so we will at minimum need another pair of primitives to do so. In my polyfill I tried fold-seq and repeat-until (using helper functions), to go from seq to list and vice-versa. The vector case is surprisingly tricky to handle. On the one hand we can reuse the buffer-copying logic to marshal the actual data over. On the other hand, we need to maintain the vector invariants as we go (length, capacity). We either need to use a fold primitive and call in to the C++ code to push_back every element (and let C++ maintain the invariants), use the mem-to-seq instructions and fix up the invariants at the end, or move the invariant-maintaining logic in to the body of the folds. The only one of those that is amenable to optimization is to defer fixing up the internal state. I can't quite imagine how to express that without hand-writing the adapter, however (although for std::vector, as part of the standard library, we are technically able to do so).

For simplicity of specification it may be nice to just have a single kind of general primitive, some form of fold or loop, and have clear rules for where it is optimized away into a memcpy. It is also probably better to start with the more general primitive, and add alternative more-optimal ones later on.

lukewagner commented 5 years ago

One idea I've been pondering is to have a single, fairly-general memory-to-sequence lifting instruction that takes a function immediate with signature: [i32]→[T,i32,bool] where: the incoming i32 is the pointer to the element which is to be turned into a T element (for a seq<T>), the bool result indicates whether there is a next element and, if so, the i32 result is a pointer to it.

For a std::vector<uint8_t> this would look like (using Nick's wasm idea, and pulling in let from function-references) [Edit: take out explicit wasm...end]:

(@interface func (export ...) (param $vec i32) (result (seq u8))
  (i32.load offset=4 (local.get $vec))                ;; vec.end()
  let (local $end i32)
    (i32.load offset=0 (local.get $vec))              ;; vec.begin()
    memory-to-sequence (func (param $p i32) (result u8 i32 bool)
      (lift-int i32 u8 (i32.load8_u (local.get $p)))  ;; *p
      (i32.add (local.get $p) (i32.const 1))          ;; p = p + sizeof(uint8_t)
      (lift-bool (i32.eq (dup) (local.get $end)))     ;; (p == end)
    )
  end
)

Inlining the lambda function into the memory-to-sequence instruction's implied for loop, this adapter could be compiled fairly easily into a pretty-efficient loop in machine code. Optimizing this further into a memcpy would involve some fairly simple analysis, but it's possible that this would feel arbitrary enough to motivate a more-specialized instruction that was guaranteed to map to a memcpy.

The neat thing though is that the same memory-to-sequence+lambda design would allow mapping a std::list<uint8_t> to a sequence just as easily [Edit: take out explicit wasm...end]:

(@interface func (export ...) (param $list i32) (result (seq u8))
  (i32.load offset=0 (local.get $list))                ;; list.begin()
  memory-to-sequence (func (param $p i32) (result u8 i32 bool)
    (lift-int i32 u8 (i32.load8_u (local.get $p)))     ;; *p
    (i32.load offset=4 (local.get $p))                 ;; p->next
    (lift-bool (i32.eq (dup) (i32.const 0)))           ;; (p == nullptr)
  )
)

Lowering is a separate topic, but WDYT about this lifting?

fgmccabe commented 5 years ago

All those wasm .. end sequences are pretty clumsy. As an alternate to an embedded lambda it would perhaps be more in keeping with wasm to have special looping blocks. This is the approach I have been toying with in the scenarios. That way you can make invoking a helper to its own thing. The lifting and lowering constructs need to be designed in tandem -- because they have to be rewritten in tandem. It does look like we are all converging on something though...

Francis

On Tue, Sep 24, 2019 at 10:12 AM Luke Wagner notifications@github.com wrote:

One idea I've been pondering is to have a single, fairly-general memory-to-sequence lifting instruction that takes a function immediate with signature: [i32]→[T,i32,bool] where: the incoming i32 is the pointer to the element which is to be turned into a T element (for a seq), the bool result indicates whether there is a next element and, if so, the i32 result is a pointer to it.

For a std::vector this would look like (using Nick's wasm idea https://github.com/WebAssembly/interface-types/issues/67, and pulling in let from function-references https://github.com/WebAssembly/function-references/blob/master/proposals/function-references/Overview.md#local-bindings ):

(@interface func (export ...) (param $vec i32) (result (seq u8))

wasm

(i32.load offset=4 (local.get $vec))  ;; vec.end()

end

let (local $end i32)

wasm

  (i32.load offset=0 (local.get $vec)) ;; vec.begin()

end

memory-to-sequence (func (param $p i32) (result u8 i32 bool)

  wasm

    (i32.load8_u (local.get $p))  ;; *p

  end

  lift-int i32 u8

  wasm

    (i32.add (local.get $p) (i32.const 1))  ;; p = p + sizeof(uint8_t)

    (i32.eq (dup) (local.get $end))  ;; (p == end)

  end

  lift-bool

)

end

)

Inlining the lambda function into the memory-to-sequence instruction's implied for loop, this adapter could be compiled fairly easily into a pretty-efficient loop in machine code. Optimizing this further into a memcpy would involve some fairly simple analysis, but it's possible that this would feel arbitrary enough to motivate a more-specialized instruction that was guaranteed to map to a memcpy.

The neat thing though is that the same memory-to-sequence+lambda design would allow mapping a std::list to a sequence just as easily:

(@interface func (export ...) (param $list i32) (result (seq u8))

wasm

(i32.load offset=0 (local.get $list)) ;; list.begin()

end

memory-to-sequence (func (param $p i32) (result u8 i32 bool)

wasm

  (i32.load8_u (local.get $p))  ;; *p

end

lift-int i32 u8

wasm

  (i32.load offset=4 (local.get $p))  ;; p->next

  (i32.eq (dup) (i32.const 0))  ;; (p == nullptr)

end

lift-bool

)

)

Lowering is a separate topic, but WDYT about this lifting?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/WebAssembly/interface-types/issues/72?email_source=notifications&email_token=AAQAXUHSFUHMXFKXBRCMGNTQLJDA7A5CNFSM4IZWL2YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7PDNYA#issuecomment-534656736, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQAXUH2COGLFYTUMCQ3K4DQLJDA7ANCNFSM4IZWL2YA .

-- Francis McCabe SWE

lukewagner commented 5 years ago

Yeah, the wasm ... end blocks aren't especially beautiful; the alternative is simply to allow the wasm instructions inline without the surrounding block; the high-order bit, I think, is not redefining core wasm instructions to be similar-but-slightly-different.

So for the looping block idea: is the idea that the memory-to-sequence instruction itself defines the block? That could certainly work, I think. I liked reusing the lambdas b/c I think we'd need them for independent reasons, but perhaps the essential difference here is how they are treated by the rewrite rules; lifting/lowering bodies get rewritten/fused; allocator functions don't.

Agreed that lifting/lowering have to be designed in tandem, but I was trying to keep the post at least somewhat byte-sized and digestible.

jgravelle-google commented 5 years ago

What I wound up doing for the wasm instructions in my polyfill was just reimplement them as add and load. Not elegant either but was the quickest thing to do. We could also potentially have the wasm ... end blocks be an encoding detail, where for any instr that's part of the subset we support, we duplicate the text format, and then encode it in the binary as some start_wasm sentinel. Ok that's compelling enough that I'll take it to that thread.

it's possible that this would feel arbitrary enough to motivate a more-specialized instruction that was guaranteed to map to a memcpy

Yeah, in the same way that string-to-mem + mem-to-string is very clearly optimizable to memcpy (ish), seq-to-mem + mem-to-seq would be a way of making it clear that that works. We just also need a more general primitive. Which order we ship them in is less important so long as we design both to the point where we're confident.

I was trying to keep the post at least somewhat byte-sized and digestible.

This sounds like a great document for the working notes :D Which I'd thought about doing this as, but figured we should get something like consensus first. It sounds like we have some vaguely converging ideas though. But yeah I predict we will wind up with a note document with something like 5-10 examples here and that should help.

jgravelle-google commented 5 years ago

A question I have is, are there other data structures that should map to sequences?

Stacks and queues should, but their internal memory layout is (probably) similar to a vector. (Hash)Maps are a collection, but need to store a key associated with each element.

Do we want to pass sets across an interface? Does it make sense to convert from set to (say) vector? If yes, the same principles apply as a linked list, where we may not expect things to be fast and can always fall back to constructing a value by repeatedly calling an add_elem export. A sample implementation of a set could model the data as a tree. If the tree is stored as explicitly-modeled nodes with pointers to their children, the data is spread out and we can't expect any magic. If the tree is stored as implicitly-modeled nodes, then we may be able to read the data out (in some order) with a contiguous memcpy, but in order to maintain the tree invariants if we write data back in, we need to call exports.

Writing this out, I think the question of "should we let people publish an API that takes collections and call it with the wrong kind of thing?" is answered with "how could we even stop them in theory?"

Are there other ways of laying out data in memory aside from contiguously, or spread out with pointers?

fgmccabe commented 5 years ago

We kind of don't/should not care about non-sequence sequences. Francis

On Wed, Sep 25, 2019 at 10:07 AM Jacob Gravelle notifications@github.com wrote:

A question I have is, are there other data structures that should map to sequences?

Stacks and queues should, but their internal memory layout is (probably) similar to a vector. (Hash)Maps are a collection, but need to store a key associated with each element.

Do we want to pass sets across an interface? Does it make sense to convert from set to (say) vector? If yes, the same principles apply as a linked list, where we may not expect things to be fast and can always fall back to constructing a value by repeatedly calling an add_elem export. A sample implementation of a set could model the data as a tree. If the tree is stored as explicitly-modeled nodes with pointers to their children, the data is spread out and we can't expect any magic. If the tree is stored as implicitly-modeled nodes https://slideplayer.com/slide/5107901/16/images/3/Binary+Heap+Review+%E2%80%A6+%E2%80%A6+Implicit+%3A+stored+in+array+with+no+pointers.jpg, then we may be able to read the data out (in some order) with a contiguous memcpy, but in order to maintain the tree invariants if we write data back in, we need to call exports.

Writing this out, I think the question of "should we let people publish an API that takes collections and call it with the wrong kind of thing?" is answered with "how could we even stop them in theory?"

Are there other ways of laying out data in memory aside from contiguously, or spread out with pointers?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/WebAssembly/interface-types/issues/72?email_source=notifications&email_token=AAQAXUCK262ZZ3ZBPV2ZYGTQLOLDFA5CNFSM4IZWL2YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7SUI6Y#issuecomment-535118971, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQAXUEEBUDFAFLWLSYDYRDQLOLDFANCNFSM4IZWL2YA .

-- Francis McCabe SWE

jgravelle-google commented 5 years ago

Right, my point there is not that we need to add anything for non-sequential collections, but that the primitives we have are already enough for people to sequence their non-sequential data. We can (and should) continue to not care about them, but people will use them, and they will mostly work.

A related question is what are the essential properties of a sequence. Well-ordered is probably one. Isomorphic across representations could be one? i.e. for all X,Y where X is a type of sequence and Y is a type of sequence, for all n where n is a collection of values, then X(Y(X(n))) == X(n). For example let n = [1, 2, 2, 3], X = Vector, Y = List, then Vector(List(Vector([1, 2, 2, 3]))) == Vector([1, 2, 2, 3]), whereas if Y = Set then we would drop the duplicated 2 so the isomorphism fails to hold. I don't think we have any mechanism to disallow a user from mapping a set to a sequence. But we don't have to care about what happens in that case.

fgmccabe commented 5 years ago

Sequence is highly likely to be one of our standard types. For any given language whose compiler supports generating adapters there will be a 'language representative' for sequences. Other data structures in that language will likely need to be coerced into the language native variant of sequence.

There is a related question though: what about supporting arbitrary generic collection types. IMO this is doable simply by an elaboration of our current thinking for record types: with the addition of being able to plug-in adapters.

E.g., if we had a type:

CCTransaction

where CCTransaction is some type that models a credit card transaction, and Item is something that one might purchase with a credit card, then a. any actual transaction will have a type that grounds the Item type variable. (This does rule out some higher-order patterns) b. where Item shows up in the representation of a CCTransaction value, instead of a fixed lifting/lowering there would have to be code that lifted/lowered the actual transaction information.

On Wed, Sep 25, 2019 at 10:24 AM Jacob Gravelle notifications@github.com wrote:

Right, my point there is not that we need to add anything for non-sequential collections, but that the primitives we have are already enough for people to sequence their non-sequential data. We can (and should) continue to not care about them, but people will use them, and they will mostly work.

A related question is what are the essential properties of a sequence. Well-ordered is probably one. Isomorphic across representations could be one? i.e. for all X,Y where X is a type of sequence and Y is a type of sequence, for all n where n is a collection of values, then X(Y(X(n))) == X(n). For example let n = [1, 2, 2, 3], X = Vector, Y = List, then Vector(List(Vector([1, 2, 2, 3]))) == Vector([1, 2, 2, 3]), whereas if Y = Set then we would drop the duplicated 2 so the isomorphism fails to hold. I don't think we have any mechanism to disallow a user from mapping a set to a sequence. But we don't have to care about what happens in that case.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/WebAssembly/interface-types/issues/72?email_source=notifications&email_token=AAQAXUHD53YFMF3VBQ5IEUDQLONFPA5CNFSM4IZWL2YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7SV7KQ#issuecomment-535125930, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQAXUGFMJVAWHOXE6LVLULQLONFPANCNFSM4IZWL2YA .

-- Francis McCabe SWE