WebAssembly / wasm-c-api

Wasm C API prototype
Apache License 2.0
534 stars 77 forks source link

Use vectors for calls and imports #145

Closed rossberg closed 4 years ago

rossberg commented 4 years ago

Return to using vectors for function call arguments and results, as well as module imports. This was simplified to a straight array interface in #48, but additional sanity checks have been requested.

Unlike #134, this avoids the use of struct-by-value in the C API and implements the API change consistently in both C and C++ API.

rossberg commented 4 years ago

@jakobkummerow, PTAL. I introduced a few convenience macros to make manual handling of vectors (and values) less verbose. Also, fixed latent crash bugs in examples where out val_vecs weren't properly initialised (which might cause attempts to release an own reference if a val happens to materialise with ref kind).

rossberg commented 4 years ago

It's not clear to me that passing struct containing a pointer and length, instead of just passing the pointer and length, is valuable. A plain pointer + length pair is simpler, more idiomatic in C APIs, and easier to bind to.

It's a question of consistency. The API uses the vec abstraction in all other places, so it would be rather incoherent to not use it in this one. It also is more self-documenting. I haven't tried, but I would expect that bindings benefit from consistency as well.

(You could argue that we shouldn't use vec anywhere except for return types, but I'm not sure that's preferable, especially on the C++ end. In any case, that would be a different and much larger change.)

rossberg commented 4 years ago

@jakobkummerow, PTAL.

alexcrichton commented 4 years ago

I think in terms of ownership the API right now is pretty inconsistent, and I think that @sunfishcode's suggestion could greatly improve the consistency and understandability of the API at-a-glance. For example I think it would be more understandable if vectors were only used as either return values or "here's ownership of an allocated chunk of memory". Currently vectors are sometimes used as ownership containers and sometimes used as "here's a temporary view into my data". Additionally the ownership itself is pretty:

Given this state the only real way to figure out the ownership of an API seems to be to look at the documentation, from the signature alone it's pretty unclear where ownership is being transferred. For example it seems like a weird gotcha that wasm_trap_new doesn't take ownership of its string argument while wasm_importtype_new does.

I think it could be more consistent to say "using a vector means transfer of ownership". That way if you ever see a *_vec_t (or alias with wasm_name_t or wasm_message_t) in the API you know that an ownership transfer is happening. Otherwise if you see a pointer/length then the standard C idiom of "don't manage the memory you don't own" kicks into play and it's clear you don't need to deallocate it, nor is it deallocated for you if you pass it to an API.

alexcrichton commented 4 years ago

I would expect that bindings benefit from consistency as well

FWIW I can say that at least from my experience consistency matters a lot with respect "what does everyone else in C do" just as much as within the API itself. For example to bind wasm_trap_new in Go you in theory just need to pass in a pointer/length, but due to pointer rules in Go I'm required to take a string argument, duplicate it as a CString (copy the memory), then pass that pointer to wasm_trap_new which likely copies the data again. All that really needs to happen here is passing a pointer/length, but requiring the API to go through a wasm_byte_vec_t adds an extra hoop to jump through which requires other allocations.

rossberg commented 4 years ago

I see what you mean. In most cases, where ownership transfer occurs was driven by what information an object needs to keep around, and what it is more likely to reconstruct from scratch. In some cases this may be biased towards what was most natural for the V8 backend.

However, "using a vector means transfer of ownership" may not be a sufficiently nuanced guiding principle, since you sometimes just want to return a pair of vector and length without passing ownership. That pairing was the primary purpose of the abstraction after all. Always passing ownership would either force premature copying, or extra levels of reference counting and indirection. The type representations are an example of this: e.g., just traversing the arguments of a function type should not require copying the vector and its types (which may get more complex in the future).

Ownership ought to be more apparent in the C++ version, where we at least have types to represent it. Though I believe it is still explicit from the own macro in the C version. That was the intent anyway. Can you elaborate where it is not?

Happy to discuss how to make this more consistent. Heads-up though that I'm off to vacation tomorrow, so probably won't reply during the next week.

alexcrichton commented 4 years ago

Ah yes to be clear the C API, as is today, is clear about where ownership is transferred. The type signatures (and the own token) are sufficient for this. Additionally I agree that higher level languages like C++ (and, for example, a Rust API) clearly define ownership through their types. My point, though, is that the ownership of arguments/results are pretty inconsistent throughout the C API. Not how they're textually defined but how they're semantically defined.

For example why does wasm_module_imports return an owned vector but wasm_functype_params returns a vector owned by the original type? Additionally why does wasm_importtype_new take ownership of its string arguments while wasm_trap_new not take ownership? (there's also a question of why wasm_trap_* works with nul-terminated strings but that's orthogonal to ownership)

I agree that APIs sometimes need to return two values, such as wasm_importtype_module. I think this can be done like in wasm_instance_new, however, where one return value is "blessed" to be returned from the function and another is returned as an out-parameter. My point is that idiomatically and conventionally I think it would make more sense for usage of *_vec_t to imply "ownership is being transferred", while if ownership isn't being transferred a plain old pointer/length is used (either as two parameters or one return and one out-parameter).

An example of this is the libgit2 API where git_buf is used similarly to wasm_byte_vec_t but only when a whole vector is returned. All other APIs that take a list of values take a plain pointer/length pair. This also shows an alternative to returning lists of values as returning iterators of values. For example in Wasmtime to implement wasm_module_exports we don't natively store exports in as a list of wasm_extern_t values, so an iterator would be much more natural there anyway.

rossberg commented 4 years ago

I created a issue #150 for the ownership discussion. Landing this patch for now.