Returning arrays - Githubissues

jedisct1 commented 4 years ago

Returning arrays whose size is not fixed is currently a little bit complicated with WASI.

The caller typically needs to provide a buffer and its size. Then, among WASI functions and other hostcalls, a few different strategies have been seen in the wild:

Fail, return a "buffer is too small" error, and expect the caller to retry with a larger buffer
Provide a function that returns the maximum possible size (not always easy to compute)
Return a handle with a read(2)-like interface
Truncate the output to the provided buffer size (can be very dangerous, especially with authentication tags)

For wasi-crypto, we have a mix of small, fixed-size, easy-to-compute return values, and things that can possibly be bigger.

For example, exporting a signature can result in something whose size is not easy to compute in advance, and that can be huge.

So, in my proposal, an output_array type has been introduced. The handle can be used to get the actual length of the array, and to copy its content. For huge outputs, a read(2)-like interface can be added on top of it later.

Is there a better short-term or long-term solution to return arrays whose size is not known in advance?

battila7 commented 4 years ago

Hi,

I think that another solution might be worth considering. What if fat array pointers were used, just as in the case of Cello?

Although this would be yet another strategy, I think it might fit this use case very well while being quite easy-to-use at the same time.

jedisct1 commented 4 years ago

Hi Attila,

I'm not sure I follow. Can you clarify how fat pointers would help in that context?

The main issue is that memory has to be allocated by the guest. It can be done the other way round, but not in a generic way.

programmerjake commented 4 years ago

Seems to me that the obvious solution is to somehow tell WASI where the guest's malloc implementation is so it can just call it as needed.

jedisct1 commented 4 years ago

@programmerjake Every module has its own way to allocate memory. A runtime cannot easily call the guest malloc implementation, or even assume that there is one.

As explained in the blog post above, for Terrarium, guests had to call a specific function at initialization time in order to tell the host how to allocate and free guest memory. This is only practical if you control both the runtime and the code that is going to run on it.

tniessen commented 4 years ago

I think the usual approach would be to fail if the buffer is too small. We need a way to report the number of bytes written anyway, and we can use that output value to indicate how many bytes would have been needed to write the entire output.
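A rough sketch of that convention, in Python for brevity. The error codes and the names `host_export` and `export_with_retry` are made up for illustration and do not come from any WASI specification; the host call is simulated by a plain function.

```python
from typing import Tuple

# Hypothetical error codes, for illustration only.
SUCCESS = 0
ERR_OVERFLOW = 1


def host_export(data: bytes, buf: bytearray) -> Tuple[int, int]:
    """Simulated hostcall: copy data into the caller's buffer.

    On success the second value is the number of bytes written.
    If the buffer is too small, nothing is written and the second
    value is the number of bytes that would have been needed.
    """
    if len(buf) < len(data):
        return ERR_OVERFLOW, len(data)
    buf[:len(data)] = data
    return SUCCESS, len(data)


def export_with_retry(data: bytes, initial_size: int = 16) -> bytes:
    """Caller side: start with a guess, retry once with the reported size."""
    buf = bytearray(initial_size)
    errno, n = host_export(data, buf)
    if errno == ERR_OVERFLOW:
        # The failed call told us exactly how much room is required.
        buf = bytearray(n)
        errno, n = host_export(data, buf)
    if errno != SUCCESS:
        raise RuntimeError("export failed")
    return bytes(buf[:n])
```

The same output value serves both purposes, so the caller needs at most one retry as long as the output does not change between calls.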

jedisct1 commented 4 years ago

Thanks, Tobias!

That sounds like the right thing to do, if only to be consistent with existing WASI functions such as readdir().

That means that we need these functions to return a handle, though.

We can't have e.g. the signature function directly write the output, or else, if the provided buffer is too small, the signature will need to be computed again on subsequent retries.

In the proposed API, signing returns a signature object, and getting that signature as a byte string works as follows:

signature_export(signature, format) -> (errno, array_output)
the application can get the length before having to allocate anything: array_output_len(array_output) -> usize
and can then copy the content to a guest-allocated buffer: array_output_pull(array_output, buf, buf_len) -> errno, that will also fail if the buffer is too small, but this is unlikely to happen since the size was already known.
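To make the flow concrete, here is a minimal simulation of the three calls in Python. Only the function names and their return shapes come from the proposal; the `Host` class, the error codes, and the "raw"/"hex" formats are assumptions for the sake of the example, and `buf_len` is implicit in `len(buf)`.

```python
from typing import Dict, Tuple

# Hypothetical error codes, for illustration only.
SUCCESS = 0
ERR_OVERFLOW = 1
ERR_INVALID_HANDLE = 2


class Host:
    """Toy host that keeps encoded outputs behind integer handles."""

    def __init__(self) -> None:
        self._outputs: Dict[int, bytes] = {}
        self._next_handle = 1
        self.encode_count = 0  # how many times the encoding actually ran

    def _encode(self, signature: bytes, fmt: str) -> bytes:
        # Stand-in for a real encoder; "hex" and "raw" are made-up formats.
        self.encode_count += 1
        return signature.hex().encode("ascii") if fmt == "hex" else signature

    def signature_export(self, signature: bytes, fmt: str) -> Tuple[int, int]:
        """Encode once and return (errno, array_output handle)."""
        handle = self._next_handle
        self._next_handle += 1
        self._outputs[handle] = self._encode(signature, fmt)
        return SUCCESS, handle

    def array_output_len(self, handle: int) -> int:
        return len(self._outputs[handle])

    def array_output_pull(self, handle: int, buf: bytearray) -> int:
        data = self._outputs.get(handle)
        if data is None:
            return ERR_INVALID_HANDLE
        if len(buf) < len(data):
            return ERR_OVERFLOW
        buf[:len(data)] = data
        return SUCCESS


# Guest side: ask for the length first, then allocate and copy.
host = Host()
errno, out = host.signature_export(b"\x01\x02\x03", "hex")
buf = bytearray(host.array_output_len(out))
pull_errno = host.array_output_pull(out, buf)
```

The encoding runs exactly once, no matter how many times the guest queries the length or retries the copy.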

With an interface similar to readdir(), we would have a single function:

signature_export(signature, format, buf, buf_len) -> (errno, usize)

It can always be called with buf_len set to 0 to get the actual length.

Pros:

Just one function
But used for two different things, which can be confusing ("why are you calling it twice?" "well... the first time, it's to be able to call it correctly the second time")

Cons (taking that particular example):

signature_export encodes the signature to the given format. So, encoding has to be done twice, unless some caching is done, which makes the implementation a little bit more complicated.
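A small Python simulation of the single-function variant shows the double encoding. The error codes, the `encode` helper, and the "hex" format are hypothetical, and `buf_len` is implicit in `len(buf)`.

```python
from typing import Tuple

# Hypothetical error codes, for illustration only.
SUCCESS = 0
ERR_OVERFLOW = 1

encode_calls = 0  # counts how often the (potentially costly) encoding runs


def encode(signature: bytes, fmt: str) -> bytes:
    """Stand-in for a real encoder; "hex" is a made-up format name."""
    global encode_calls
    encode_calls += 1
    return signature.hex().encode("ascii") if fmt == "hex" else signature


def signature_export(signature: bytes, fmt: str, buf: bytearray) -> Tuple[int, int]:
    """readdir()-like variant: returns (errno, bytes written or bytes needed)."""
    encoded = encode(signature, fmt)  # no cache, so this runs on every call
    if len(buf) < len(encoded):
        return ERR_OVERFLOW, len(encoded)
    buf[:len(encoded)] = encoded
    return SUCCESS, len(encoded)


sig = bytes.fromhex("deadbeef")

# First call: empty buffer, only to learn the required size.
probe_errno, needed = signature_export(sig, "hex", bytearray(0))

# Second call: the real export into a buffer of the right size.
buf = bytearray(needed)
errno, written = signature_export(sig, "hex", buf)
```

After these two calls `encode_calls` is 2. Avoiding the second encoding would require the host to remember the result of the probe call somewhere, which is the extra complexity mentioned above.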

Returning arrays is something common enough that it should really be addressed at Interface Type level. Everything else is a little bit of a hack right now, but well... we still need something to work with.

jedisct1 commented 1 year ago

Not relevant any more, since the spec already includes that.

WebAssembly / wasi-crypto

Returning arrays #9