WebAssembly / WASI

WebAssembly System Interface

Should functions that return a buffer containing a string include the trailing 0 or not? #492

Closed ptitSeb closed 1 year ago

ptitSeb commented 1 year ago

There are a few functions, like fd_prestat_dir_name or fd_readdir, that fill a buffer with strings representing file names. The spec doesn't specify whether the buffer includes the trailing zero. Which should it be: with or without the trailing 0?

syrusakbary commented 1 year ago

Can this be added as a point of discussion for the next WASI meeting @linclark @lukewagner?

We'd like to know which way is the proper one, to make sure the Wasmer WASI implementation respects the spec. Right now it is not clear what the official way is.

linclark commented 1 year ago

@syrusakbary please follow the procedure for adding a discussion topic (adding a PR to the agenda in the meetings repo)

syrusakbary commented 1 year ago

Here we go: https://github.com/WebAssembly/meetings/pull/1094

sunfishcode commented 1 year ago

wasi-libc has code for fd_prestat_dir_name and fd_readdir to insert trailing NULs in the places where it needs them to be.

In languages other than C, strings aren't usually NUL-terminated, so their use of fd_readdir doesn't need a trailing 0.

Consequently, I propose Wasmtime's current behavior in these two instances be considered the correct behavior.
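As a rough illustration of the guest-side approach described above, here is a minimal Python sketch that models linear memory as a `bytearray`. The names `host_write_dir_name` and `read_as_c_string` are hypothetical stand-ins, not wasi-libc or Wasmtime APIs; the sketch only assumes that the host writes exactly the reported number of bytes and no terminator.

```python
def host_write_dir_name(memory: bytearray, ptr: int, length: int, name: bytes) -> None:
    """Hypothetical host call: writes exactly `length` name bytes, no trailing NUL."""
    memory[ptr:ptr + length] = name[:length]


def read_as_c_string(memory: bytearray, ptr: int, name: bytes) -> bytes:
    """Hypothetical guest wrapper: reserves length + 1 bytes and adds the NUL itself."""
    length = len(name)  # assumption: a real guest would get this from fd_prestat_get
    host_write_dir_name(memory, ptr, length, name)
    memory[ptr + length] = 0  # the guest, not the host, appends the terminator
    return bytes(memory[ptr:ptr + length + 1])


# Pre-fill with 0xff so that a missing terminator would be visible.
memory = bytearray(b"\xff" * 32)
c_string = read_as_c_string(memory, 4, b"/sandbox")
```

With this split of responsibilities, a guest written in a language without NUL-terminated strings can simply skip the last step.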

ptitSeb commented 1 year ago

Well, the terminating 0 needs to be set, either on the wasm side or on the WASI side. In my opinion, the definition of "string" should be unified across the API. At no point is the string defined, and because there isn't any hint of a size-based string definition (as can be found in the Pascal language), the assumption is that strings are C-like and 0-terminated.

It would be good to have clarification about all the string buffers.

linclark commented 1 year ago

@ptitSeb It might be good to have some context on what you're trying to do, as that will help understand what exact information you need.

You're right that strings aren't defined in WASI. That is by design. Instead of having a concrete definition of strings in WASI, there is an abstract string type, and that is defined in a different part of the WebAssembly standards, the component model.

As stated in the README, we are currently in the process of switching from the initial witx to wit, which is what the component model is defining.

WASI is transitioning away from the witx format and its early experimental ABI. We are transitioning to Interface Types using the wit format and the canonical ABI.

If you want to learn more about the thinking behind these types, you can read this post or dig into the component model repo.

ptitSeb commented 1 year ago

I'm trying to maintain a WASI implementation.

But the spec is still not completely consistent. For example, the args_get and environ_get functions do specify that the returned string buffers are 0-terminated, but fd_prestat_dir_name and fd_readdir do not specify it. Similarly, for the path_create_directory function, the string is not defined as 0-terminated, and the function arguments are in fact a string pointer and a string length?

I dug a bit into the component model, but the only thing I found about strings is "list of char", which is still not precise about how you delimit the end of the string (sized or 0-terminated).
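To make the two conventions being contrasted here concrete, the following Python sketch reads strings both ways. The helper names and the sample buffers are made up for illustration; only the general shapes (NUL-delimited entries for an args_get-style buffer, pointer plus length for a path argument) come from the discussion.

```python
def split_nul_terminated(buf: bytes) -> list:
    """Split an args_get-style buffer in which every string ends with a NUL byte."""
    parts = buf.split(b"\x00")
    # The buffer ends with a NUL, so the final piece is empty; drop it.
    return parts[:-1]


def read_sized(memory: bytes, ptr: int, length: int) -> bytes:
    """Read a path-style argument given as (pointer, length); no terminator involved."""
    return memory[ptr:ptr + length]


# Hypothetical sample data.
argv_buf = b"prog\x00--verbose\x00"
args = split_nul_terminated(argv_buf)

memory = b"....new_dir????"
path = read_sized(memory, 4, 7)
```

In the sized form, the bytes following the string are irrelevant, which is why the caller never needs to write a terminator there.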

programmerjake commented 1 year ago

in the mvp of the component model, in memory, a string is two i32s: a pointer to the buffer and a length expressed as the number of code units (bytes for UTF-8 -- this is oversimplified slightly, see the code for details). I'd assume that means it is not nul-terminated, since the length is given explicitly.

see the definition of store_string in https://github.com/WebAssembly/component-model/blob/b9be93e6311873ba8234e073203c9e27f2412c71/design/mvp/CanonicalABI.md#storing
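A simplified sketch of that pointer-plus-length layout, assuming UTF-8 only and modelling linear memory as a `bytearray`. The function names below are hypothetical and much simpler than the linked store_string, which also handles other encodings and allocation.

```python
import struct


def store_utf8_string_sketch(memory: bytearray, data_ptr: int, record_ptr: int, text: str) -> None:
    """Write the UTF-8 bytes at data_ptr, then a (pointer, length) pair at record_ptr."""
    encoded = text.encode("utf-8")
    memory[data_ptr:data_ptr + len(encoded)] = encoded
    # Two little-endian 32-bit integers: the buffer address and the byte count.
    struct.pack_into("<II", memory, record_ptr, data_ptr, len(encoded))


def load_utf8_string_sketch(memory: bytearray, record_ptr: int) -> str:
    """Read the (pointer, length) pair back and decode exactly `length` bytes."""
    ptr, length = struct.unpack_from("<II", memory, record_ptr)
    return bytes(memory[ptr:ptr + length]).decode("utf-8")


# Pre-fill with 0xff to show that no NUL is written after the string data.
memory = bytearray(b"\xff" * 64)
store_utf8_string_sketch(memory, 16, 0, "café.txt")
round_tripped = load_utf8_string_sketch(memory, 0)
```

Because the length is stored explicitly, the byte right after the string data is never inspected, so nothing here relies on a terminator.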

ptitSeb commented 1 year ago

Yeah, so basically, if I sum up: