Closed ptitSeb closed 1 year ago
Can this be added as a point of discussion for the next WASI meeting @linclark @lukewagner?
We'd like to know which way is the proper one, to make sure the Wasmer WASI implementation respects the spec. Right now it is not clear what the official way is.
@syrusakbary please follow the procedure for adding a discussion topic (adding a PR to the agenda in the meetings repo)
wasi-libc has code for fd_prestat_dir_name and fd_readdir to insert trailing NULs in the places where it needs them to be.
In languages other than C, strings aren't usually NUL-terminated, so their use of fd_readdir doesn't need a trailing 0.
Consequently, I propose Wasmtime's current behavior in these two instances be considered the correct behavior.
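To make the proposal concrete, here is a minimal sketch of the two guest-side strategies, assuming the host writes exactly the name bytes and reports their length. The helper names (`host_write_name`, `guest_read_sized`, `guest_to_c_string`) and the `bytearray` standing in for linear memory are hypothetical, not part of WASI or wasi-libc.

```python
def host_write_name(memory: bytearray, ptr: int, name: bytes) -> int:
    """Hypothetical host side: copy the name with no trailing NUL, return its length."""
    memory[ptr:ptr + len(name)] = name
    return len(name)


def guest_read_sized(memory: bytearray, ptr: int, length: int) -> str:
    """Guest in a language with sized strings: the reported length is all it needs."""
    return bytes(memory[ptr:ptr + length]).decode("utf-8")


def guest_to_c_string(memory: bytearray, ptr: int, length: int) -> bytes:
    """C-style guest: copy the sized bytes and append the NUL itself."""
    return bytes(memory[ptr:ptr + length]) + b"\x00"


# Fill memory with a non-zero byte so a missing terminator is visible.
memory = bytearray(b"\xff" * 32)
n = host_write_name(memory, 4, b"/sandbox")

print(guest_read_sized(memory, 4, n))
print(guest_to_c_string(memory, 4, n))
```

Under this assumption the terminator is purely a guest-side concern: the byte after the name in the host-filled buffer is whatever was there before.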
Well, the terminating 0 needs to be set, either on the wasm side or the WASI side. In my opinion, the definition of "string" should be unified across the API. At no point is the string defined, and because there isn't any hint of a "size-based" string definition (as found in the Pascal language), the assumption is that strings are C-like and 0-terminated.
It would be good to have clarification about all the string buffers.
@ptitSeb It might be good to have some context on what you're trying to do, as that will help understand what exact information you need.
You're right that strings aren't defined in WASI. That is by design. Instead of having a concrete definition of strings in WASI, there is an abstract string type, and that is defined in a different part of the WebAssembly standards, the component model.
As stated in the README, we are currently in the process of switching from the initial witx to wit, which is what the component model is defining.
WASI is transitioning away from the witx format and its early experimental ABI. We are transitioning to Interface Types using the wit format and the canonical ABI.
If you want to learn more about the thinking behind these types, you can read this post or dig into the component model repo.
I'm trying to maintain a WASI implementation.
But the spec is still not completely consistent. For example, the args_get and environ_get functions do specify that the string buffers returned are 0-terminated, but for fd_prestat_dir_name or fd_readdir this is not specified.
Similarly, for the path_create_directory function, the string is not defined as 0-terminated, and the function arguments are in fact a string pointer and a string length?
I dug a bit into the component model, but the only thing I found about strings is "list of char", which is still not precise about how you delimit the end of the string (sized or 0-terminated).
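The two conventions being contrasted can be sketched side by side. This is an illustration only: the 24-byte dirent header (d_next, d_ino, d_namlen, d_type, padding) reflects my understanding of the preview1 layout and should be checked against the spec, and `split_nul_terminated`, `parse_dirents` and `make_dirent` are hypothetical helpers.

```python
import struct

# Assumed preview1 dirent header: u64 d_next, u64 d_ino, u32 d_namlen,
# u8 d_type, 3 bytes of padding (24 bytes, little-endian).
DIRENT = struct.Struct("<QQIB3x")


def split_nul_terminated(buf: bytes) -> list[str]:
    """args_get-style buffer: assumes every string, including the last, ends with NUL."""
    return [s.decode("utf-8") for s in buf.split(b"\x00")[:-1]]


def parse_dirents(buf: bytes) -> list[str]:
    """fd_readdir-style buffer: each name is delimited by d_namlen, not by a NUL."""
    names = []
    offset = 0
    while offset + DIRENT.size <= len(buf):
        _d_next, _d_ino, d_namlen, _d_type = DIRENT.unpack_from(buf, offset)
        start = offset + DIRENT.size
        end = start + d_namlen
        if end > len(buf):
            break  # final entry was truncated by the buffer size
        names.append(buf[start:end].decode("utf-8"))
        offset = end  # assumed: the next header follows the name directly
    return names


def make_dirent(cookie: int, ino: int, name: bytes, d_type: int = 0) -> bytes:
    """Build one test entry; d_type 0 is only a placeholder value here."""
    return DIRENT.pack(cookie, ino, len(name), d_type) + name


args_buf = b"prog\x00--flag\x00"
dir_buf = make_dirent(1, 10, b"a.txt") + make_dirent(2, 11, b"docs")

print(split_nul_terminated(args_buf))
print(parse_dirents(dir_buf))
```

In the second form a trailing 0 would be redundant, since d_namlen already says where each name ends.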
in the mvp of the component model, in memory, a string is two i32s: a pointer to the buffer and a length expressed as the number of code units (bytes for UTF-8 -- this is oversimplified slightly, see the code for details). I'd assume that means it is not nul-terminated, since the length is given explicitly.
see the definition of store_string in https://github.com/WebAssembly/component-model/blob/b9be93e6311873ba8234e073203c9e27f2412c71/design/mvp/CanonicalABI.md#storing
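As a rough illustration of the pointer-plus-length idea (not the canonical ABI's actual store_string, which also handles other encodings and allocation), a UTF-8-only sketch could look like this. The function names, the fixed addresses and the `bytearray` used as linear memory are all assumptions made for the example.

```python
import struct


def store_string_sketch(memory: bytearray, data_ptr: int, text: str, slot: int) -> None:
    """Write the UTF-8 bytes at data_ptr, then a (pointer, length) pair of i32s at slot."""
    data = text.encode("utf-8")
    memory[data_ptr:data_ptr + len(data)] = data  # no terminator is written
    struct.pack_into("<II", memory, slot, data_ptr, len(data))


def load_string_sketch(memory: bytearray, slot: int) -> str:
    """Read the (pointer, length) pair back and decode exactly that many bytes."""
    ptr, length = struct.unpack_from("<II", memory, slot)
    return bytes(memory[ptr:ptr + length]).decode("utf-8")


memory = bytearray(64)
store_string_sketch(memory, 16, "héllo", 0)

print(struct.unpack_from("<II", memory, 0))
print(load_string_sketch(memory, 0))
```

The length counts UTF-8 bytes rather than characters, which is why the end of the string never has to be marked in the buffer.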
Yeah, so basically, to sum up:
There are a few functions, like fd_prestat_dir_name or fd_readdir, that fill in a buffer with strings representing file names. The spec doesn't specify whether the buffer includes the trailing zero. What should it be: with or without the trailing 0?