Open badeend opened 9 months ago

What does "flush" mean? None of the WASI proposals currently define what it means to "flush" one of their output-streams. wasi-io's documentation on `flush` just says it flushes buffered output. (Aside from the recursive definition :stuck_out_tongue_winking_eye:): which buffers, to what extent, and with what goal?

Does this include OS buffers? I.e.:

- `write`? Or `write` + `fsync`? (Or does it require the use of `O_DIRECT` to bypass Linux' internal caching entirely :hand_over_mouth:?)
- `send`? Or `send` with `TCP_NODELAY` enabled? If a write has not been flushed, does that mean we actually should've sent it with `MSG_MORE` in the first place?

Or put differently: why should a consumer of a random `output-stream`, of which it doesn't know the origin, call `flush` (or one of its cousins)? What guarantee do they have after their data has been flushed? Is it now persistently stored on disk? Has it been sent out on the wire? Has the peer successfully received it?

As far as I can see in wasmtime, none of `wasi-filesystem`, `wasi-sockets` & `wasi-http` use flush for anything other than waiting for a previous write to finish.

Apologies for the many question marks :)
Yes, "flush" here isn't about O_DIRECT
or TCP_NODELAY
or MSG_MORE
or fsync
or fdatasync
or similar things at the host OS level.
Imagine a write
that takes a list<u8>
argument. When one component calls another (as opposed to calling the host, which can bypass all this), the callee's bindings will allocate memory for a buffer for the full length of the list<u8>
, even if the callee ultimately encounters a short write for any reason. There was a concern that in that situation, the caller could end up having to pass another copy of bytes that it had already passed. To avoid this, callees now hold on to the list<u8>
buffer, so that callers don't have to pass another copy of them, and a "flush" is added to instruct callees to finish writing out those bytes.
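To make that shape concrete, here's a minimal sketch; the types and the `write_to_sink` helper are hypothetical, not wasmtime's actual bindings:

```rust
// Hypothetical sketch of the scheme described above: the callee takes
// ownership of the caller's whole buffer on `write`, and `flush` means
// "finish writing out whatever you're still holding".
struct StreamError;

struct CalleeStream {
    pending: Vec<u8>, // bytes accepted from the caller but not yet written out
}

impl CalleeStream {
    /// Accept ownership of the caller's entire buffer. Even if a later write
    /// comes up short, the caller never has to pass these bytes again.
    fn write(&mut self, buf: Vec<u8>) {
        self.pending.extend(buf);
    }

    /// "flush": finish writing out the retained bytes, looping over short writes.
    fn flush(&mut self) -> Result<(), StreamError> {
        while !self.pending.is_empty() {
            let n = self.write_to_sink()?; // may be a short write
            self.pending.drain(..n);
        }
        Ok(())
    }

    /// Stand-in for the real destination (file, socket, another component, ...).
    fn write_to_sink(&mut self) -> Result<usize, StreamError> {
        Ok(self.pending.len()) // pretend everything was written
    }
}
```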
I understand how the distinct `check_write` & `write` methods eliminate unnecessary cross-component copies. But it's still not clear to me what role `flush` plays here. Purely looking at the current wasmtime implementations: a `write` either directly submits the data to the OS, or moves it to a background task that proactively tries to push it to the OS. I.e. writes are always propagated to the OS as fast as possible, regardless of whether `flush` is called or not.

If we were to change wasi-libc/preview1-component-adapter to not issue flushes, how would the application behave differently?

That being said, I can see one place where `flush` is significant, and that is right before closing a stream: to clear out all intermediate buffers.

> "flush" here isn't about TCP_NODELAY or MSG_MORE

I realize it's all still pixie dust at this moment, but it seems that this is the direction Luke is thinking in: https://docs.google.com/presentation/d/1bWUO1Z9swQ4KSmoeUMTwIFYurasn14xFy4o-G8nE15w/edit#slide=id.g266ec2a0918_0_28
> I understand how the distinct `check_write` & `write` methods eliminate unnecessary cross-component copies. But it's still not clear to me what role `flush` plays here. Purely looking at the current wasmtime implementations: a `write` either directly submits the data to the OS, or moves it to a background task that proactively tries to push it to the OS. I.e. writes are always propagated to the OS as fast as possible, regardless of whether `flush` is called or not. If we were to change wasi-libc/preview1-component-adapter to not issue flushes, how would the application behave differently?
The `check_write` step is conceptually about not wanting to oblige the callee to allocate a larger `list<u8>` than the callee's instance can handle, and to allow the callee to exert a form of backpressure. That way, if `check-write` says you can write N bytes, then the `write` should accept a `list<u8>` with length N. That `write` returns N to say that N bytes have been accepted, so that the caller doesn't send those bytes again.
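For illustration, a caller honoring that contract could look like the following; the `OutputStream` trait here is hypothetical, not the real generated bindings:

```rust
// Hypothetical caller-side sketch of the contract described above.
trait OutputStream {
    fn check_write(&mut self) -> Result<u64, StreamError>; // backpressure signal
    fn write(&mut self, contents: &[u8]) -> Result<(), StreamError>; // accepts all of `contents`
}
struct StreamError;

fn write_chunked(stream: &mut dyn OutputStream, mut data: &[u8]) -> Result<(), StreamError> {
    while !data.is_empty() {
        // Ask how much the callee can accept right now.
        let permit = stream.check_write()? as usize;
        if permit == 0 {
            continue; // the real API would wait on the stream's pollable here
        }
        let n = permit.min(data.len());
        // The callee takes ownership of these bytes; we never re-send them.
        stream.write(&data[..n])?;
        data = &data[n..];
    }
    Ok(())
}
```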
However, it could happen that the OS `write` does a short write. If that happens, we're in a pickle: we told the caller we successfully wrote N bytes, but we didn't actually write them yet. Some callers might not care, but POSIX `write`, and thus Preview 1 `fd_write`, expect that if we say that N bytes were written, they were indeed written with no errors. To implement this, the adapter issues a flush, so that if any errors occur, we detect them during the `fd_write` call and never claim a failure is a success.
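As a rough sketch of that idea (illustrative signatures, not the actual preview1-component-adapter source):

```rust
// Why the adapter flushes inside fd_write: only after a successful flush is
// it safe to report "n bytes written, no errors" to the POSIX-style caller.
trait OutputStream {
    /// How many bytes the stream will currently accept (its backpressure budget).
    fn check_write(&mut self) -> Result<u64, StreamError>;
    /// Accept the whole buffer; the stream retains anything not yet written out.
    fn write(&mut self, contents: &[u8]) -> Result<(), StreamError>;
    /// Block until all retained bytes are written out, or report the error.
    fn blocking_flush(&mut self) -> Result<(), StreamError>;
}
struct StreamError;

fn fd_write(stream: &mut dyn OutputStream, buf: &[u8]) -> Result<usize, StreamError> {
    let budget = stream.check_write()? as usize;
    let n = budget.min(buf.len());
    stream.write(&buf[..n])?;
    // Surface any error affecting the retained bytes *now*, during this call,
    // rather than claiming success and failing later.
    stream.blocking_flush()?;
    Ok(n)
}
```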
I can also add, this whole protocol of `check-write`, `write`, and `flush` is something I'm hoping we can simplify in the future.
> That being said, I can see one place where `flush` is significant, and that is right before closing a stream: to clear out all intermediate buffers.
>
> > "flush" here isn't about TCP_NODELAY or MSG_MORE
>
> I realize it's all still pixie dust at this moment, but it seems that this is the direction Luke is thinking in: https://docs.google.com/presentation/d/1bWUO1Z9swQ4KSmoeUMTwIFYurasn14xFy4o-G8nE15w/edit#slide=id.g266ec2a0918_0_28

That's a different kind of "flush" :-). The current wasi-io flush is about "I want to know if there will be any errors reported for the bytes I just wrote". The stream `flush` in Luke's slides is about "I won't be sending more data for a while, so the callee and everything downstream should do whatever it's going to do with the data I've given it, now, rather than waiting for more".
That's a different kind of "flush" :-). POSIX write and thus Preview 1 fd_write expect that if we say that N bytes were written, that they were indeed written with no errors.
Alright, that at least clears up the confusion :)
So, flush
is specifically designed for filesystems, and TCP sockets have no have no use for this interpretation of flush, right? I.e. wasi-libc never has to issue a flush for sockets, except just before closing.
Socket writes can fail with `ECONNRESET` and possibly other things at the host OS, so it would seem they still need to be flushed at wasi-io to implement POSIX behavior.
Oh sorry, I wasn't trying to imply that sockets can't fail. But rather: the OS' non-blocking `send` implementation doesn't perform any IO itself. Most likely all it does is move the data from user space to a kernel queue, for it to be put on the wire at a later moment. If `send` returns an error, it's likely that it has nothing to do with the current `send` call; instead it is a delayed notification that one of the previous `send`s failed. Or put differently: if `send` returns success, it doesn't mean the data was successfully sent. It only means ownership of the buffer has been transferred from the application to the kernel.

Given that the socket data+failure pipeline is already fully asynchronous by nature (regardless of the non-blocking mode of the fd), I don't see why libc needs to block every send on a `flush`. If an asynchronous error occurs, it will be returned by the next `check-write`.

Edit: I'm only speaking for internet sockets here. I don't know how e.g. UNIX sockets behave.
Edit 2: See the empty flush implementation for `TcpSocket`.
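In other words, something with roughly this shape (a hypothetical sketch, not wasmtime's actual `TcpSocket` implementation):

```rust
// A TCP-backed output stream whose flush is a no-op, because the kernel is
// already draining its send queue on its own; an asynchronous failure
// (e.g. ECONNRESET) simply becomes the error returned by the *next*
// check_write.
struct TcpOutputStream {
    last_error: Option<std::io::Error>, // delayed report from an earlier send
}

impl TcpOutputStream {
    fn check_write(&mut self) -> Result<u64, std::io::Error> {
        if let Some(e) = self.last_error.take() {
            return Err(e); // surface the earlier send's failure here
        }
        Ok(4096) // illustrative write budget
    }

    fn flush(&mut self) -> Result<(), std::io::Error> {
        Ok(()) // nothing to do: there is no user-space buffer to drain
    }
}
```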
BTW, if the answer is something along the lines of: "without having to know what kind of output-stream it is, consumers should follow the check-write, write & flush recipe", then I'm fine with that too. In that case TCP's output-stream will simply ignore flushes. I just want it to be clear what flush is supposed to do.
Guest code doesn't know what kind of output-stream it has, even if it created the stream with `accept` or similar, because wasi-sockets could be virtualized by something that's using a different stream implementation. So if it needs POSIX-style `write` or `send` behavior, it should follow the `check-write`, `write`, `flush` protocol.
Wasmtime's tokio-based implementation is subtle enough that I'm not confident I know what it's supposed to do.
What does "flush" mean? None of the WASI proposals currently define what it means to "flush" one of their output-streams.
wasi-io's documentation on
flush
says:(Aside from the recursive definition :stuck_out_tongue_winking_eye:), which buffers, to what extent, and with what goal?
Does this include OS buffers? I.e.
write
? Orwrite
+fsync
? (Or does it require the use ofO_DIRECT
to bypass Linux' internal caching entirely :hand_over_mouth:?)send
? Orsend
withTCP_NODELAY
enabled? If a write has not been flushed, does that mean we actually should've sent it withMSG_MORE
in the first place?Or put differently, why should a consumer of a random
output-stream
of which it doesn't know its origin, callflush
(or one of its cousins)? What guarantee do they have after their data has been flushed? Is it now persistently stored on disk? Has it been sent out on the wire? Has the peer successfully received it?As far as I can see in wasmtime, none of
wasi-filesystem
,wasi-sockets
&wasi-http
use flush for anything other than waiting for a previous write to finish.Apologies for the many question marks :)