Open badeend opened 9 months ago

What does "flush" mean? None of the WASI proposals currently define what it means to "flush" one of their output-streams. wasi-io's documentation on `flush` just says it flushes buffered output. (Aside from the recursive definition :stuck_out_tongue_winking_eye:): which buffers, to what extent, and with what goal?

Does this include OS buffers? I.e.:

- `write`? Or `write` + `fsync`? (Or does it require the use of `O_DIRECT` to bypass Linux' internal caching entirely :hand_over_mouth:?)
- `send`? Or `send` with `TCP_NODELAY` enabled? If a write has not been flushed, does that mean we actually should've sent it with `MSG_MORE` in the first place?

Or put differently: why should a consumer of a random `output-stream`, of which it doesn't know the origin, call `flush` (or one of its cousins)? What guarantee do they have after their data has been flushed? Is it now persistently stored on disk? Has it been sent out on the wire? Has the peer successfully received it?

As far as I can see in wasmtime, none of `wasi-filesystem`, `wasi-sockets` & `wasi-http` use flush for anything other than waiting for a previous write to finish.

Apologies for the many question marks :)
Yes, "flush" here isn't about O_DIRECT
or TCP_NODELAY
or MSG_MORE
or fsync
or fdatasync
or similar things at the host OS level.
Imagine a write
that takes a list<u8>
argument. When one component calls another (as opposed to calling the host, which can bypass all this), the callee's bindings will allocate memory for a buffer for the full length of the list<u8>
, even if the callee ultimately encounters a short write for any reason. There was a concern that in that situation, the caller could end up having to pass another copy of bytes that it had already passed. To avoid this, callees now hold on to the list<u8>
buffer, so that callers don't have to pass another copy of them, and a "flush" is added to instruct callees to finish writing out those bytes.
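To make that shape concrete, here's a minimal sketch; the types and the `write_to_sink` helper are hypothetical, not wasmtime's actual bindings:

```rust
// Hypothetical sketch of the scheme described above: the callee takes
// ownership of the caller's whole buffer on `write`, and `flush` means
// "finish writing out whatever you're still holding".
struct StreamError;

struct CalleeStream {
    pending: Vec<u8>, // bytes accepted from the caller but not yet written out
}

impl CalleeStream {
    /// Accept ownership of the caller's entire buffer. Even if a later write
    /// comes up short, the caller never has to pass these bytes again.
    fn write(&mut self, buf: Vec<u8>) {
        self.pending.extend(buf);
    }

    /// "flush": finish writing out the retained bytes, looping over short writes.
    fn flush(&mut self) -> Result<(), StreamError> {
        while !self.pending.is_empty() {
            let n = self.write_to_sink()?; // may be a short write
            self.pending.drain(..n);
        }
        Ok(())
    }

    /// Stand-in for the real destination (file, socket, another component, ...).
    fn write_to_sink(&mut self) -> Result<usize, StreamError> {
        Ok(self.pending.len()) // pretend everything was written
    }
}
```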
I understand how the distinct `check_write` & `write` methods eliminate unnecessary cross-component copies. But it's still not clear to me what role `flush` plays here. Purely looking at the current wasmtime implementations: a `write` either directly submits the data to the OS, or moves it to a background task that proactively tries to push it to the OS. I.e. writes are always propagated to the OS as fast as possible, regardless of whether `flush` is called or not.

If we were to change wasi-libc/preview1-component-adapter to not issue flushes, how would the application behave differently?

That being said, I can see one place where `flush` is significant, and that is right before closing a stream: to clear out all intermediate buffers.

> "flush" here isn't about TCP_NODELAY or MSG_MORE

I realize it's all still pixie dust at this moment, but it seems that this is the direction Luke is thinking in: https://docs.google.com/presentation/d/1bWUO1Z9swQ4KSmoeUMTwIFYurasn14xFy4o-G8nE15w/edit#slide=id.g266ec2a0918_0_28
> I understand how the distinct `check_write` & `write` methods eliminate unnecessary cross-component copies. But it's still not clear to me what role `flush` plays here. Purely looking at the current wasmtime implementations: a `write` either directly submits the data to the OS, or moves it to a background task that proactively tries to push it to the OS. I.e. writes are always propagated to the OS as fast as possible, regardless of whether `flush` is called or not. If we were to change wasi-libc/preview1-component-adapter to not issue flushes, how would the application behave differently?
The `check_write` step is conceptually about not wanting to oblige the callee to allocate a larger `list<u8>` than the callee's instance can handle, and to allow the callee to exert a form of backpressure. That way, if `check-write` says you can write N bytes, then the `write` should accept a `list<u8>` with length N. That `write` returns N to say that N bytes have been accepted, so that the caller doesn't send those bytes again.
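For illustration, a caller honoring that contract could look like the following; the `OutputStream` trait here is hypothetical, not the real generated bindings:

```rust
// Hypothetical caller-side sketch of the contract described above.
trait OutputStream {
    fn check_write(&mut self) -> Result<u64, StreamError>; // backpressure signal
    fn write(&mut self, contents: &[u8]) -> Result<(), StreamError>; // accepts all of `contents`
}
struct StreamError;

fn write_chunked(stream: &mut dyn OutputStream, mut data: &[u8]) -> Result<(), StreamError> {
    while !data.is_empty() {
        // Ask how much the callee can accept right now.
        let permit = stream.check_write()? as usize;
        if permit == 0 {
            continue; // the real API would wait on the stream's pollable here
        }
        let n = permit.min(data.len());
        // The callee takes ownership of these bytes; we never re-send them.
        stream.write(&data[..n])?;
        data = &data[n..];
    }
    Ok(())
}
```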
However, it could happen that the OS `write` does a short write. If that happens, we're in a pickle: we told the caller we successfully wrote N bytes, but we didn't actually write them yet. Some callers might not care, but POSIX `write`, and thus Preview 1 `fd_write`, expect that if we say that N bytes were written, they were indeed written with no errors. To implement this, the adapter issues a flush, so that if any errors occur, we detect them during the `fd_write` call and never claim a failure is a success.
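As a rough sketch of that idea (illustrative signatures, not the actual preview1-component-adapter source):

```rust
// Why the adapter flushes inside fd_write: only after a successful flush is
// it safe to report "n bytes written, no errors" to the POSIX-style caller.
trait OutputStream {
    /// How many bytes the stream will currently accept (its backpressure budget).
    fn check_write(&mut self) -> Result<u64, StreamError>;
    /// Accept the whole buffer; the stream retains anything not yet written out.
    fn write(&mut self, contents: &[u8]) -> Result<(), StreamError>;
    /// Block until all retained bytes are written out, or report the error.
    fn blocking_flush(&mut self) -> Result<(), StreamError>;
}
struct StreamError;

fn fd_write(stream: &mut dyn OutputStream, buf: &[u8]) -> Result<usize, StreamError> {
    let budget = stream.check_write()? as usize;
    let n = budget.min(buf.len());
    stream.write(&buf[..n])?;
    // Surface any error affecting the retained bytes *now*, during this call,
    // rather than claiming success and failing later.
    stream.blocking_flush()?;
    Ok(n)
}
```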
I can also add, this whole protocol of `check-write`, `write`, and `flush` is something I'm hoping we can simplify in the future.
> That being said, I can see one place where `flush` is significant, and that is right before closing a stream: to clear out all intermediate buffers.
>
> > "flush" here isn't about TCP_NODELAY or MSG_MORE
>
> I realize it's all still pixie dust at this moment, but it seems that this is the direction Luke is thinking in: https://docs.google.com/presentation/d/1bWUO1Z9swQ4KSmoeUMTwIFYurasn14xFy4o-G8nE15w/edit#slide=id.g266ec2a0918_0_28

That's a different kind of "flush" :-). The current wasi-io flush is about "I want to know if there will be any errors reported for the bytes I just wrote". The stream `flush` in Luke's slides is about "I won't be sending more data for a while, so the callee and everything downstream should do whatever it's going to do with the data I've given it, now, rather than waiting for more".
That's a different kind of "flush" :-). POSIX write and thus Preview 1 fd_write expect that if we say that N bytes were written, that they were indeed written with no errors.
Alright, that at least clears up the confusion :)
So, flush
is specifically designed for filesystems, and TCP sockets have no have no use for this interpretation of flush, right? I.e. wasi-libc never has to issue a flush for sockets, except just before closing.
Socket writes can fail with `ECONNRESET` and possibly other things at the host OS, so it would seem they still need to be flushed at wasi-io to implement POSIX behavior.
Oh sorry, I wasn't trying to imply that sockets can't fail. But rather: the OS' non-blocking `send` implementation doesn't perform any IO itself. Most likely all it does is move the data from user space to a kernel queue, for it to be put on the wire at a later moment. If `send` returns an error, it's likely that it has nothing to do with the current `send` call; instead it is a delayed notification that one of the previous `send`s failed. Or put differently: if `send` returns success, it doesn't mean the data was successfully sent. It only means ownership of the buffer has been transferred from the application to the kernel.

Given that the socket data+failure pipeline is already fully asynchronous by nature (regardless of the non-blocking mode of the fd), I don't see why libc needs to block every send on a `flush`. If an asynchronous error occurs, it will be returned by the next `check-write`.

Edit: I'm only speaking for internet sockets here. I don't know how e.g. UNIX sockets behave.
Edit 2: See the empty flush implementation for `TcpSocket`.
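In other words, something with roughly this shape (a hypothetical sketch, not wasmtime's actual `TcpSocket` implementation):

```rust
// A TCP-backed output stream whose flush is a no-op, because the kernel is
// already draining its send queue on its own; an asynchronous failure
// (e.g. ECONNRESET) simply becomes the error returned by the *next*
// check_write.
struct TcpOutputStream {
    last_error: Option<std::io::Error>, // delayed report from an earlier send
}

impl TcpOutputStream {
    fn check_write(&mut self) -> Result<u64, std::io::Error> {
        if let Some(e) = self.last_error.take() {
            return Err(e); // surface the earlier send's failure here
        }
        Ok(4096) // illustrative write budget
    }

    fn flush(&mut self) -> Result<(), std::io::Error> {
        Ok(()) // nothing to do: there is no user-space buffer to drain
    }
}
```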
BTW, if the answer is something along the lines of: "without having to know what kind of output-stream it is, consumers should follow the check-write, write & flush recipe", then I'm fine with that too. In that case TCP's output-stream will simply ignore flushes. I just want it to be clear what flush is supposed to do.
Guest code doesn't know what kind of output-stream it has, even if it created the stream with `accept` or similar, because wasi-sockets could be virtualized by something that's using a different stream implementation. So if it needs POSIX-style `write` or `send` behavior, it should follow the `check-write`, `write`, `flush` protocol.
Wasmtime's tokio-based implementation is subtle enough that I'm not confident I know what it's supposed to do.
What does "flush" mean? None of the WASI proposals currently define what it means to "flush" one of their output-streams.
wasi-io's documentation on
flush
says:(Aside from the recursive definition :stuck_out_tongue_winking_eye:), which buffers, to what extent, and with what goal?
Does this include OS buffers? I.e.
write
? Orwrite
+fsync
? (Or does it require the use ofO_DIRECT
to bypass Linux' internal caching entirely :hand_over_mouth:?)send
? Orsend
withTCP_NODELAY
enabled? If a write has not been flushed, does that mean we actually should've sent it withMSG_MORE
in the first place?Or put differently, why should a consumer of a random
output-stream
of which it doesn't know its origin, callflush
(or one of its cousins)? What guarantee do they have after their data has been flushed? Is it now persistently stored on disk? Has it been sent out on the wire? Has the peer successfully received it?As far as I can see in wasmtime, none of
wasi-filesystem
,wasi-sockets
&wasi-http
use flush for anything other than waiting for a previous write to finish.Apologies for the many question marks :)