WebAssembly / WASI

WebAssembly System Interface

Integration of streams #486

Closed badeend closed 10 months ago

badeend commented 2 years ago

In POSIX, the resource handle is the stream. The stream is manipulated chunk-wise using read & write.

In the Component Model, resources and streams are separate. How do you envision streams being incorporated into WASI APIs?

The most POSIX-ish implementation I could come up with is something like the following, where every chunk is a separate stream:

resource my-fd-like {
    read: func(bytes-to-read: u32) -> expected<stream<u8, _>, ...>
    write: func(chunk: stream<...>) -> expected<_, ...>
}

But, conceptually the entire fd represents a single stream, so using a new stream per chunk feels like swimming against the "stream" to me.

Though, based on his presentation, it seems Luke does intend it to be used like this:

[image: slide from Luke's presentation]


The programming languages that have built-in streams (that I'm familiar with) generally use a one-stream-per-file-descriptor approach:

resource my-fd-like {
    get-input-stream: func() -> stream<...> // readable stream
    get-output-stream: func() -> stream<...> // writeable stream

    // or:

    get-stream: func() -> stream<...> // readable+writable stream
}

But I currently don't see how to build a POSIX-compatible layer on top of this.


Blocks: https://github.com/bytecodealliance/wasmtime/issues/4276

sunfishcode commented 2 years ago

I've now updated the wasi-filesystem pwrite to use streams as presented in Luke's slides:

https://github.com/WebAssembly/wasi-filesystem/blob/main/wasi-filesystem.wit.md#pwrite

This version differs slightly from the presentation in that it includes an errno return type for indicating write errors.

My current understanding of how file descriptors will work is that libc will present a "file descriptor" to users which is actually an index into a libc-managed array. Elements of this array will have an enum indicating what kind of I/O they represent, which would be one of the following:

For the last two, libc can manage the temporary streams internally and present a POSIX-compatible stream abstraction to users.

badeend commented 2 years ago

In the updated filesystem API, does (p)read return a stream to the end of the file, rather than a single chunk? Also, does (p)write's future only resolve after the input stream has been completely consumed (or an error has occurred)?

sunfishcode commented 2 years ago

Yes, the current pread returns a stream to the end of the file. We could change that; we could have pread take a size, and then return a stream that stops when it reaches that size. I think we can make the user-facing libc pread API work on top of either form though, so I don't have a clear sense of which is best.

Also, yes, the current pwrite's future resolves after the stream has been closed, successfully or otherwise.
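
To illustrate the point that a user-facing libc pread could sit on top of either form, here is a minimal sketch in Python. It is a model, not WASI or libc code: `stream_to_eof`, `stream_bounded`, and `libc_pread` are hypothetical names, and a Python iterator of byte chunks stands in for a Component Model stream.

```python
from typing import Iterator


def stream_to_eof(data: bytes, offset: int, chunk_size: int = 4) -> Iterator[bytes]:
    """Hypothetical pread form 1: stream chunks from offset to the end of the file."""
    for start in range(offset, len(data), chunk_size):
        yield data[start:start + chunk_size]


def stream_bounded(data: bytes, offset: int, size: int, chunk_size: int = 4) -> Iterator[bytes]:
    """Hypothetical pread form 2: stream chunks from offset, stopping after size bytes."""
    end = min(len(data), offset + size)
    for start in range(offset, end, chunk_size):
        yield data[start:min(start + chunk_size, end)]


def libc_pread(stream: Iterator[bytes], count: int) -> bytes:
    """POSIX-style read: collect at most count bytes, then stop pulling from the stream."""
    if count <= 0:
        return b""
    out = bytearray()
    for chunk in stream:
        # Keep only as many bytes as are still needed.
        out += chunk[:count - len(out)]
        if len(out) >= count:
            break
    return bytes(out)
```

With the first form the libc layer has to stop pulling and drop the stream once it has enough bytes; with the second form the stream ends on its own. In both cases a read near the end of the file comes back short, as POSIX callers expect.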

wmstack commented 2 years ago

I'm not sure whether this adds anything meaningful, but what do you think of Fuchsia's (or rather Zircon's) API for streaming? Zircon, a microkernel built by Google (though not designed to be minimal), is a capability-based kernel with no inherent notion of a filesystem, or of files in particular. It interacts with the rest of the system through capabilities called handles rather than through file descriptors. Almost all system calls from a userspace process use such a handle, which is nothing more than a 32-bit integer identifying a capability.

Zircon is an object-based kernel. User mode code almost exclusively interacts with OS resources via object handles. A handle can be thought of as an active session with a specific OS subsystem scoped to a particular resource.

Source: https://fuchsia.dev/fuchsia-src/reference/kernel_objects/objects

The substitutes for files (or shared memory) are split into virtual memory objects and virtual memory address regions. Virtual memory objects are address-space-agnostic regions of memory that reside in physical memory managed by the kernel, though I am not sure whether they are physically contiguous. Within a virtual address space, however, those regions are made to appear contiguous. They can be mapped into multiple different virtual address spaces.

A Virtual Memory Object (VMO) represents a contiguous region of virtual memory that may be mapped into multiple address spaces.

Source: https://fuchsia.dev/fuchsia-src/reference/kernel_objects/vm_object

VMARs are used by the kernel and userspace to represent the allocation of an address space.

Every process starts with a single VMAR (the root VMAR) that spans the entire address space (see zx_process_create()). Each VMAR can be logically divided up into any number of non-overlapping parts, each representing a child VMARs, a virtual memory mapping, or a gap. Child VMARs are created using zx_vmar_allocate(). VM mappings are created using zx_vmar_map()

Source: https://fuchsia.dev/fuchsia-src/reference/kernel_objects/vm_address_region

The kernel can create a stream that maps onto a virtual memory object, with an offset into the VMO at which it reads from and writes to buffers. This VMO could potentially be shared by multiple processes. A stream is created by a system call that is supplied the handle to the virtual memory object; the stream hooks onto this underlying VMO, so that read and write operations on the stream actually read from and write to that virtual memory object's buffer.

Of course, there is an indirection between writing or reading from a VMO directly, versus writing and reading from a stream that is bound to that virtual memory object:

Unlike the read and write operations on a VMO, the read and write operations on a stream can be short, which means the operations can complete successfully without filling (or, respectively, emptying) the supplied buffers. For example, a read that extends beyond the end of a VMO will simply fail whereas a read that extends beyond the end of a stream will succeed in reading to the end of the stream and partially filling the buffer.

Writes that extend beyond the end of the underlying storage attempt to increase the size of the underlying storage rather than failing immediately. For example, a write to a stream that extends beyond the end of the underlying VMO will attempt to resize the VMO rather than failing. If the resize operation fails on the underlying VMO, the write can end up being short.

Source: https://fuchsia.dev/fuchsia-src/reference/kernel_objects/stream
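
The difference described in the quoted passages can be modelled with a small toy in Python. This is not the Zircon API: `Vmo`, `Stream`, and the `max_size` cap (used here only so that a resize can fail) are assumptions made for illustration.

```python
class Vmo:
    """Toy virtual memory object: reads and writes past the end fail outright."""

    def __init__(self, size: int, max_size: int):
        self.buf = bytearray(size)
        self.max_size = max_size  # hypothetical cap so that resize can fail

    def read(self, offset: int, length: int) -> bytes:
        if offset + length > len(self.buf):
            raise ValueError("read extends beyond the end of the VMO")
        return bytes(self.buf[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        if offset + len(data) > len(self.buf):
            raise ValueError("write extends beyond the end of the VMO")
        self.buf[offset:offset + len(data)] = data

    def resize(self, new_size: int) -> bool:
        if new_size > self.max_size:
            return False
        if new_size > len(self.buf):
            self.buf.extend(bytes(new_size - len(self.buf)))
        else:
            del self.buf[new_size:]
        return True


class Stream:
    """Toy stream bound to a VMO: reads and writes may be short instead of failing."""

    def __init__(self, vmo: Vmo, offset: int = 0):
        self.vmo = vmo
        self.offset = offset

    def read(self, length: int) -> bytes:
        # Clamp to what is available rather than failing.
        n = max(0, min(length, len(self.vmo.buf) - self.offset))
        data = self.vmo.read(self.offset, n) if n else b""
        self.offset += n
        return data

    def write(self, data: bytes) -> int:
        end = self.offset + len(data)
        if end > len(self.vmo.buf):
            self.vmo.resize(end)  # try to grow; on failure fall through to a short write
        n = max(0, min(len(data), len(self.vmo.buf) - self.offset))
        if n:
            self.vmo.write(self.offset, data[:n])
        self.offset += n
        return n
```

In this model a 4-byte VMO rejects a 4-byte read at offset 2, while a stream positioned at offset 2 returns the remaining 2 bytes. A write through the stream either grows the VMO or, when the resize is refused, returns a byte count smaller than the input.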

sunfishcode commented 2 years ago

In general, many of these questions, such as the design of streams, are answered by the Wasm component model, so that would be a good place to follow up for more detailed discussion.

Sharing a bare virtual-memory object may simplify some optimizations, but it requires a great degree of coordination between components, to line up on synchronization, data representation, and data lifetimes, so it isn't a great fit for WASI's goals for isolation and cross-language interoperability. It also exposes a lot of information about how the host OS kernel works, which may be ok if the platform will always work the way Zircon does, but is less natural to do in all the environments that WASI is aiming to be useful in.

WASI's streams will be capable of zero-copy I/O with the host. Initially, component-to-component streams will require a copy. That said, there are some ideas in flight about ways to avoid the overhead of the copy in situations where it's significant.

badeend commented 1 year ago

In the filesystem API, does every invocation of read return a new stream from the beginning of the file? If so, what are the semantics when calling read multiple times in a row and/or concurrently? And what does that mean for file descriptors that don't support random access?

I've created a draft PR that integrates streams into wasi-sockets, where the same issue popped up. For now, I "solved" it by simply disallowing concurrent access to the streaming functions.

sunfishcode commented 1 year ago

I think what we want to do here is remove read and write from files, and just have pread and pwrite which create new streams at the given offset. For sockets, I think your draft PR looks good.

My current thinking is that file descriptors will be a libc-level concept, so libc will handle mapping random-access requests to the wasi-filesystem API or to other APIs as needed, and returning ESPIPE or so for attempts to do random-access on sockets.
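
As a rough sketch of that idea, the following Python model keeps a libc-style table in which a file descriptor is just an index, and dispatches pread based on the kind of entry. The names `Kind`, `Entry`, and `FdTable` are hypothetical, and a bytes value stands in for the wasi-filesystem resource; only `errno.ESPIPE` and `errno.EBADF` are real identifiers.

```python
import errno
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    FILE = auto()    # random access is possible
    SOCKET = auto()  # stream only, no offsets


@dataclass
class Entry:
    kind: Kind
    data: bytes = b""  # stand-in for the underlying WASI resource


class FdTable:
    """libc-managed array; a file descriptor is an index into it."""

    def __init__(self):
        self.entries = []

    def open(self, entry: Entry) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1

    def pread(self, fd: int, count: int, offset: int) -> bytes:
        if not 0 <= fd < len(self.entries):
            raise OSError(errno.EBADF, "Bad file descriptor")
        entry = self.entries[fd]
        if entry.kind is Kind.SOCKET:
            # Random access on a socket is rejected at the libc level.
            raise OSError(errno.ESPIPE, "Illegal seek")
        # For files, this is where a stream at the given offset would be created.
        return entry.data[offset:offset + count]
```

Calling `pread` on a file entry returns the requested slice, while the same call on a socket entry raises `OSError` with `errno.ESPIPE`, without the underlying API ever seeing a random-access request.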

badeend commented 1 year ago

For sockets, I think your draft PR looks good.

I've updated and published the PR. The streaming functions can now be called exactly once. This makes for a conceptually simpler interface. It also makes it possible to hook the closing of a stream up to the shutdown syscall.
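
The call-once rule and the shutdown hook can be sketched as follows, in Python. This is a model under stated assumptions, not the wasi-sockets interface: `SocketModel`, `OneShotStream`, and the direction labels "receive" and "send" are hypothetical, and a list records what a real implementation would pass to shutdown.

```python
class OneShotStream:
    """Hypothetical stream handle that notifies its owner once when closed."""

    def __init__(self, on_close):
        self._on_close = on_close
        self.closed = False

    def close(self) -> None:
        if not self.closed:
            self.closed = True
            self._on_close()


class SocketModel:
    """Each streaming function may be called exactly once per socket."""

    def __init__(self):
        self._taken = set()
        self.shutdown_log = []  # directions that were shut down, in order

    def _take(self, direction: str) -> OneShotStream:
        if direction in self._taken:
            raise RuntimeError(f"{direction} stream was already taken")
        self._taken.add(direction)
        # Closing the stream triggers the matching half-shutdown.
        return OneShotStream(lambda: self.shutdown_log.append(direction))

    def input_stream(self) -> OneShotStream:
        return self._take("receive")

    def output_stream(self) -> OneShotStream:
        return self._take("send")
```

Because each direction has at most one stream, closing that stream maps unambiguously onto shutting down that direction of the socket.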


I think what we want to do here is remove read and write from files, and just have pread and pwrite which create new streams at the given offset.

Seems reasonable.

Side note: The "p" prefix exists to differentiate these calls from the regular read/write calls. If you drop the "p"-less variants, there is no confusion left between the two, so we could then rename pread to read and pwrite to write.

sascha1337 commented 1 year ago

Hopium confirmed

badeend commented 1 year ago

Hopium confirmed

I don't know what that means.