WebAssembly / wasi-filesystem

Filesystem API for WASI
Other
179 stars 19 forks source link

Drop explicit preopens #128

Open badeend opened 1 year ago

badeend commented 1 year ago

Given WASI's CloudABI heritage I understand why preopens currently are the way that they are.

But they feel like something that should be just an host implementation detail. Ie. could the following also work?:

Remove

get-directories: func() -> list<tuple<descriptor, string>>

in exchange for:

instance-root-directory: func() -> descriptor

which returns a single directory that represents the "default filesystem". In similar vein to:

By default this is an empty directory. But the host can "mount" specific external directories/files into it. Exactly how the contents of this "filesystem" are populated (preopened or not), would be of concern only to the host, not the client application / libc.

pchickey commented 1 year ago

I'm up for exploring this idea more sometime in the future, but for the preview2 timeframe I have advocated that we keep interfaces and concepts that existed in preview 1 as similar as we can possibly get away with.

BentonEdmondson commented 1 year ago

I think this is a good idea. This would solve https://github.com/WebAssembly/wasi-libc/issues/414

badeend commented 1 year ago

for the preview2 timeframe I have advocated that we keep interfaces and concepts that existed in preview 1 as similar as we can possibly get away with.

Yeah, that's totally fine!

The main reason why I am asking this is because I'm trying to get a grip on where WASI wants to end up regarding "preopens", so I can take that into account for wasi-sockets.

sunfishcode commented 9 months ago

As the person who initially brought preopens to WASI, based on similar ideas in CloudABI and Capsicum, I agree. I've also come to believe that migrating wasi-filesystem to have a rooted filesystem view would be beneficial to WASI. This would be from many people's perspective a much more "normal" filesystem API. We could have an open function where you can just pass it a string, and open a file. We could even add a builtin current working directory. Overall, this would eliminate a common source of friction when people look to port code to WASI.

This new namespace could still be expected to be sandboxed, but the sandbox could look more like a simple VFS with things mounted in it. Many use cases could just use that directly, rather than using preopens. I expect the VFS could even be populated by the existing --dir command-line flags. Overall, it seems like this is something we can migrate to incrementally, without breaking compatibility with existing code.

Why the change in stance? Preopens were added in the Preview 1 era, at a time when there wasn't even a component model proposal, and we had a lot of questions with no answers. But today:

(But what if you want an Everything Is A File experience? We can provide it, by teaching WASI-Virt how to implement features in terms of wasi-filesystem APIs!)

So, should we start moving in this direction? I don't see any objections to this overall issue so far, but I'll do some more asking around.

yamt commented 8 months ago

This new namespace could still be expected to be sandboxed, but the sandbox could look more like a simple VFS with things mounted in it.

i concern the "simple VFS" is likely a bit more expensive than preopens, which is just a list.

yoshuawuyts commented 4 months ago

Dan published a blog post about this today: Capabilities and Filesystems. In it he says the following about wasi:filesystem:

We could add a plain open function, which takes a path argument and resolves it relative to a filesystem root that comes with the instance that the function is imported from. That makes this open function use a link-time capability. It should support absolute paths, symlinks to absolute paths, and all the rest. And then, we could add similar non--at versions of all the -at functions. This new API could live in a new interface that could coexist with the current interface.

To me this seems like something worth experimenting with now that we've shipped Preview 2.

badeend commented 3 months ago

I've created a proof-of-concept of this over at: https://github.com/bytecodealliance/wasmtime/compare/main...badeend:root-directory


We could add a plain open function, which takes a path argument and resolves it relative to a filesystem root that comes with the instance that the function is imported from. That makes this open function use a link-time capability. It should support absolute paths, symlinks to absolute paths, and all the rest. And then, we could add similar non--at versions of all the -at functions. This new API could live in a new interface that could coexist with the current interface.

I went with a simpler design, similar to the one outlined in the initial post. Instead of duplicating the entire path-based API-surface into non-_at versions, the POC only has one new WIT method:

  package wasi:filesystem@0.2.0;
  interface preopens {
+     @deprecated
      get-directories: func() -> list<tuple<descriptor, string>>;

+     /// Get the root directory of the file system.
+     /// In POSIX filesystems this is located at: `/`
+     @unstable
+     root-directory: func() -> descriptor;
  }

This feels like a sweet spot to me:


To test it all out I disabled the preview1-style preopens and every unit test still passes, except preview2_file_read_write. This is quite obvious when you look at its source code:

let preopens = wasi::filesystem::preopens::get_directories();
let (dir, _) = &preopens[0]; // <--- panics here

All other tests continue work because they go through the preview1-to-preview2 adapters, which I also modified:

On initialization, the adapter calls root-directory() and appends the fd to its internal preopens list with a hardcoded path of: / Next, it attempts to get the initial-cwd and open that using root_directory.open_at(initial_cwd). If that succeeds, the fd is likewise appended to the preopens at path: .

This was enough to make the preview1 test suite pass without modification. In terms of upgrade path; a future preview2-to-preview3 adapter could take a similar approach and return exactly 1 or 2 paths from its synthesized preopens::get-directories(). One for /, and optionally another one for .

Note: eagerly preopening initial_cwd as . was the easiest way to get the adapter to work. Don't think this is a fundamental requirement, though.


Some notes about the POC implementation:

badeend commented 3 months ago

The POC also fixes https://github.com/WebAssembly/wasi-libc/issues/414, btw

yoshuawuyts commented 3 months ago

@badeend thanks for doing this work! - Would you be up for presenting these changes at either the WASI sub-group or the component implementers meeting? It doesn't have to be anything big - but more to get all implementers on the same page about this.

I went with a simpler design, similar to the one outlined in the initial post. Instead of duplicating the entire path-based API-surface into non-_at versions, the POC only has one new WIT method

Oh also, I was curious about this: for 0.3, do you think we should keep the at_ APIs? Or is this change specific to 0.2.x, and we should actually remove the at_ APIs in 0.3?

badeend commented 3 months ago

do you think we should keep the at APIs? Or is this change specific to 0.2.x, and we should actually remove the at APIs in 0.3?

Even for 0.3+, I think there's still merit in keeping them. Two reasons I can think of right now:

  1. Paths passed to the *_at methods are securely confined to the reference directory descriptor. This on its own is already a nice guarantee to build software on. Additionally, this simplifies the composition of components with increasingly tightened sandboxes; A parent component can open-at a specific subdirectory and expose that descriptor as-is, without further virtualization, to a child component as their root-directory.
  2. If we were to remove all *_at API's and replace them with only ambient authority equivalents, this would significantly increase the complexity of the preview1 adapters as they would need to rebuild the security model that exists today.
sunfishcode commented 2 months ago

A subtlety here is that the current *-at functions perform this sandboxing where they prevent ".." and symlinks from escaping the given directory. In contrast, POSIX openat doesn't do this sandboxing.

badeend commented 2 months ago

That's correct. The current WASI open-at is most similar to Linux' openat2 with RESOLVE_BENEATH.

To get POSIX openat behavior with my POC, is for wasi-libc to prepend the path of the reference directory to the input path and use that to call open-at on the root directory, correct?

sunfishcode commented 2 months ago

In a POSIX sense, prepending the path isn't sufficient, because according to POSIX, the main purpose of openat is to avoid race conditions:

The purpose of the openat() function is to enable opening files in directories other than the current working directory without exposure to race conditions. Any part of the path of a file could be changed in parallel to a call to open(), resulting in unspecified behavior. By opening a file descriptor for the target directory and using the openat() function it can be guaranteed that the opened file is located relative to the desired directory.

badeend commented 2 months ago

Ah right. The devil is always in the details. Do you have any ideas on how to go about this in such a way that it can be implemented efficiently but also doesn't require exposing the entire host filesystem?

bjorn3 commented 2 months ago

Do we actually need to support openat without RESOLVE_BENEATH?

sunfishcode commented 2 months ago

Do we actually need to support openat without RESOLVE_BENEATH?

That's a good question. I've been thinking part of the goal here is to have as "normal" a filesystem API as we can get, which would include having an openat that works like POSIX's does. But I could also see the argument that this particular detail of it isn't as important as keeping the overall API surface simpler.