WebAssembly / wasi-filesystem

Filesystem API for WASI
Other
179 stars 19 forks source link

Remove paths #112

Closed SoniEx2 closed 1 year ago

SoniEx2 commented 1 year ago

Since we have openat, there's no reason to have paths. Paths are known to introduce major vulnerabilities:

  1. Accidentally allowing subdirectory access when treating arbitrary strings as a file name.
  2. Accidentally allowing parent directory access due to improper handling of .. elements in paths.

Further, paths can be emulated in userspace if really needed, including the semantics of . and ... There is no good reason to have paths in a modern, low-level filesystem API, and they should be considered a legacy technology instead.

sunfishcode commented 1 year ago

It's an interesting idea, since the generic way to sandbox paths is to open them open path component at a time, and we could view this as simply moving that code out of the host and into wasm. And, there's probably a lot of variation in how this is currently implemented in engines.

However, paths allow implementations to use optimized implementations such as Linux's openat2 with the RESOLVE_BENEATH flag rather than opening one path component at a time, allowing it to open complex paths with a single system call instead of a system call per path component including symlink expansions.

SoniEx2 commented 1 year ago

It would also bring support for filesystems that don't use a "path DSL" but instead treat paths as true arrays of edges.

If chained opening of one path component at a time is too popular, we can also add an API that allows chained opening of one path component at a time. We'd still discourage the engine from building up a chained opening script (path DSL) from the path components, since that would bring back all of the issues avoided by moving away from scripting your openats, but it could do that to avoid the extra syscalls.

sunfishcode commented 1 year ago

I don't think supporting exotic filesystems that don't use conventional path strings is enough of a motivation to make the system less optimal or more complex for the popular use cases.

SoniEx2 commented 1 year ago

what about System 7 support?

sunfishcode commented 1 year ago

Do you mean Mac OS System 7, the last release of which was 26 years ago? I think it's safe to call that exotic in this context. I'm not opposed to supporting it in principle, but if it would require wasi-filesystem to be significantly more complex for the platforms that the vast majority of users today are using, I don't think that tradeoff is worthwhile.

sunfishcode commented 1 year ago

Path strings are what the vast majority of platforms in use today use, and it's a goal of wasi-filesystem to support existing filesystems. Changing this is theoretically interesting, but with the filesystems in use today, it would add overhead, preclude some optimizations, increase code size, and not bring any advantages for the vast majority of expected use cases.

badeend commented 1 year ago

In principle I like @SoniEx2 suggestion, so I just wanted to add third option for the records; keep paths in the interface, but represent them as a list of path segments, instead of a single / separated string. This would tackle both the OP's concerns while also allowing RESOLVE_BENEATH-like syscall optimizations. In exchange for more heap allocations.

Having said that, I agree with @ sunfishcode's previous comment. So no need to reopen this issue.

SoniEx2 commented 1 year ago

how much of a bottleneck is it exactly? as in ppl thought rust's bounds checking was gonna be a bottleneck and rust proved them wrong.

we still believe an API that is harder for implementers to get wrong is better than an API that has giant warnings about how to implement it correctly.

sunfishcode commented 1 year ago

It can be as much as the difference between making one host system call per open and one host system call per path component per open. If you believe the situation might be analogous to array bounds checking in Rust, I encourage you to prototype something and measure the overhead.

An additional consideration is the complexity in guest toolchains of supporting existing code which uses paths.

SoniEx2 commented 1 year ago

we don't mind taking the path as an array of path components. if you're gonna be interpolating path components (strings) into a path (for passing onto host API with RESOLVE_BENEATH), hopefully that's enough to make the implementer think about escaping/filtering.

the complexity in guest toolchains is definitely not insignificant tho.

it's kinda unfortunate that the filesystem is so ubiquitous across modern systems that you can't even begin to consider making it better. and we don't even mean replacing it or getting rid of it.