WebAssembly / WASI

WebAssembly System Interface

Sharp edges of the current capability system #16

Open dumblob opened 5 years ago

dumblob commented 5 years ago

So far it seems the most advanced practical capability system is Capsicum. I think it would be nice to compare its API with the WASI API, just to avoid some potential mistakes (see e.g. fine grained sub-filedescriptor capabilities, a rough comparison of capability systems from 2010, and an example of Capsicum in tcpdump).

Other capability systems should also be compared in detail with WASI, for the same reasons.

In the end, an overview of existing practical capability systems, distilled into a few high-level points from that detailed comparison, could help WASI adoption.

sunfishcode commented 5 years ago

WASI has fine grained sub-filedescriptor capabilities ("rights"). It does not perform DAC or MAC (though it may run on platforms which perform DAC or MAC beneath it). The "tcpdump" strategy with cap_enter doesn't work as-is since there is no cap_enter, but it's possible to refine the rights on a capability, so it is possible to do some setup, drop most rights, and then run more code.
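The "do some setup, drop most rights, then run more code" pattern described above can be sketched with a toy model. This is only an illustration under stated assumptions: the `Capability` class, the `kRead`/`kWrite`/`kSeek` bits, the handle number, and `setupThenDrop` are all hypothetical names invented here, not the WASI API or its real rights constants. The one property the sketch tries to capture is that rights can be narrowed but never widened.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Illustrative rights bits; these are NOT the real WASI constants.
constexpr std::uint64_t kRead  = 1ull << 0;
constexpr std::uint64_t kWrite = 1ull << 1;
constexpr std::uint64_t kSeek  = 1ull << 2;

// A toy capability: a handle plus the set of rights still attached to it.
class Capability {
public:
    Capability(int handle, std::uint64_t rights)
        : handle_(handle), rights_(rights) {}

    // True only if every requested right is currently held.
    bool allows(std::uint64_t wanted) const {
        return (rights_ & wanted) == wanted;
    }

    // Narrow the rights in place. Requesting a right that is not currently
    // held is rejected, so the set can shrink but never grow.
    void restrictTo(std::uint64_t subset) {
        if ((subset & ~rights_) != 0) {
            throw std::invalid_argument("cannot add rights");
        }
        rights_ = subset;
    }

    int handle() const { return handle_; }

private:
    int handle_;
    std::uint64_t rights_;
};

// Setup with full rights, drop most of them, then hand back the capability
// that the remaining code would run with.
inline Capability setupThenDrop() {
    Capability cap(3, kRead | kWrite | kSeek);  // hypothetical handle number
    assert(cap.allows(kWrite));                 // setup phase may still write
    cap.restrictTo(kRead);                      // from here on: read-only
    return cap;
}
```

After `setupThenDrop()` returns, any later attempt to call `restrictTo(kRead | kWrite)` throws, which is the property that replaces a global `cap_enter`-style switch in this model.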

I agree, it'd be very useful to have more documentation on the capability model! Would you be interested in starting such a document?

dumblob commented 5 years ago

> WASI has fine grained sub-filedescriptor capabilities ("rights").

Can an application A which obtained capability set X (of type __wasi_rights_t?) make a copy of the capability set X resulting in cap. set Y, then modify Y (only further constrain it) and then start/initiate application B and pass Y to it as its cap. set?

> I agree, it'd be very useful to have more documentation on the capability model! Would you be interested in starting such a document?

I'm not feeling well versed in WASI yet (therefore my uneducated reactions and questions), so I'm afraid this wouldn't be a wise choice (also I'm UTC+1, which seems less than optimal for WASI :wink:).

sunfishcode commented 5 years ago

> Can an application A which obtained capability set X (of type __wasi_rights_t?) make a copy of the capability set X resulting in cap. set Y, then modify Y (only further constrain it) and then start/initiate application B and pass Y to it as its cap. set?

Currently there is no way to copy a capability; it may be something we could add, though there are some subtle questions (does a copy of a file capability share a file offset with the original? If so, that's maybe surprising and awkward; if not, we go against POSIX). Also, there isn't an API for starting a new application yet. Other than those issues, yes, what you describe would work.

Also, if there's anything that we can change to make participation easier for people in other timezones, please let us know.

PoignardAzur commented 5 years ago

> (does a copy of a file capability share a file offset with the original? If so, that's maybe surprising and awkward; if not, we go against POSIX)

Actually, that's an interesting point, because it highlights some of the problems in POSIX's file abstraction, and subtle differences between its model and WASI's ocap model.

In POSIX, the file descriptor returned by an open call in read-only mode represents two things at once: an element of the file system, which can be passed to fstat and has a fixed amount of data, and a one-time stream of bytes that is consumed as you read it.

C++ APIs tend to work around this duality by having two methods per I/O feature: one that takes a filename, and one that takes a std::istream object.

However, you can't pass filenames around in WASI, by design. So the following code:

void foobar(const std::string& filename) {
    int lineCount = getLineCount(filename);
    auto data = parseData(filename);
    // ...
}

might get replaced by:

void foobar(File file) {
    int lineCount = getLineCount(file);
    auto data = parseData(file);
    // ...
}

except the code above doesn't work, because file has already been "consumed" by getLineCount, so parseData gets an empty byte stream.
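The "consumed" behaviour is easy to reproduce with a standard stream. The sketch below uses `std::istream` as a stand-in for the thread's hypothetical `File` type; the bodies of `getLineCount` and `parseData` are assumptions made up for illustration, since the thread never shows them.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Counts lines by reading the stream to its end; this advances the cursor
// and leaves the stream in an end-of-file state.
std::size_t getLineCount(std::istream& in) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}

// Collects whatever is still unread in the stream, one line per element.
std::vector<std::string> parseData(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Mirrors foobar(): the second call sees nothing, because the first call
// already consumed the stream.
inline bool secondReaderSeesNothing() {
    std::istringstream file("a\nb\nc\n");
    const std::size_t lineCount = getLineCount(file);
    const std::vector<std::string> data = parseData(file);
    assert(lineCount == 3);
    return data.empty();
}
```

Here `secondReaderSeesNothing()` returns true: both functions share one cursor, so the order in which they run changes what each of them observes.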

One solution is to place an lseek call before getLineCount to save the cursor position, and another between getLineCount and parseData to reset it. However, that approach is error-prone: it is not atomic, exception-safe, or thread-safe.
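Part of that fragility can be reduced with a scope guard, sketched below for `std::istream` (using `tellg`/`seekg` in place of lseek). `CursorGuard` is a hypothetical helper invented for this example. It restores the position even when an exception unwinds the scope, but it does nothing about atomicity or thread safety: another thread using the same stream would still see the cursor move.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Saves the read position on construction and restores it on destruction,
// including when the scope is left through an exception.
class CursorGuard {
public:
    explicit CursorGuard(std::istream& in) : in_(in), saved_(in.tellg()) {}

    ~CursorGuard() {
        in_.clear();        // drop eof/fail bits so that seekg can succeed
        in_.seekg(saved_);  // put the cursor back where it was
    }

    CursorGuard(const CursorGuard&) = delete;
    CursorGuard& operator=(const CursorGuard&) = delete;

private:
    std::istream& in_;
    std::istream::pos_type saved_;
};

// Counts the remaining lines, then leaves the cursor where it found it.
std::size_t getLineCount(std::istream& in) {
    CursorGuard guard(in);
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}

// After counting, the next reader still starts from the first line.
inline std::string firstLineAfterCounting() {
    std::istringstream file("a\nb\n");
    const std::size_t lineCount = getLineCount(file);
    assert(lineCount == 2);
    std::string first;
    std::getline(file, first);
    return first;
}
```

The guard only works for seekable streams, which matches the caveat in the thread about pipes and sockets.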

A better solution would be to allow functions to take file descriptors without mutating their contents and offsets, ideally in a way guaranteed by the type system. Of course, that wouldn't be possible in many cases (e.g. pipes and sockets), but it would be useful when dealing with a filesystem.
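One way to read "guaranteed by the type system" in C++ is const-correctness combined with positional reads, in the spirit of POSIX pread, which takes an explicit offset instead of using the descriptor's cursor. The sketch below is an in-memory model only: `FileContents` and `readAt` are hypothetical names, and a real implementation would sit on top of an actual descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// In-memory stand-in for a seekable file. It stores no cursor at all, so
// reading cannot change any state that other users of the file can see.
class FileContents {
public:
    explicit FileContents(std::string bytes) : bytes_(std::move(bytes)) {}

    // Positional read: returns up to `len` bytes starting at `offset`.
    // Declared const, so it is callable through a const reference.
    std::string readAt(std::size_t offset, std::size_t len) const {
        if (offset >= bytes_.size()) {
            return std::string();
        }
        return bytes_.substr(offset, len);
    }

    std::size_t size() const { return bytes_.size(); }

private:
    std::string bytes_;
};

// Taking `const FileContents&` documents, and lets the compiler enforce,
// that this function cannot disturb the caller's view of the file.
std::size_t getLineCount(const FileContents& file) {
    std::size_t count = 0;
    const std::string all = file.readAt(0, file.size());
    for (char c : all) {
        if (c == '\n') {
            ++count;
        }
    }
    return count;
}

// Calling the function twice gives the same answer, unlike the
// stream-based version.
inline bool countingIsRepeatable() {
    const FileContents file("x\ny\n");
    return getLineCount(file) == 2 && getLineCount(file) == 2;
}
```

The trade-off is that every caller has to track its own offset, which is exactly the state the stream-style interface was hiding.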

Another solution would be to differentiate a file capability from its contents:

File myFile = open_at(myDir, "foobar");
Stream myFileStream = get_stream(myFile);
read(myFileStream, data);

void foobar(File file) {
    int lineCount = getLineCount(get_stream(file));
    auto data = parseData(get_stream(file));
    // ...
}

npmccallum commented 5 years ago

@PoignardAzur Why can't getLineCount(file) just dup(file) and not consume the cursor position?

PoignardAzur commented 5 years ago

Sure, that works, and it's basically equivalent to my use of get_stream above.

The standard would have to define semantics for what happens when dup is called on a pipe or a socket. Some possibilities:

The second one would allow a tightening of a file descriptor's capabilities, without impacting code using that file descriptor.

npmccallum commented 5 years ago

Duplicating listening stream sockets and datagram sockets is immensely valuable.

npmccallum commented 5 years ago

Also, expressed in libc terms, dup2(fd, STDIN_FILENO) with accepted stream sockets is immensely valuable in conjunction with fork() and exec(). WASI doesn't currently have process functions, but if we did I suspect the APIs would be valuable.

PoignardAzur commented 5 years ago

Ok, never mind the post above, then.

This does bring us back to the problem @sunfishcode pointed out: under POSIX dup/dup2 semantics, the duplicate shares its file offset with the original, so the solution you outlined (getLineCount(file) calls dup(file)) isn't enough to make getLineCount "pure".
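The shared offset can be observed directly on a POSIX system. The sketch below assumes a POSIX environment with a writable /tmp; `originalOffsetAfterDupRead` is a hypothetical helper written for this example. It reads through a dup()'d descriptor and then asks the original descriptor where its cursor is.

```cpp
#include <cassert>
#include <stdlib.h>     // mkstemp
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // dup, read, write, lseek, close, unlink

// Creates a small temporary file, reads 6 bytes through a dup()'d
// descriptor, and returns the offset then seen on the ORIGINAL descriptor.
// Returns -1 if any step fails.
off_t originalOffsetAfterDupRead() {
    char path[] = "/tmp/dup_demo_XXXXXX";  // assumption: /tmp is writable
    const int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);  // the file lives on until its descriptors are closed

    const char data[] = "line1\nline2\n";  // 12 bytes of payload
    if (write(fd, data, sizeof(data) - 1) != 12 ||
        lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    const int copy = dup(fd);  // same open file description, same offset
    if (copy < 0) {
        close(fd);
        return -1;
    }

    char buf[6];
    const ssize_t got = read(copy, buf, sizeof(buf));  // read via the copy
    const off_t pos = lseek(fd, 0, SEEK_CUR);          // ask the original

    close(copy);
    close(fd);
    return got == 6 ? pos : -1;
}
```

On a conforming system the function returns 6: the original descriptor's cursor moved even though only the duplicate was read from, so a getLineCount built on dup would still disturb its caller.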

PoignardAzur commented 5 years ago

So, after considering the problem, I think the use cases need to be hashed out.

Basically, we're considering two types of duplications: one that makes a shallow copy of a file descriptor, and one that makes a deep copy of it (from the point of view of virtual tables; from the IO point of view, they're both shallow).

Making a deep copy is useful for ocap purposes, but making a shallow copy is what POSIX does and has existing use cases (and is the only possible option for sockets and similar objects).

@npmccallum: would you say there are common or important use cases that require making a shallow copy of a file descriptor (e.g. both copies share a file offset), other than the "copy to stdin/stdout, then fork/execve" use case?

Because if the only major use case is process forking, then I suggest the following implementation:

You'd also want the ability to borrow/move a non-copyable fd and constrain the rights given to it, in a way that doesn't affect the code passing that fd if it wants to do more with it when you're done.
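The borrow-with-fewer-rights idea can be sketched as a move-only owner that lends out attenuated views. Everything here is a hypothetical model: `OwnedFd`, `BorrowedFd`, `borrow`, and the rights bits are names invented for this example and do not correspond to any WASI interface. C++ also cannot enforce that a view does not outlive its owner, which a real design would have to address.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Illustrative rights bits; these are NOT the real WASI constants.
constexpr std::uint64_t kRead  = 1ull << 0;
constexpr std::uint64_t kWrite = 1ull << 1;
constexpr std::uint64_t kSeek  = 1ull << 2;

class OwnedFd;

// A lent-out view of a descriptor, carrying a subset of the owner's rights.
// Only OwnedFd can create one.
class BorrowedFd {
public:
    bool allows(std::uint64_t wanted) const {
        return (rights_ & wanted) == wanted;
    }
    int fd() const { return fd_; }

private:
    friend class OwnedFd;
    BorrowedFd(int fd, std::uint64_t rights) : fd_(fd), rights_(rights) {}
    int fd_;
    std::uint64_t rights_;
};

// Move-only owner of a descriptor and its full set of rights.
class OwnedFd {
public:
    OwnedFd(int fd, std::uint64_t rights) : fd_(fd), rights_(rights) {}
    OwnedFd(const OwnedFd&) = delete;
    OwnedFd& operator=(const OwnedFd&) = delete;
    OwnedFd(OwnedFd&&) = default;
    OwnedFd& operator=(OwnedFd&&) = default;

    bool allows(std::uint64_t wanted) const {
        return (rights_ & wanted) == wanted;
    }

    // Lends a view holding the intersection of the owner's rights and the
    // requested ones. The owner's own rights are left untouched.
    BorrowedFd borrow(std::uint64_t wanted) const {
        return BorrowedFd(fd_, rights_ & wanted);
    }

private:
    int fd_;
    std::uint64_t rights_;
};

static_assert(!std::is_copy_constructible<OwnedFd>::value,
              "OwnedFd is move-only in this model");
```

For example, an owner holding `kRead | kWrite` that calls `borrow(kRead | kSeek)` hands out a view with `kRead` only, and can still write through its own handle once the callee is done.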

aschrijver commented 1 year ago

In follow-up to my noob question on wasmcap I'll drop in references here to other innovative work wrt capability systems (copying from the comment text):

Btw, I would like to mention that Spritely Goblins is coming to the Wasm world once they have brought Wasm support to Guile. The Goblins technology isn't about the Fediverse, and is quite far-reaching, to an extent similar to how Wasm may innovate software development.

See The Heart of Spritely: Distributed Objects and Capability Security and Guile on WebAssembly project underway!