Open dumblob opened 5 years ago
WASI has fine grained sub-filedescriptor capabilities ("rights"). It does not perform DAC or MAC (though it may run on platforms which perform DAC or MAC beneath it). The "tcpdump" strategy with cap_enter doesn't work as-is since there is no cap_enter, but it's possible to refine the rights on a capability, so it is possible to do some setup, drop most rights, and then run more code.
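The "do some setup, drop most rights, then run more code" pattern can be modelled in a few lines. This is a minimal sketch and not the WASI API: Rights, Capability, restrict_rights, allows and the three right constants are hypothetical names chosen for illustration. The only property it tries to show is that narrowing is one-way.

```cpp
#include <cstdint>

// Hypothetical model of a rights bitmask. These names are illustrative
// and are NOT the real WASI definitions.
using Rights = std::uint64_t;
constexpr Rights RIGHT_READ  = Rights{1} << 0;
constexpr Rights RIGHT_WRITE = Rights{1} << 1;
constexpr Rights RIGHT_SEEK  = Rights{1} << 2;

struct Capability {
    int fd;         // descriptor the rights are attached to
    Rights rights;  // what the holder may do with it
};

// Narrow the rights on a capability. Returns false and leaves the
// capability untouched if the request would add a right not already held.
bool restrict_rights(Capability& cap, Rights wanted) {
    if ((wanted & ~cap.rights) != 0) return false;  // attempted escalation
    cap.rights = wanted;
    return true;
}

// True if the capability holds every right in `needed`.
bool allows(const Capability& cap, Rights needed) {
    return (cap.rights & needed) == needed;
}
```

In this model, a program would open what it needs with full rights, call restrict_rights once setup is finished, and only then run the less trusted code. Any later attempt to widen the set fails.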
I agree, it'd be very useful to have more documentation on the capability model! Would you be interested in starting such a document?
> WASI has fine grained sub-filedescriptor capabilities ("rights").
Can an application A which obtained capability set X (of type __wasi_rights_t?) make a copy of the capability set X resulting in cap. set Y, then modify Y (only further constrain it) and then start/initiate application B and pass Y to it as its cap. set?
> I agree, it'd be very useful to have more documentation on the capability model! Would you be interested in starting such a document?
I'm not feeling well versed in WASI yet (therefore my uneducated reactions and questions), so I'm afraid this wouldn't be a wise choice (also I'm UTC+1, which seems less than optimal for WASI :wink:).
> Can an application A which obtained capability set X (of type __wasi_rights_t?) make a copy of the capability set X resulting in cap. set Y, then modify Y (only further constrain it) and then start/initiate application B and pass Y to it as its cap. set?
Currently there is no way to copy a capability; it may be something we could add, though there are some subtle questions (does a copy of file capability share a file offset with the original? If so, that's maybe surprising and awkward; if not, we go against POSIX). Also there isn't an API for starting a new application yet. Other than those issues, yes, what you describe would work.
Also, if there's anything that we can change to make participation easier for people in other timezones, please let us know.
> (does a copy of file capability share a file offset with the original? If so, that's maybe surprising and awkward; if not, we go against POSIX)
Actually, that's an interesting point, because it highlights some of the problems in POSIX's file abstraction, and subtle differences between its model and WASI's ocap model.
In POSIX, the file descriptor returned by an open call in read-only mode represents two things at once: an element in the file system, which can be passed to fstat and has a fixed amount of data, and a one-time stream of bytes that is consumed when you read it.
C++ APIs tend to work around this duality by having two methods per I/O feature: one that takes a filename, and one that takes a std::istream object.
However, you can't pass filenames around in WASI, by design. So the following code:
void foobar(const std::string& filename) {
    int lineCount = getLineCount(filename);
    auto data = parseData(filename);
    // ...
}
might get replaced by:
void foobar(File file) {
    int lineCount = getLineCount(file);
    auto data = parseData(file);
    // ...
}
except the code above doesn't work, because file has already been "consumed" by getLineCount, so parseData gets an empty byte stream.
One solution is to place an lseek call before getLineCount, to save the cursor position, and another between getLineCount and parseData to reset it. However, that approach is error-prone: it's not atomic, exception-safe, or thread-safe.
A better solution would be to allow functions to take file descriptors without mutating their contents and offsets, ideally in a way guaranteed by the type system. Of course, that wouldn't be possible in many cases (e.g. pipes and sockets), but it would be useful when dealing with a filesystem.
Another solution would be to differentiate a file capability from its contents:
File myFile = open_at(myDir, "foobar");
Stream myFileStream = get_stream(myFile);
read(myFileStream, data);
void foobar(File file) {
    int lineCount = getLineCount(get_stream(file));
    auto data = parseData(get_stream(file));
    // ...
}
@PoignardAzur Why can't getLineCount(file) just dup(file) and not consume the cursor position?
Sure, that works, and it's basically equivalent to my use of get_stream above.
The standard would have to define semantics for what happens when dup is called on a pipe or a socket. Some possibilities:
The second one would allow a tightening of a file descriptor's capabilities, without impacting code using that file descriptor.
Duplicating listening stream sockets and datagram sockets is immensely valuable.
Also, expressed in libc terms, dup2(fd, STDIN_FILENO) with accepted stream sockets is immensely valuable in conjunction with fork() and exec(). WASI doesn't currently have process functions, but if we did I suspect the APIs would be valuable.
Ok, never mind the post above, then.
This does bring us back to the problem @sunfishcode pointed out, which is that, using POSIX dup2, the solution you outlined (getLineCount(file) calls dup(file)) isn't enough to make getLineCount "pure".
So, after considering the problem, I think the use cases need to be hashed out.
Basically, we're considering two types of duplications: one that makes a shallow copy of a file descriptor, and one that makes a deep copy of it (from the point of view of virtual tables; from the IO point of view, they're both shallow).
Making a deep copy is useful for ocap purposes, but making a shallow copy is what POSIX does and has existing use cases (and is the only possible option for sockets and similar objects).
@npmccallum: would you say there are common/important use cases that require making a shallow copy of a file descriptor (e.g. both copies share a file offset), other than the "copy to stdin/stdout, then fork/execve" use case?
Because if the only major use case is process forking, then I suggest the following implementation:
Add a duplicate syscall. Its name is explicitly different from POSIX's naming convention.
Add a CAN_BE_DUPLICATED capability, that only files have for now.
When it comes time to implement fork/execve, add a way to pass stdin / stdout / arbitrary file descriptors without duplicating them.
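The first two points of this proposal can be modelled as follows. Everything here is an assumption made for illustration: the Descriptor struct, the CAN_READ constant, the bit values, and the choice that the copy starts at the original's current offset are not specified anywhere in this thread. The sketch only shows a duplicate operation that is gated on a CAN_BE_DUPLICATED right and that yields an independent offset.

```cpp
#include <cstdint>

// Hypothetical rights bits; the values are arbitrary.
using Rights = std::uint64_t;
constexpr Rights CAN_BE_DUPLICATED = Rights{1} << 0;
constexpr Rights CAN_READ          = Rights{1} << 1;

// Hypothetical descriptor model: which object it names, where its
// cursor is, and what the holder may do with it.
struct Descriptor {
    int object_id;
    std::uint64_t offset;
    Rights rights;
};

// Deep duplicate: succeeds only if the source carries CAN_BE_DUPLICATED.
// The copy gets its OWN offset (assumed here to start at the source's
// current position), so moving one cursor never moves the other.
bool duplicate(const Descriptor& src, Descriptor& out) {
    if ((src.rights & CAN_BE_DUPLICATED) == 0) return false;
    out = Descriptor{src.object_id, src.offset, src.rights};
    return true;
}
```

Under this model a regular file would carry CAN_BE_DUPLICATED, while a socket or pipe would not, so duplicate simply refuses for them.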
You'd also want the ability to borrow/move a non-copyable fd and constrain the rights given to it, in a way that doesn't affect the code passing that fd if it wants to do more with it when you're done.
In follow-up to my noob question on wasmcap I'll drop in references here to other innovative work wrt capability systems (copying from the comment text):
Btw, I would like to mention that Spritely Goblins is coming to the Wasm world once Wasm support has been brought to Guile. The Goblins technology isn't about the Fediverse, and is about as far-reaching as Wasm itself may be in how it changes software development.
See The Heart of Spritely: Distributed Objects and Capability Security and Guile on WebAssembly project underway!
So far it seems the most advanced practical capability system is Capsicum. I think it would be nice to correlate their API with WASI API just to avoid some potential mistakes (see e.g. fine grained sub-filedescriptor capabilities, rough comparison of cap systems in 2010, and an example of Capsicum in tcpdump).
Other capability systems should also be compared in detail with WASI, for the same reasons.
In the end, an overview of existing practical capability systems, built from the high-level points of that detailed comparison, could help WASI adoption.