Open vapier opened 2 years ago
Thanks for the thorough analysis! I'm curious what led you to notice this; do you have a use case with many fd_set
instances, such the greater size is significant, or are you optimizing a use case where an extra 3972 bytes is significant?
In WASI, there are more open file descriptors than a typical POSIX program would expect there to be. Beyond stdin, stdout, and stderr, WASI passes in preopened directory fds, and in the future, it will likely pass in additional file descriptors. A program which uses only 1024 file descriptors on typical POSIX platforms could use more on WASI, so the current fd_set
design maximizes compatibility.
i noticed this issue while debugging a program that was mishandling fd_set
(assuming it was implemented using a bitmask) and causing memory corruption when using WASI libc. that was a bug in the program, so i fixed it there. ironically if WASI libc used a bitmask like other C libraries, i never would have noticed.
for background, i've implemented a WASI runtime, so i'm a bit familiar with the expectations/requirements of the WASI libc runtime. i know that you need to pass in open directory handles, including the cwd/.
, before the program launches. but i have a hard time seeing how that translates to using up the majority of the 1024 space. it isn't a sparse space -- the WASI libc runtime walks from fd 0+ until it hits EBADF
. even if the initial environment was given a couple of directories to play in, and then some file handles (to paths outside of those directories), that's still an incredible number before exhausting.
but let's assume such a pathological setup is reasonable. i'll point out again that the behavior you describe is not POSIX compliant -- it specifically says what cloudlibc is doing is undefined behavior, not implementation-defined behavior. that means compilers are allowed to turn this into an abort or memory corruption or anything else. anyone writing to this API is asking for trouble, and it's their fault when it happens.
kind of the point of WASI libc is to enable POSIX compliant applications with minimal to no modification. POSIX already provides APIs (e.g. poll()) specifically to handle arbitrary fd limits. those would work right now.
i grok that maybe WASI libc probably wants to be a little nicer to folks out of the box and allow them to use POSIX APIs they're used to without having to learn something new (ignoring the fact that WASI/WASM are very new :p). that is still reasonably achievable without bloating things so much. a bitmask scales by 8, so a bitmask implementation with FD_SETSIZE
set to 4096
would still take up only 512
bytes.
i'll point out again that the behavior you describe is not POSIX compliant -- it specifically says what cloudlibc is doing is undefined behavior, not implementation-defined behavior. that means compilers are allowed to turn this into an abort or memory corruption or anything else. anyone writing to this API is asking for trouble, and it's their fault when it happens.
Could you clarify this? I'm not seeing from your description how cloudlibc has undefined behavior here.
Bumping FD_SETSIZE
to 4096 would very likely be more than enough for any application that doesn't look at the value of FD_SETSIZE
. But consider an application that does lots of filesystem accesses and optimizes them by caching file descriptors. It might assume it can cache up to FD_SETSIZE - 3
file descriptors, where the 3 comes from stdin, stdout, and stderr. In WASI, there may be more than 3 file descriptors allocated by the system. As a result, such an application wouldn't work with a bitmask fd_set
, without being taught about WASI specifically.
Admittedly, I don't know of specific applications that do this. But I also don't perceive fd_set
's size as a significant problem.
In an application that only opens 16 or fewer file descriptors, a pollfd
array will use less memory than even a bitmask fd_set
with FD_SETSIZE
set to 1024. Applications which open more file descriptors than that will typically be doing enough other things that using an extra 4000 bytes of memory isn't that significant. And many of those applications will use poll
anyway (and eventually, epoll
, or whatever API we end up providing).
Could you clarify this? I'm not seeing from your description how cloudlibc has undefined behavior here.
i quoted it in my original report/description:
- select()
The behavior of these macros is undefined if the fd argument is less than 0 or greater than or equal to FD_SETSIZE, or if fd is not a valid file descriptor, or if any of the arguments are expressions with side-effects.
i grok that, from cloudlibc's pov, it's trying to make an API guarantee, but from POSIX's pov, this is clearly undefined behavior, and compilers are free to optimize it as such. i'm not aware of any currently doing so, but when you tell people "X behavior is classified as undefined behavior by the standards authors", that puts it in a category of "do not touch it".
Bumping
FD_SETSIZE
to 4096 would very likely be more than enough for any application that doesn't look at the value ofFD_SETSIZE
. But consider an application that does lots of filesystem accesses and optimizes them by caching file descriptors. It might assume it can cache up toFD_SETSIZE - 3
file descriptors, where the 3 comes from stdin, stdout, and stderr. In WASI, there may be more than 3 file descriptors allocated by the system. As a result, such an application wouldn't work with a bitmaskfd_set
, without being taught about WASI specifically.
such an application is clearly not POSIX compliant, and is thus WASI-libc-specific by definition, and hoping that the runtime does not break on it in the future.
further, such an application that assumes the range [0,2] is allocated and [3, infinity) is free is buggy. POSIX guarantees that 0/1/2 are allocated, but that it is. It, nor any other runtime, makes any guarantees about any other fd. if it did, then the WASI runtime would be in a bit of a pickle because it wouldn't be allowed to pass in pre-opened paths at all.
in the *NIX world, it is trivial to leak open fds, accidentally or on purpose, when running other programs. it's why Linux invented O_CLOEXEC, why the historical BSD/POSIX daemon interface has a noclose
option, and why Linux has invented close_range().
such an application should be using poll()
instead.
Admittedly, I don't know of specific applications that do this. But I also don't perceive
fd_set
's size as a significant problem.
fd_set
's, in my limited experience, are often declared on the stack. that's a 4k-hit per set, and select()
takes 2 (read & write). so you're looking at an 8k stack frame per call (of the function declaring the sets of course, not of select()
). if you want to have independent groups (e.g. one for files, one for networks, etc...), it only multiplies. that's a pretty harsh penalty for functionality that doesn't have a demanding use case, can be easily addressed with other existing APIs, and that the rest of the *NIX world has lived for decades without.
Could you clarify this? I'm not seeing from your description how cloudlibc has undefined behavior here.
i quoted it in my original report/description:
- select()
The behavior of these macros is undefined if the fd argument is less than 0 or greater than or equal to FD_SETSIZE, or if fd is not a valid file descriptor, or if any of the arguments are expressions with side-effects.
i grok that, from cloudlibc's pov, it's trying to make an API guarantee, but from POSIX's pov, this is clearly undefined behavior, and compilers are free to optimize it as such. i'm not aware of any currently doing so, but when you tell people "X behavior is classified as undefined behavior by the standards authors", that puts it in a category of "do not touch it".
WASI does not intend to define itself as a strictly POSIX platform. WASI follows POSIX in many areas, and will likely always do so, as this is very useful, but there's nothing wrong with WASI saying that some things that are UB in POSIX have defined behavior in WASI. And when we say this, compilers shouldn't assume that they're UB. And as one of the maintainers of the compiler used with wasi-libc, I believe this is realistic.
Admittedly, I don't know of specific applications that do this. But I also don't perceive
fd_set
's size as a significant problem.
fd_set
's, in my limited experience, are often declared on the stack. that's a 4k-hit per set, andselect()
takes 2 (read & write). so you're looking at an 8k stack frame per call (of the function declaring the sets of course, not ofselect()
). if you want to have independent groups (e.g. one for files, one for networks, etc...), it only multiplies. that's a pretty harsh penalty for functionality that doesn't have a demanding use case, can be easily addressed with other existing APIs, and that the rest of the *NIX world has lived for decades without.
I don't think we know that the rest of the *NIX world has lived without it. The point of the example I brought up is that it's something an application could theoretically do and be ok on POSIX, which wouldn't work on WASI.
Beyond that example though, wasi-libc's fd_set
implementation also gives us flexibility in the fd index space. As we move toward doing more things with file descriptors, it's possible we may want to make the file-descriptor space non-contiguous, or we may want to reserve some ranges for special purposes. Currently, fd_set
and FD_SETSIZE
do not give us reasons not to do these things, which is a nice property.
If there are real-world use cases where the extra fd_set
size causes a problem, and either running out of stack space, or consuming a noticeable excess of memory, and porting it to poll
isn't feasible or doesn't help for some reason, it would make sense to take a closer look at the options. But until then, I expect we'll want all the flexibility we can get.
cloudlibc seems to have made a design choice that POSIX does not require, and what most *NIX implementations don't support. this has led to cloudlibc using a lot more memory supporting scenarios with
select()
that is better served bypoll()
. imo this should be revisited by basically deleting the cloudlibc logic entirely.first, let's start with what POSIX says:
let's look at what cloudlibc does:
FD_SETSIZE
set to1024
int __fds[FD_SETSIZE];
-- an array of integers (fds)__fds
is just "the next used fd"__fds
is the fd to check__fds
array takes exactly__nfds
checks (good)sizeof(fd_set)
is 4100 bytes (bad)FD_SETSIZE
(fine)select()
, not just[0, FD_SETSIZE)
(good)let's look at what musl (and basically every other *NIX library out there) does:
FD_SETSIZE
set to 1024unsigned long fds_bits[FD_SETSIZE / 8 / sizeof(long)];
-- a bitmask of fdsfds_bits
is the fd to checkfds_bits
takes longer, but most architectures have "bit is set" insns to optimize this. an unoptimized implementation would take at least 128 tests (FD_SETSIZE / 8
) (meh)sizeof(fd_set)
is 128 bytes (good)FD_SETSIZE
(fine)[0, FD_SETSIZE)
can be monitored withselect()
(fine)so cloudlibc uses 4k of memory instead of 128bytes in order to be slightly faster and to support fds larger than
FD_SETSIZE
. but no POSIX-compliant project would ever use an fd >=FD_SETSIZE
, which means they'd be making assumptions that the underlying code is cloudlibc.WASI API isn't relevant because it doesn't define fd_set, and this is purely about the
select()
API which is entirely part of WASI libc.