`ares_fds` can cause an abort or write out-of-bounds

sfan5 commented 10 months ago

select has implementation-defined limits on the highest fd it can accept, e.g. 1024 on Linux. When it is is exceeded and the API user calls ares_fds bad things happen.

Test code

The test code provokes this issue by filling the fd table, but note that this can happen naturally if you e.g. use c-ares in a big server application.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/resource.h>
#include <ares.h>

void test(ares_channel_t *channel) {
    fd_set dummy;
    ares_fds(channel, &dummy, &dummy);
}

static void empty(void *arg, int status, int timeouts, struct hostent *host)
{}

int main() {
    struct rlimit lim = { .rlim_cur = 3000, .rlim_max = 3000 };
    if (setrlimit(RLIMIT_NOFILE, &lim) == -1)
    { perror("setrlimit"); return 1; }

    for (int i = 2; i < FD_SETSIZE + 42; i++) {
        int ret = dup(2);
        if (ret == -1) { perror("dup"); return 1; }
    }

    ares_library_init(ARES_LIB_INIT_ALL);
    ares_channel_t *channel;
    ares_init(&channel);

    ares_gethostbyname(channel, "google.com", AF_INET6, empty, NULL);

    test(channel);
}

Expected result

Not sure actually. Maybe ares_fds could return an error? select is just fundamentally broken in this situation so I think the documentation should also discourage using it.

Actual result

% ./test
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
zsh: IOT instruction (core dumped)  ./test

Or if you have c-ares compiled without _FORTIFY_SOURCE you can actually see it corrupting stack data. Replace test() with this:

    char one[10] = {0};
    fd_set dummy;
    char two[10] = {0};

    ares_fds(channel, &dummy, &dummy);

    for (int i = 0; i < 10; i++)
        printf("%02x ", (int)one[i]);
    for (int i = 0; i < 10; i++)
        printf("%02x ", (int)two[i]);
    puts("");

results in

% ./test                                                
00 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

bradh352 commented 10 months ago

Honestly not sure if there is anything we can/should do other than to annotate in the docs.

The implementation for the fd_set is very OS-dependent. Some systems allow you to redefine FD_SETSIZE before declaring the fd_set variable, and it could be large enough to accommodate the known behavior for the system.

Also, some systems use FD_SETSIZE to indicate the number of file descriptors it can hold (e.g. Windows), while others use it to to actually indicate the maximum value allowed for the file descriptor itself (e.g. Linux's glibc).

The only truly supported way to retrieve file descriptors these days for c-ares is to use the socket state callbacks registered with ares_init_options().

That said, I'm currently working on implementing an event thread option (#611) as I don't believe its wise to delegate file descriptor handling like c-ares currently mandates. It should be an option, but I'd expect most users would want c-ares to handle this for them.

sfan5 commented 10 months ago

Good question, I'm not entirely sure either.

curl for example ignores these out-of-range sockets in a function operating on fd_sets. This leads to weird hard-to-detect bugs but hey at least there's no buffer overflow.

Another alternative I think is good would be to return an error code. It will break the user's application instantly but can basically serve as a sign that they need to switch to a different API.

That said, I'm currently working on implementing an event thread option (https://github.com/c-ares/c-ares/issues/611) as I don't believe its wise to delegate file descriptor handling like c-ares currently mandates.

My two cents: sounds useful and this would simplify my use case a lot too.

bradh352 commented 10 months ago

Another alternative I think is good would be to return an error code. It will break the user's application instantly but can basically serve as a sign that they need to switch to a different API.

You mean always, e.g. deprecate ares_fds(), or are you saying to somehow try to detect? I don't think its really feasible to detect since we don't know what the user's FD_SETSIZE may have been in their own creation of the fd_set in their own module, we won't have visibility into that. Plus figuring out all the aforementioned OS-specific implementations is probably not really feasible.

sfan5 commented 10 months ago

Try to detect it is what I mean. I don't know how many exotic platforms c-ares needs to run on but the main sides seem to be >= FD_SETSIZE is bad (Linux, BSD, macOS, ...?) or no problem at all (Windows).

since we don't know what the user's FD_SETSIZE may have been in their own creation of the fd_set in their own module

Isn't that case broken anyway if you call ares_process? It won't know about the enlarged size. I would err on the side of caution here and reject everything above FD_SETSIZE (if the platform is known to have this issue).

bradh352 commented 10 months ago

The FD_SETSIZE just is a define that controls the size of the variable itself (likely allocated in the stack of the caller as it is passed by reference), the FD_SET() doesn't do a bounds check because of this.

c-ares / c-ares