Ugh, so the "Bad file descriptor" message might be wrong, because getaddrinfo
(on that line 40) returns its error code directly rather than setting errno.
http://man7.org/linux/man-pages/man3/getaddrinfo.3.html
I'll put up this patch to fix the error handling:
    if (auto err = getaddrinfo(hostname, nullptr, &hint, &info)) {
      if (err == EAI_SYSTEM) {
        // Only EAI_SYSTEM means errno is meaningful, so use PLOG here.
        PLOG(ERROR) << "System error qualifying local hostname";
      } else {
        // Every other code must be decoded with gai_strerror().
        LOG(ERROR) << "Error qualifying local hostname: " << gai_strerror(err);
      }
      return "";
    }
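To see the difference outside of Bistro, here is a minimal standalone sketch of the same idea. The helper names `describeGaiError` and `canonicalName` are made up for illustration and are not part of the Bistro code; the point is only that errno is consulted for `EAI_SYSTEM` and `gai_strerror` for everything else.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: turn a getaddrinfo() return value into a message.
// `saved_errno` should be captured immediately after the getaddrinfo() call.
std::string describeGaiError(int err, int saved_errno) {
  if (err == 0) {
    return "success";
  }
  if (err == EAI_SYSTEM) {
    // Only in this case does errno describe the failure.
    return std::string("system error: ") + std::strerror(saved_errno);
  }
  // All other codes are getaddrinfo-specific and unrelated to errno.
  return gai_strerror(err);
}

// Hypothetical helper: ask the resolver for the canonical name of `hostname`.
// `extra_flags` is OR-ed into ai_flags (e.g. AI_NUMERICHOST to skip DNS).
// Returns "" on failure and stores a description in *error.
std::string canonicalName(const char* hostname, int extra_flags,
                          std::string* error) {
  assert(hostname != nullptr && error != nullptr);
  addrinfo hint{};
  hint.ai_family = AF_UNSPEC;
  hint.ai_flags = AI_CANONNAME | extra_flags;
  addrinfo* info = nullptr;
  const int err = getaddrinfo(hostname, nullptr, &hint, &info);
  const int saved_errno = errno;
  if (err != 0) {
    *error = describeGaiError(err, saved_errno);
    return "";
  }
  // Fall back to the input if the resolver did not fill in a canonical name.
  std::string name =
      (info != nullptr && info->ai_canonname != nullptr) ? info->ai_canonname
                                                         : hostname;
  freeaddrinfo(info);
  return name;
}
```

Calling `canonicalName("not-an-ip", AI_NUMERICHOST, &error)` fails without touching DNS, and `error` then holds the `gai_strerror` text instead of whatever stale value errno happened to contain.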
If you have the setup that reproduces this handy, would you mind trying it?
That said, can you also share your ulimit -n (the maximum number of open FDs)?
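If it is easier to check from inside a process than from the shell, a rough sketch like the following reads the same limit via getrlimit. The helper name `openFileLimits` is hypothetical; `ulimit -n` normally reports the soft value.

```cpp
#include <sys/resource.h>

#include <cassert>
#include <cstdio>

// Hypothetical helper: fetch the soft and hard limits on open file
// descriptors for the current process. Returns false if getrlimit() fails.
bool openFileLimits(rlim_t* soft, rlim_t* hard) {
  assert(soft != nullptr && hard != nullptr);
  struct rlimit lim {};
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
    std::perror("getrlimit(RLIMIT_NOFILE)");
    return false;
  }
  *soft = lim.rlim_cur;  // the value the process is actually held to
  *hard = lim.rlim_max;  // the ceiling the soft limit can be raised to
  return true;
}
```

Logging both values at worker startup would show whether a raised limit actually reached the worker process.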
@snarkmaster, the ulimit -n was 1024. I've tried 2048 as well, but it shows the same errors.
With 1000 jobs, the worker console shows errors (but the jobs actually ran to completion):
    I0517 09:02:11.589076 11198 TaskSubprocessQueue.cpp:113] Task job993, node1:1495025520 message: {"status":{"result_bits":4},"invocation_rand":6304019436659602736,"event":"got_status","worker_host":"","invocation_start_time":1495026072,"raw_status":"done"}
    I0517 09:02:48.227895 11216 BistroWorkerHandler.cpp:285] Queueing healthcheck started at 1495026168
    E0517 09:02:48.228618 11191 hostname.cpp:40] Failed to fully qualify local hostname: Bad file descriptor [9]
Python script to generate test configuration:
server.sh
worker.sh
job_script.py