Closed elindsey closed 4 years ago
The unit tests that failed in travis also failed for me locally without this diff applied. Are those known broken or new failures?
Hey that is a very nice fix! Thanks! Very much appreciated. Those online transport tests are a bit fragile yes, I'll have a look.
Most of the time we only need either a read or a write callback registered with libuv. For example, on a UDP request a write callback is registered; when executed, it performs the write, deregisters itself, and registers a read callback.
However, there is one case where getdns registers both read and write callbacks: when a backlog of TCP requests is going to the same upstream resolver, we use a single fd and queue the requests. In this case we want to listen for both read (to get responses for requests we've already sent) and write (to continue sending our pending requests).
libuv, like most event libraries, only allows one callback to be registered per fd. To get notifications for both reads and writes, you should examine the event flags and have appropriate conditional logic within the single callback. Today getdns incorrectly tries to register two separate poll_t handles with libuv, one for read and one for write. This trips an internal libuv assertion that guarantees only a single poll_t is registered per fd, and the process crashes.
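As a rough illustration of the single-callback pattern (not the actual diff), here is a libuv-free sketch. `EV_READABLE` and `EV_WRITABLE` are stand-ins for libuv's `UV_READABLE` and `UV_WRITABLE`, and the `upstream` struct and both functions are hypothetical names, not getdns code. With libuv, `upstream_on_poll` would correspond to the one `uv_poll_cb` for the fd, and its return value to the event mask passed to `uv_poll_start` to re-arm the handle.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for libuv's UV_READABLE / UV_WRITABLE event bits. */
enum { EV_READABLE = 1, EV_WRITABLE = 2 };

/* Hypothetical per-upstream state: one fd shared by all queued TCP requests. */
typedef struct upstream {
    int pending_writes; /* requests queued but not yet sent */
    int inflight;       /* requests sent, still awaiting a response */
    int responses;      /* responses read so far */
} upstream;

/* Event mask we want the poller to watch for on the shared fd. */
static int upstream_wanted_events(const upstream *up)
{
    int events = 0;
    if (up->inflight > 0)
        events |= EV_READABLE;  /* responses may arrive */
    if (up->pending_writes > 0)
        events |= EV_WRITABLE;  /* more requests to send */
    return events;
}

/* The single callback for the fd. It branches on the event flags
 * instead of relying on separate read and write registrations.
 * Returns the mask to re-arm with (0 means stop polling). */
static int upstream_on_poll(upstream *up, int status, int events)
{
    assert(up != NULL);
    if (status < 0)
        return 0;               /* poll error: stop watching */

    if ((events & EV_READABLE) && up->inflight > 0) {
        up->inflight--;         /* placeholder for reading one response */
        up->responses++;
    }
    if ((events & EV_WRITABLE) && up->pending_writes > 0) {
        up->pending_writes--;   /* placeholder for sending one request */
        up->inflight++;
    }
    return upstream_wanted_events(up);
}
```

The point of returning the mask is that interest in read and write changes as the queue drains, so the one registration is updated rather than a second one being added.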
Testing was done by using flamethrower (https://github.com/DNS-OARC/flamethrower) to toss queries at a program that embeds getdns.
Note that a higher qps triggers a different getdns/libuv crashing bug that occurs when the TCP backlog grows so large that requests start to time out. That crash is not addressed in this PR, and will be more involved to fix.