jeffpc opened this issue 8 years ago.
:+1: for issue =)
:clap:
port-test brokenness confirmed on:
SunOS unstable10x 5.10 Generic_147441-19 i86pc i386 i86pc
SunOS unstable10s 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
SunOS unstable11x 5.11 11.2 i86pc i386 i86pc
SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
For the record, here is the illumos bug about this: https://www.illumos.org/issues/6474
The bug is already resolved in the next release of Oracle Solaris and is tracked under bug 17943298.
It turns out that Solaris event ports (used by illumos-based distros as well as Solaris) are slightly broken for FIFOs.
Specifically, it is possible for port_getn to return an event on a FIFO fd (registered via PORT_SOURCE_FD and POLLIN) even though there is nothing available for reading. This is an illumos bug, not a powerdns bug. Here is a minimal test case: port-test.txt.
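For illustration, here is a minimal sketch in the spirit of that test case. It is hypothetical and is not the contents of port-test.txt. It associates stdin with an event port for POLLIN, waits for one event with port_getn, and then uses the FIONREAD ioctl to check whether any bytes are really buffered. Using FIONREAD is an assumption (the issue does not say how port-test checks readability), the function names are made up for this example, and the event-port part only compiles on Solaris/illumos.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#ifdef __sun
#include <sys/filio.h> /* FIONREAD is defined here on Solaris/illumos */
#include <poll.h>
#include <port.h>
#endif

/* Hypothetical helper: bytes queued for reading on fd, or -1 on error. */
int bytes_available(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}

#ifdef __sun
/*
 * Hypothetical check: wait for one POLLIN event on stdin via an event
 * port, then ask how many bytes are actually buffered. An event with
 * 0 bytes buffered (while the writer is still open) suggests a
 * spurious wakeup. Returns 1 if spurious, 0 if not, -1 on error.
 */
int stdin_event_is_spurious(void)
{
    port_event_t ev;
    uint_t nget = 1; /* block until at least one event is available */
    int port = port_create();

    if (port == -1) {
        perror("port_create");
        return -1;
    }
    if (port_associate(port, PORT_SOURCE_FD, (uintptr_t)STDIN_FILENO,
                       POLLIN, NULL) == -1 ||
        port_getn(port, &ev, 1, &nget, NULL) == -1) {
        perror("event port");
        close(port);
        return -1;
    }
    close(port);
    return bytes_available(STDIN_FILENO) == 0;
}
#endif
```

On an affected system, calling stdin_event_is_spurious() from a small main with stdin connected to an idle pipe, and then running pfiles(1) against that process, should (per the behavior described in this issue) report a spurious event.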
If you run port-test with stdin being a file or a tty, port_getn blocks until bytes are available for reading. If, however, stdin is a pipe, port_getn returns immediately because the fd is marked as readable by the getpeerucred call.
In the case of the recursor, a spurious event causes the thread to attempt to read from the pipe, and that read blocks the thread. A spurious event can be generated using the pfiles(1) utility (e.g., pfiles 123 if 123 is the pdns_recursor pid).
Possible workarounds for this issue:
- Detect whether port_getn returns spurious events, and if so fail PortsFDMultiplexer initialization. @Habbie indicated that this will cause the daemon to fall back to a different multiplexer.
- … read(2) and an immediate return to port_getn to get more events.
Other systems that may be affected: OpenSolaris, Solaris 10, Solaris 11
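The second workaround is truncated, but its surviving tail (an immediate return to port_getn after read(2)) fits non-blocking I/O: with O_NONBLOCK set on the pipe, a spurious event makes read(2) fail with EAGAIN instead of blocking the thread. A minimal sketch under that assumption; set_nonblocking and handle_readable are hypothetical helpers, not PowerDNS code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical handler run when the multiplexer reports fd as readable.
 * With O_NONBLOCK set, a spurious event makes read(2) fail with EAGAIN
 * instead of blocking, so the caller can go straight back to port_getn.
 * Returns bytes read (> 0), 0 on EOF or a spurious event (*spurious
 * tells them apart), or -1 on a real error.
 */
ssize_t handle_readable(int fd, char *buf, size_t len, int *spurious)
{
    ssize_t n = read(fd, buf, len);

    *spurious = 0;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        *spurious = 1; /* the event fired, but nothing was there */
        return 0;
    }
    return n;
}
```

The design choice here is to treat EAGAIN as "no work" rather than as an error, which keeps the event loop running even if the kernel delivers events that do not correspond to readable data.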
Finally, powerdns should inspect the returned event for sanity. There are four members in the port_event_t struct, but powerdns only looks at one: portev_object. The portev_events member should contain POLLIN and/or POLLOUT, and portev_source should be PORT_SOURCE_FD.
Gory details
On illumos, FIFOs are implemented using the fifofs module, which presents a filesystem that manages all FIFOs on the system. Each FIFO can be in one of two modes: fastpath or streams. In fastpath mode, much of the heavyweight STREAMS code is bypassed, but not all operations are supported. In streams mode, the full functionality is available. FIFOs are created in fastpath mode and are converted to streams mode if they attempt to use "advanced" functionality. Once a FIFO is in streams mode, there is no way to go back to fastpath mode.
The getpeerucred call (which is invoked by pfiles) ends up issuing an ioctl on the FIFO backed by the fifofs module. Since the _I_GETPEERCRED command is not handled in fastpath mode, the FIFO is converted to a streams-mode FIFO. During this conversion, all poll and event port waiters are woken up so that they re-register with the newly converted FIFO. The kernel stack:
pfiles(1) works by injecting syscalls into the target process. One of the syscalls that it injects is the call to getpeerucred. This is why pfiles(1) can trigger this issue.
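Returning to the sanity check recommended before the gory details: a minimal sketch of inspecting all the relevant port_event_t fields rather than only portev_object. Off Solaris/illumos it uses a stand-in struct whose field names follow the port_get(3C) documentation; the stand-in, its PORT_SOURCE_FD value, and the function name port_event_is_sane are assumptions for illustration. This is defensive checking only; it does not by itself prove that bytes are available to read.

```c
#include <poll.h>
#include <stdint.h>
#ifdef __sun
#include <port.h>
#else
/*
 * Stand-in for port_event_t so the sketch compiles elsewhere. Field
 * names follow port_get(3C); the PORT_SOURCE_FD value is a placeholder
 * that only matters off Solaris/illumos.
 */
typedef struct {
    int portev_events;            /* detected events (POLLIN, ...) */
    unsigned short portev_source; /* event source, e.g. PORT_SOURCE_FD */
    uintptr_t portev_object;      /* for PORT_SOURCE_FD: the fd */
    void *portev_user;            /* cookie passed to port_associate */
} port_event_t;
#define PORT_SOURCE_FD 4
#endif

/*
 * Return 1 if ev looks like a genuine fd event for fd, 0 otherwise.
 * Follows the rule stated in the issue (source must be PORT_SOURCE_FD,
 * events must include POLLIN and/or POLLOUT); a real multiplexer may
 * also want to handle POLLERR/POLLHUP separately.
 */
int port_event_is_sane(const port_event_t *ev, int fd)
{
    if (ev->portev_source != PORT_SOURCE_FD)
        return 0;
    if ((ev->portev_events & (POLLIN | POLLOUT)) == 0)
        return 0;
    return ev->portev_object == (uintptr_t)fd;
}
```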