Open HertzDevil opened 10 months ago
It looks like all the worker threads eventually got stuck on the fiber_channel.receive
line inside Crystal::Scheduler#run_loop
. This fiber_channel
is a Crystal::FiberChannel
backed by ::IO.pipe
. This honestly looks suspicious to me, since non-blocking pipes themselves depend on the event loop. Is a more lightweight option available here? cc @ysbaddaden @oprypin
All threads waiting on FiberChannel means all threads "parked" themselves but they shouldn't be blocked on it, but blocked on the EventLoop (waiting for completions).
So yeah, suspicious.
Note: with Executioncontext the FiberChannel is no longer used.
With two changes I now manage to get the OP snippet and the specs passing:
module Crystal::System::FileDescriptor
def self.pipe(read_blocking, write_blocking)
# ...
r = IO::FileDescriptor.new(r_pipe.address, read_blocking)
w = IO::FileDescriptor.new(w_pipe.address, write_blocking)
r.read_buffering = false # disable buffering on the read end
w.sync = true
{r, w}
end
end
lib LibC
fun PeekNamedPipe(hNamedPipe : HANDLE, lpBuffer : Void*, nBufferSize : DWORD, lpBytesRead : DWORD*, lpTotalBytesAvail : DWORD*, lpBytesLeftThisMessage : DWORD*) : BOOL
end
struct Crystal::FiberChannel
def receive
# explicitly peek the pipe until there is data available
while true
LibC.PeekNamedPipe(LibC::HANDLE.new(@worker_out.fd), nil, 0, nil, out bytes_avail, nil)
break if bytes_avail >= sizeof(UInt64)
Fiber.yield
end
oid = @worker_out.read_bytes(UInt64)
Pointer(Fiber).new(oid).as(Fiber)
end
end
What's interesting is both ends of the IO.pipe
can now be blocking, which makes me believe pipes are indeed unnecessary here. Everything still works if I wrote:
struct Crystal::FiberChannel
@lock = Crystal::SpinLock.new
def initialize
@queue = Deque(Fiber).new
end
def send(fiber : Fiber)
@lock.sync { @queue << fiber }
end
def receive
while true
fiber = @lock.sync { @queue.shift? }
return fiber if fiber
Fiber.yield
end
end
end
And eventually:
class Crystal::Scheduler
{% flag?(:preview_mt) %}
def run_loop
spawn_stack_pool_collector
loop do
@lock.lock
if runnable = @runnables.shift?
@runnables << Fiber.current
@lock.unlock
resume(runnable)
else
@lock.unlock
Fiber.yield
end
end
end
def send_fiber(fiber : Fiber)
@lock.sync { @runnables << fiber }
end
{% end %}
end
Has the codebase changed so much in these years that we don't need fiber_channel
anymore...? The pipe was already there since https://github.com/crystal-lang/crystal/commit/b52bfb7aeb213c8f5110620a4d3919b10107d865 as part of #8112, but there were no mentions of the rationale behind using a pipe.
On the other hand, I wonder if this also implies IO.pipe
's semantics are underspecified.
Running that spec file will block at the
raises if channel was closed
test. IncreasingCRYSTAL_WORKERS
from the default 4 makes two more specs run and pass, but after that theawakes all waiting selects
test still blocks.This reduction, probably the first Windows-specific one we have, similarly blocks at the
CRYSTAL_WORKERS + 1
-th loop:The snippet doesn't block if run without
-Dpreview_mt
or run on Linux.