Closed palkan closed 4 years ago
Hey @tenderlove & @ioquatix! I thought you'd be interested.
We are building a new event loop using io_uring directly. Happy to review issues with nio4r.
Two notes:
Reproduction fails to install gems.
Might not be nio4r related.
Unless removing the gemlock file, trying to reproduce fails when installing grpc
with the message:
grpc-1.26.0-universal-darwin requires ruby version < 2.7.dev, >= 2.3, which is
incompatible with the current version, ruby 2.7.1p83
According to this talk given by @tenderlove, objects referenced from C (assuming rb_gc_mark
) shouldn't be effected by the GC compactor.
I didn't read through all the code, but a quick search shows that nio4r does call rb_gc_mark
, so I assume they know to do it for all the Ruby objects they hold...
If there's a gem that doesn't use rb_gc_mark
, it might hold a reference to a memory area that later holds a different data set, resulting in corrupted data.
I'm writing this because this might not be a nio4r issue... not if some other gem overwrote data that was "inserted" to the nio4r selector.
This is especially relevant since the selector references an Array and it's content might be reallocated (without reallocating the array itself)...
Maybe we should open an ActionCable issue about this, or an nio4r
if we know that's the source of the issue... though I couldn't find it while reviewing the nio4r code path, unless instances of classes defined by a C extension are movable objects (which I think they shouldn't be), in which case the bug would be here*.
I tried this with iodine and the same issue occurs once ActionCable hijacks the socket and moves it to nio4r
.
I have no control over the locks and the object references once the socket moved away, so I know it isn't an iodine issue... which probably means this isn't a Puma issue.
This also brings back the question: why does Rails need an IO reactor? I really thought we had a good enough solution with the PR we submitted to Rack. The Rails team have a lot on their hands, no need to straddle them with server implementation concerns.
* `nio4r saves all objects within a C referenced array, which means that
rb_gc_mark` only marks the array, but it's allowed to move the objects within the array to a new memory location. Assuming the objects are immovable, since they are instances of a C extension object, everything is should be fine. However, if they are movable objects, there's undefined behavior everywhere since the references are also stored in the libev IO callback pointers.
Reproduction fails to install gems.
@boazsegev Thanks! Removed a .lock
file from the repo and added a branch with minimal deps: bug/issue-6.
I have released v2.5.3 of nio4r which might fix this issue. @palkan can you please test and report back?
@ioquatix Works like a charm! Thanks @boazsegev and @ioquatix!
Overview
In most cases, a Puma worker process crashes right after compaction.
The problem only occurs when we have active connections during the compaction. If we disconnect all the clients, perform compaction and connect new clients, everything works fine.
I suspect nio4r to have some problems with compaction (since we crash in stream_event_loop).
How to reproduce
dip up rails
orbundle exec puma -p 8080 -w 1
COMPACT=1 WAIT=2 SCALE=200 ruby benchmarks/simulate.rb
Crash dump