
Changes to Celluloid::Mailbox handling #342

Closed endofunky closed 9 years ago

endofunky commented 10 years ago

As some of you might have read on Twitter, I have been playing around with alternative, JRuby-specific mailbox replacements over the past weekend. So far I've benchmarked the regular Celluloid::Mailbox against ArrayBlockingQueue and the LMAX Disruptor. The results so far have been quite promising, with Disruptor delivering roughly 3.5x the raw throughput of the current mailbox:

51200000 push/pop cycles
===============================================================

                         user     system      total        real
Disruptor            8.670000   0.150000   8.820000 (  8.564000)
ArrayBlockingQueue  13.040000   0.150000  13.190000 ( 12.961000)
Celluloid::Mailbox  30.090000   0.410000  30.500000 ( 29.894000)

Disruptor, due to its lock-free nature, also performs incredibly well in high-contention, multi-producer scenarios, where ArrayBlockingQueue fails badly:

9 producer thread(s), 1 consumer thread - 9000000 messages
=============================================================

                         user     system      total        real
Disruptor            9.330000   2.560000  11.890000 (  7.019000)
ArrayBlockingQueue   9.030000  10.430000  19.460000 ( 19.271000)
Celluloid::Mailbox   9.660000   4.480000  14.140000 (  9.546000)

Note that Celluloid::Mailbox and Disruptor are almost on par here only because, I believe, we're hitting the limit of how fast a Ruby loop can pull messages out of the queue. Also, performance is likely to improve further (I spoke to @headius on Twitter about this yesterday) once the interaction with Disruptor is moved from Ruby to annotated Java methods, which I haven't gotten around to yet.

Now, while it might be possible to use a Reactor class and EventedMailbox together, I think it'd be beneficial to use a regular Mailbox (sub-)class instead, to minimize the number of Ruby call frames and avoid having to buffer messages in a Ruby array or use Ruby-land mutexes. However, this is where I ran into an issue: Celluloid at the moment relies on Mailbox classes including Enumerable to fetch messages of a certain type, which obviously mutates the queue. From what I can see, most concurrent, lock-free queue data structures (including Disruptor) don't support this without some ad-hoc locking, which would defeat the purpose of using a lock-free data structure in the first place.
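
To illustrate, selective receive today amounts to roughly this (a sketch, not the actual Celluloid::Mailbox source; assumes @mutex = Mutex.new and @messages = []):

def receive(&block)
  @mutex.synchronize do
    message = @messages.find(&block)    # Enumerable scan for the first match
    @messages.delete(message) if message
    message
  end
end

A lock-free ring buffer has no safe way to delete an arbitrary element out of the middle, so that pattern is off the table without extra locking.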

Along with being able to use Disruptor, it would also open up other doors, like integrating with Akka mailboxes when running in a mixed Scala/Akka/JRuby environment (may be even the whole akka.dispatch infrastructure).

I wouldn't mind getting my hands dirty and implementing the changes myself, but I'm not 100% sure about the best way to do it, or how feasible these changes are to make in Celluloid.

tarcieri commented 10 years ago

> Celluloid at the moment relies on Mailbox classes including Enumerable to fetch messages of a certain type, which obviously mutates the queue. From what I can see, most concurrent, lock-free queue data structures (including Disruptor) don't support this without some ad-hoc locking, which would defeat the purpose of using a lock-free data structure in the first place.

This isn't the common case. Celluloid supports "selective receive," which can extract specific messages as you describe, but it's only used for uncommon operations like the linking protocol.

Most of the time we'll just be reading whatever message happens to be available next (also note that we merely aim for "best effort" ordering).

I think we need to support both, but perhaps there's a way to only grab a lock for the rare selective operations. That would grind the rest of the system to a halt while it happens, but hopefully it doesn't happen very often.
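
One shape that could take (totally hypothetical, and it leans on each mailbox having a single consumer thread, the actor, so it actually sidesteps the lock rather than taking it):

def receive(&filter)
  # Fast path, no filter: check the consumer-local buffer first so
  # previously drained messages aren't lost, then hit the shared queue.
  return @buffer.shift || @queue.pop unless filter

  # Rare selective path: only this (consumer) thread ever touches
  # @buffer, so draining and scanning it needs no lock at all.
  @buffer << @queue.poll until @queue.empty?  # poll: non-blocking pop (hypothetical API)
  message = @buffer.find(&filter)
  @buffer.delete(message) if message
  message   # a real version would block/timeout instead of returning nil
end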


That said, what strategies were you using with Disruptor? Can you post the code somewhere?

endofunky commented 10 years ago

Thanks for the feedback. I had the best performance with the YieldingWaitStrategy, which performs well even with >50 threads writing to the same queue. Internally it spins for a very short amount of time and then calls Thread.yield(). The ClaimStrategy classes are gone in the newer versions, but I'm using the equivalent of the MultiProducerClaimStrategy.
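
In Ruby terms, the spin-then-yield behaviour looks roughly like this (an illustration only with a made-up queue API, not the Disruptor API; Thread.pass is Ruby's analogue of Java's Thread.yield()):

SPIN_TRIES = 100

def wait_for(queue)
  tries = SPIN_TRIES
  loop do
    message = queue.poll       # non-blocking pop (hypothetical API)
    return message if message
    if tries > 0
      tries -= 1               # busy-spin briefly: lowest latency, burns CPU
    else
      Thread.pass              # then hand the core to other runnable threads
    end
  end
end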

I've implemented a wrapper class in Scala so that, at the moment, only one call from Ruby has to be made. As I mentioned before, I'm going to move that into an annotated Java method as well; it's not very idiomatic, but it avoids having to bundle the Scala library/runtime.

I've uploaded the code here: gist

I haven't implemented the whole Mailbox class yet as I was unsure how to handle things like the linking requests etc.

halorgium commented 10 years ago

@tobiassvn the Mailbox should be oblivious to the idea of linking requests. Currently the only special casing is the SystemEvent type-checking. I think we should add a more Ruby-esque method for determining the prioritization.
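
Something like this, say (hypothetical, just to show what I mean by Ruby-esque):

class Message
  def high_priority?
    false              # regular messages keep best-effort ordering
  end
end

class SystemEvent < Message
  def high_priority?
    true               # system events jump the queue
  end
end

The mailbox would then route on message.high_priority? rather than message.is_a?(SystemEvent).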

Also, @tarcieri, I think we should remove Actor#receive, since people who want to do this need to use exclusive mode anyway. We could easily suggest that anyone who wants to do their own handling use a custom behaviour. I believe we can change all the current code to not require Mailbox#receive with a block.

Perhaps we could discuss this on IRC?

tarcieri commented 10 years ago

@halorgium sure

tarcieri commented 10 years ago

@tobiassvn after talking it over with @halorgium, we found there are relatively few uses of the selective receive feature in Celluloid itself, and the receive method at the "Cell" level could arguably be removed.

We still need some way of implementing a priority queue by which system messages can be prioritized, possibly by using two mailboxes/queues/ring buffers. The minimum selection granularity is to be able to select on all messages or just system messages exclusively. If you can handle that particular special case we can probably rework Celluloid's internals accordingly.

Shout out to @mental, still right after all these years ;) Should've separated the mailboxes from the start

halorgium commented 10 years ago

To clarify, I do not believe we need two mailboxes. We simply need to prioritize some events to the front of the queue.

Very pseudo-codish:

def receive(timeout)
  # Always drain the high-priority (system event) queue before the
  # regular one; timeout handling is elided in this sketch.
  @high.shift || @low.shift
end
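
Fleshed out slightly (still a sketch with made-up names, using stdlib Queue; not actual Celluloid code):

require 'thread'

class PrioritizedMailbox
  def initialize
    @high = Queue.new   # system events
    @low  = Queue.new   # everything else, best-effort FIFO
  end

  def <<(message)
    (message.is_a?(SystemEvent) ? @high : @low) << message
  end

  def receive
    # Only the actor thread pops, so this empty? check is race-free.
    return @high.pop unless @high.empty?
    # Caveat: while blocked here, a new SystemEvent in @high won't wake
    # us; a real version needs one condition variable covering both queues.
    @low.pop
  end
end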

/cc #338 celluloid/celluloid-io#56

halorgium commented 10 years ago

I have made some progress with this. I have also switched Future to use Condition, which eases this transition. #344
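
The rough idea of a condition-based future, shown here with Ruby's stdlib Mutex/ConditionVariable (Celluloid's own Condition is actor-aware, so this is only an approximation), is that waiting for a value no longer involves scanning the mailbox:

require 'thread'

class SketchFuture
  def initialize
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
    @done  = false
  end

  # Called by the producer when the result is ready.
  def signal(value)
    @mutex.synchronize do
      @value = value
      @done  = true
      @cond.broadcast
    end
  end

  # Blocks the caller until the value arrives; no selective receive needed.
  def value
    @mutex.synchronize do
      @cond.wait(@mutex) until @done
      @value
    end
  end
end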

endofunky commented 10 years ago

So far, I like where this is going. However, having mailboxes maintain two separate queues feels a bit weird to me, and the double pop might also have performance implications. Two mailboxes per actor would be one solution, but I think I have a different idea.

We could possibly implement an adapted version of Akka's publish/subscribe event bus, which is used by Akka to handle dead letters, logging and other things:

http://doc.akka.io/docs/akka/snapshot/java/event-bus.html

A few things to consider here to make it work for SystemEvents (see the sketch after this list):

  1. We can still use blocks to filter events received
  2. We could add an option to invalidate a subscription after the first event has been received
  3. We would have to lock and unlock actors in two methods, e.g. in the case of the linking events: 1) lock when we subscribe and 2) unlock in the callback.
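
A minimal sketch of such a bus covering points 1 and 2 (made-up names, nothing like Akka's or Celluloid's actual API):

class EventBus
  Subscription = Struct.new(:filter, :once, :callback)

  def initialize
    @mutex = Mutex.new
    @subscriptions = []
  end

  # (1) filter with a block; (2) once: true drops the subscription
  # after its first matching event.
  def subscribe(once: false, filter: ->(_) { true }, &callback)
    sub = Subscription.new(filter, once, callback)
    @mutex.synchronize { @subscriptions << sub }
    sub
  end

  def publish(event)
    matched = @mutex.synchronize do
      hits = @subscriptions.select { |s| s.filter.call(event) }
      @subscriptions -= hits.select(&:once)
      hits
    end
    # Deliver outside the lock so callbacks can publish/subscribe freely.
    matched.each { |s| s.callback.call(event) }
  end
end

For point 3, the linking code would subscribe with once: true and a filter matching the expected reply, locking before it subscribes and unlocking inside the callback.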

This opens up a few new options, some of which are easier to implement than others:

  1. We could move the logging of dead messages out of the mailbox into a separate DeadLetter handler
  2. People could subscribe to dead letters to automatically grow/shrink actor pools (see the usage sketch after this list)
  3. To make this fault-tolerant, it could be implemented as an Actor itself
  4. It could use a mailbox to handle incoming events, which means it could be distributed to implement things like remote supervision
  5. It could possibly be used to collect metrics in a central place for things like a Typesafe Console clone
  6. Akka uses this system to send messages to groups of actors at once, which we could easily support as well
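
For example, option 2 would just be a subscription (hypothetical DeadLetter class and pool API, building on the bus sketch above):

bus.subscribe(filter: ->(e) { e.is_a?(DeadLetter) }) do |letter|
  # A dead letter hints the pool is saturated; add a worker.
  pool.grow(1) if pool.busy?
end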

I've actually written something similar before for a different project, so I've got some code I could reuse. I therefore wouldn't mind taking this one on.

endofunky commented 10 years ago

It just occurred to me that if we support subscription priorities and track whether an event has already been handled by another subscriber, we would also have a way to track unhandled system events: #338.
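
In bus terms that might only take a couple of lines in publish (again hypothetical, extending the sketch above and assuming Subscription gains a priority member):

def publish(event)
  matched = @mutex.synchronize do
    hits = @subscriptions.select { |s| s.filter.call(event) }
    @subscriptions -= hits.select(&:once)
    hits
  end
  # Nobody claimed the event: surface it instead of dropping it (#338).
  warn "unhandled event: #{event.inspect}" if matched.empty?
  # Deliver in priority order.
  matched.sort_by(&:priority).each { |s| s.callback.call(event) }
end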

halorgium commented 10 years ago

@tobiassvn all good stuff. My recent goal has been to try to simplify the current set of features in Celluloid. Currently there are only a few uses for external Actor#receive, and it has rather confusing semantics right now. See celluloid/celluloid-io#56 for more info.

endofunky commented 10 years ago

As discussed briefly with @halorgium on Twitter earlier, I'll implement a prototype over the next few days; I should hopefully have something working by the end of the week. After looking through the code a bit more this morning, it shouldn't be too difficult to implement, and it would decouple the SystemEvent handling etc. quite nicely.

halorgium commented 10 years ago

@tobiassvn excellent.

Some things to keep in mind:

digitalextremist commented 9 years ago

Where does this stand? Is it still relevant? It's from back in 10/13.

digitalextremist commented 9 years ago

I believe this was resolved by SystemEvent tasks moving to the front automatically. If not, reopen.