EndPointCorp / end-point-blog

End Point Dev blog
https://www.endpointdev.com/blog/

Comments for Cassandra, Thrift, and Fibers in EventMachine #300

Open phinjensen opened 6 years ago

phinjensen commented 6 years ago

Comments for https://www.endpointdev.com/blog/2010/05/cassandra-thrift-and-fibers-in/ by Ethan Rowe

To enter a comment:

  1. Log in to GitHub
  2. Leave a comment on this issue.
phinjensen commented 6 years ago
original author: Robert
date: 2010-08-26T05:31:54-04:00

Hi, thanks for the interesting article. However, when I run a test and measure execution time, I don't see the advantage:

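Code snippet:

```ruby
require 'eventmachine'
# write, get_client and get_client_em are assumed to be the helpers from the blog post (not shown here).

rounds = 100

# Fiber/EventMachine version
start = Time.now
EM.run do
  @done = 0
  rounds.times do |x|
    Fiber.new do
      write(get_client_em, "foo#{x}", { 'aard' => 'vark' })
      @done += 1
    end.resume
  end
  # Polls once per second, so EM.stop cannot run until about 1s after the reactor starts.
  EM.add_periodic_timer(1) { EM.stop if @done == rounds }
end
puts "Fiber: #{Time.now - start}s"

# Serial version over a single blocking client
start = Time.now
client = get_client
rounds.times { |x| write(client, "bar#{x}", { 'aard' => 'vark' }) }
puts "Linear: #{Time.now - start}s"
```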

This yields:

- Fiber: 1.222737s
- Linear: 0.066181s (under 0.4s when creating a new connection for each request)

Setup: empty Cassandra 0.6.3 (single node), Ruby 1.9.3, Mac OS X

Am I missing something?
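One detail in the benchmark above is worth checking: `EM.add_periodic_timer(1)` first fires about one second after it is scheduled, so `EM.stop` cannot run before then, and the Fiber timing includes that polling delay no matter how quickly the writes finish. A minimal sketch of the alternative, assuming a hypothetical `CompletionLatch` helper (not part of EventMachine), fires a callback the moment the last task reports in. Plain fibers stand in for the Cassandra writes so the sketch runs without Cassandra or EventMachine.

```ruby
# Hypothetical helper (not part of EventMachine): runs a callback as soon as
# the expected number of tasks have reported completion, instead of polling.
class CompletionLatch
  def initialize(expected, &on_complete)
    @remaining = expected
    @on_complete = on_complete
  end

  # Call once per finished task; the callback fires exactly once, on the last call.
  def done!
    @remaining -= 1
    @on_complete.call if @remaining.zero?
  end
end

# Stand-alone demo: plain fibers stand in for the writes.
rounds = 5
finished = 0
stop_calls = 0
latch = CompletionLatch.new(rounds) { stop_calls += 1 }

rounds.times do
  Fiber.new do
    finished += 1 # the write would happen here
    latch.done!
  end.resume
end

puts "finished=#{finished} stop_calls=#{stop_calls}"
```

Inside `EM.run`, the latch would presumably be created with `{ EM.stop }` as its callback and each fiber would call `latch.done!` after its write, replacing the periodic timer so the measured time ends when the last write completes.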

phinjensen commented 6 years ago
original author: Robert
date: 2010-08-26T05:48:50-04:00

OK, with more write-heavy tasks the fiber version seems to catch up. I can't get it faster than the serial implementation without running into connection errors (125 seems to be the maximum for me), but I get your point...

phinjensen commented 6 years ago
original author: Ethan Rowe
date: 2010-08-26T07:21:20-04:00

Robert,

I did some subsequent work with this and was never particularly happy with the performance I was getting in the Ruby space.

Another member of the project team wrote equivalent code in C++ and was, unsurprisingly, able to get a considerable improvement in write volume. Ultimately, though, there were performance issues with the Thrift client itself, and I believe the same is true for Ruby.

In any event, given your deployment scenario, you ought to be able to get better overall throughput in the Ruby space through EventMachine, but for a truly high-volume processing task, Ruby/Thrift is probably going to disappoint. If you just need to make a handful of writes per request to Cassandra in a Ruby-based webapp, for instance, then I would think you ought to be able to get some advantage out of EventMachine. Is it going to be worth the trouble, though? I can't say.

So, as usual, it depends on your use case. I should note that this ought to have a bigger impact if your configuration involves writes to multiple nodes (a quorum write with a replication factor of 3 or more, for instance). In that scenario, I would expect longer I/O waits per write, so having the wait happen on a background thread while the main thread does more processing could be helpful. If you're running Cassandra locally with no replication, that's not representative of a production deployment, and the EventMachine approach could well make no difference: the processor time you want for Ruby-space work while EventMachine parallelizes your I/O is competing with the processing needs of your local Cassandra, which has to service the requests you just issued it.
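To make the I/O-overlap argument concrete, here is a toy model in Ruby. It uses neither EventMachine nor Cassandra, and the names `serial_time` and `fiber_time` and the abstract cost units are hypothetical. Each write spends `cpu` units of Ruby-side work and then waits `io` units for a simulated round trip; fibers let those waits overlap on a virtual clock. The model ignores any work done after a response arrives and any CPU the server itself needs.

```ruby
# Serial client: every write pays its CPU cost and then blocks for the full round trip.
def serial_time(writes, cpu, io)
  writes * (cpu + io)
end

# Fiber-interleaved client on a virtual clock (a toy scheduler, not EventMachine).
def fiber_time(writes, cpu, io)
  clock = 0.0
  sleeping = [] # [wake_time, fiber] pairs waiting on simulated I/O

  fibers = Array.new(writes) do
    Fiber.new do
      Fiber.yield(io) # "send" the request, then hand control back while it is in flight
      :done
    end
  end

  # Ruby-side work runs one write at a time on the single thread;
  # each fiber then parks until its simulated response arrives.
  fibers.each do |f|
    clock += cpu
    wait = f.resume
    sleeping << [clock + wait, f]
  end

  # Deliver responses in arrival order; the clock only moves forward.
  sleeping.sort_by!(&:first)
  sleeping.each do |wake, f|
    clock = [clock, wake].max
    f.resume
  end
  clock
end

puts "serial: #{serial_time(100, 1, 20)}, fibers: #{fiber_time(100, 1, 20)}"
```

With 100 writes, `cpu = 1` and `io = 20`, the serial total is 2100 units while the interleaved total is 120. With `io = 0` (a local, unreplicated node answering instantly) both come out at 100, which matches the point that the EventMachine approach could make no difference in that setup.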

It's early and I'm foggy-headed, so hopefully something in the above actually makes sense. :)

Thanks for the comments.

phinjensen commented 6 years ago
original author: Ethan Rowe
date: 2010-08-26T07:49:26-04:00

Clarification: I said that, "given your deployment scenario," you ought to be able to get better throughput. But I don't know your deployment scenario; that should have been "given an appropriate use case and deployment scenario". :)

phinjensen commented 6 years ago
original author: Robert
date: 2010-08-26T09:44:03-04:00

Hi Ethan, thanks for the fast and clarifying answer! It does make sense to me.

Since Ruby isn't really making use of my quad-core, and under full load Ruby + Fauna + Thrift take 80% of the CPU versus 20% for the Java process running Cassandra, EventMachine only has a chance of more than paying for its overhead if Cassandra spends more CPU or system time.

I'm just beginning with Ruby and want to use Cassandra for my database. With Cassandra 0.7 on one end and Rails 3.0 on the other, it's hard enough to get anything working, so I'm concentrating on system design and backend work until things hopefully become more stable. So there won't be any deployment anytime soon...

Any hints for current Ruby/Rails/Cassandra stuff would be very welcome... ;-)

Greetings from Berlin, Robert