celluloid / celluloid-zmq

UNMAINTAINED: See celluloid/celluloid#779 - Celluloid actors that talk over the 0MQ protocol
https://celluloid.io
MIT License

Port to rbczmq #56

Closed paddor closed 8 years ago

paddor commented 9 years ago

I keep getting segfaults with "assertion failed" error messages from programs involving ZMQ. Even just running rake spec in this project triggers those errors. I tried on OSX 10.10 and 10.11, same result. I tried with and without libsodium, same result. Then I noticed that maybe it's because https://github.com/chuckremes/ffi-rzmq has been put into maintenance mode. Apparently https://github.com/methodmissing/rbczmq is the way to go. So I'm thinking about porting celluloid-zmq to that library. Is that a good idea? Or is the low-level approach of ffi-rzmq actually needed by celluloid-zmq?

digitalextremist commented 9 years ago

@chuckremes would know best here. I've been contemplating a port since higher versions of ZMQ don't work with ffi-rzmq. Would be glad for help, if a port is the best idea.

tarcieri commented 9 years ago

The main downside I can think of is that this would break JRuby support.

paddor commented 9 years ago

@tarcieri From my point of view, it's already broken. ffi-rzmq says it supports ZMQ 4.x, and I have 4.1.3 installed, but it doesn't work at all.

digitalextremist commented 9 years ago

It's known that 4.x doesn't work. But 3.x works completely under JRuby.

paddor commented 9 years ago

Sorry, I didn't know that. Where does it say that?

digitalextremist commented 9 years ago

This is known:

https://github.com/chuckremes/ffi-rzmq/issues/121

paddor commented 9 years ago

@digitalextremist Oh, I see. Thanks for pointing that out.

paddor commented 9 years ago

I saw that there are also https://github.com/zeromq/czmq, https://github.com/Asmod4n/ruby-ffi-czmq, and https://github.com/mtortonesi/ruby-czmq, all of which are based on FFI, which, AFAIK, would guarantee JRuby support. I just have no idea about the differences between them. But the first seems like it's the "official" Ruby binding for CZMQ. Why not use that one?

Asmod4n commented 9 years ago

The official bindings are just thin wrappers around the C code and don't expose any high-level Ruby functionality; they are the basis for someone who wants to write high-level bindings.

The wrapper I wrote is pretty outdated now and is built around assumptions which might no longer be true. Its high-level code could be adapted to use the official bindings instead of the abomination I came up with in https://github.com/Asmod4n/ruby-ffi-czmq/blob/master/lib/czmq/libczmq.rb.

Last time I checked, https://github.com/mtortonesi/ruby-czmq was broken, and it doesn't look like the errors were fixed.

Asmod4n commented 9 years ago

(i also pretty much gave up on ruby as a standalone interpreter and focus much more on mruby now, which has solved the packaging hassle ruby is)

digitalextremist commented 9 years ago

I noticed that second part @asmod4n. Do you mind if we talk separately about that?

digitalextremist commented 9 years ago

Just curious, what benefits are brought by 4.x that are lacking in 3.x? I'm extremely committed to continued feature support on core dependencies, but having done triage a long time by now, I'm prone to prioritize based on gains. What do we gain here, I really want to know. I'm heavily invested in 0MQ.

Asmod4n commented 9 years ago

Mainly security, but also a new wire protocol. The next release will bring thread-safe server and client sockets with automatic timeout handling.

paddor commented 9 years ago

I see. Thanks for the explanation.

I don't know how outdated your wrapper is, or how much work it'd take to adapt it. I am using ZMQ already, but not in any sophisticated way: just bi-directional communication between one "broker" and many "clients" (both sides use a ROUTER socket), done with CURVE authentication. Is there much more to know to be able to come up with a nice, Ruby-esque interface and integrate it in your library? :)

What I noticed about your and also @mtortonesi's wrappers is that they don't have any tests. I guess that'd be a good starting point on the road to a stable wrapper?

Asmod4n commented 9 years ago

What is missing is some kind of error handling in the generated wrappers built via zproject (https://github.com/zeromq/zproject/blob/master/zproject_bindings_ruby.gsl) from XML files (https://github.com/zeromq/czmq/tree/master/api).

zproject is how the zmq folks tamed automake/cmake et al. to build robust APIs around C libraries, which as a byproduct also creates wrappers for ruby/python/qt etc.

Asmod4n commented 9 years ago

With zproject for example you can define a class/actor and it automatically generates C skeletons you fill out and get wrappers around them.

Asmod4n commented 8 years ago

Seems like the main culprit has been resolved in https://github.com/chuckremes/ffi-rzmq-core, but it looks like the library does things the zeromq API doesn't allow: http://api.zeromq.org/4-0:zmq-msg-init.

The API docs explicitly say not to access the members of zmq_msg_t directly, but ffi-rzmq-core does it.

Asmod4n commented 8 years ago

@paddor @digitalextremist looks like it's fixed in ffi-rzmq-core, see my last post.

chuckremes commented 8 years ago

@Asmod4n Please provide more information on what you think ffi-rzmq-core or ffi-rzmq are doing wrong. I don't see any problems with my handling of zmq_msg_t structs.

Asmod4n commented 8 years ago

zmq_msg_t was always exported in an opaque way, i.e. its fields have never been part of the official API. It was at first a pointer, then a struct, and now a union, and its size changed too. So in essence, ffi-rzmq-core hasn't been compatible with libzmq since January.

That also happened because libzmq doesn't define a function to return the size of a zmq_msg_t. I opened an issue for that on the issue tracker: https://github.com/zeromq/libzmq/issues/1599

paddor commented 8 years ago

I'm working on a (hopefully nice) CZMQ binding over at paddor/cztop. Any input or help is welcome.

digitalextremist commented 8 years ago

@paddor that gem looks well conceived and exciting. I will watch with interest and help wherever I can. I am highly dependent on Celluloid::ZMQ and maintain most of the code impacted by your topic here.

paddor commented 8 years ago

@digitalextremist Thanks! That's great to hear. I'll add you as collaborator.

paddor commented 8 years ago

@digitalextremist Providing a way to wait for read/write events from sockets, so that Celluloid::ZMQ::Reactor can be ported to CZMQ, turns out to be difficult, because waiting for write events isn't straightforward without falling back to using zmq_poll_items. Plus, when using zloop, one would have to add sockets to the loop and remove them again as soon as the event has been received, and also keep starting and stopping the zloop, because Celluloid::ZMQ::Reactor apparently uses #run_once (as opposed to #run).

Do you know if it's possible to adapt Celluloid::ZMQ::Reactor to support the more low-level kind of loop, where one would call #run just once?

Asmod4n commented 8 years ago

I had a similar issue with my czmq binding for mruby and wrote my own reactor https://github.com/Asmod4n/mruby-czmq/blob/master/mrblib/reactor.rb https://github.com/Asmod4n/mruby-czmq/blob/master/mrblib/poller.rb

digitalextremist commented 8 years ago

@paddor I believe this is an area where we'd want to be careful to preserve "evented" behavior versus having an infinite loop or similar. See the Celluloid::IO reactor itself, which behaves the same:

/cc: @tarcieri

paddor commented 8 years ago

@Asmod4n Thanks for the help. Very interesting solution.

@digitalextremist I think I have a solution. I'll extend zpoller with the method zpoller_add_writer and then implement Celluloid::ZMQ::Reactor#run_once using a loop that calls zpoller_wait until it doesn't return any more sockets.
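For illustration, the #run_once idea described above can be sketched with plain Ruby IO objects. This is only an analogy, not CZMQ or Celluloid code: pipes stand in for ZMQ sockets, IO.select stands in for zpoller_wait, and the class name SketchReactor and its methods are hypothetical.

```ruby
# Hypothetical stand-in reactor: plain IO objects play the role of ZMQ
# sockets and IO.select plays the role of zpoller_wait.
class SketchReactor
  def initialize
    @readers = {} # io => callback
  end

  # Register a callback that must consume the pending data on +io+.
  def on_readable(io, &callback)
    @readers[io] = callback
  end

  # Wait up to +timeout+ seconds for the first event, then keep polling
  # with a zero timeout until no registered IO is ready any more.
  # Returns the number of callbacks that were invoked.
  def run_once(timeout = nil)
    return 0 if @readers.empty?

    handled = 0
    wait = timeout
    loop do
      ready, = IO.select(@readers.keys, nil, nil, wait)
      break if ready.nil? || ready.empty?

      ready.each do |io|
        @readers.fetch(io).call(io)
        handled += 1
      end
      wait = 0 # later rounds must not block
    end
    handled
  end
end
```

The point of the sketch is that control returns to the caller after every batch of events, which keeps the "evented" behavior instead of running one infinite loop.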

paddor commented 8 years ago

@digitalextremist I'm wondering, how do you manage RSpec's lack of support for Rubinius? I'm running into trouble with it in my CZMQ binding when I run it on Rubinius (and JRuby), but only since I implemented zpoller (yesterday). The issue seems to be related to RSpec, though. How does Celluloid manage this? Thanks for any input.

digitalextremist commented 8 years ago

@paddor we've never had problems using RSpec on Rubinius other than the inability to pinpoint the specific Rubinius version we want to test on Travis CI ... what specific problem are you having?

paddor commented 8 years ago

@digitalextremist Thanks. I was having trouble with CZTop. The zpoller specs failed mainly on Rubinius and JRuby, but then even on MRI. Then I learned that RSpec doesn't officially support Rubinius. I filed zeromq/czmq#1299 and it's fixed now. It wasn't RSpec's fault after all. :-) All CZTop specs run very smoothly on MRI, Rubinius, and JRuby now.

By the way: CZTop::Poller (zpoller) was the last class. CZTop is pretty much complete now. Maybe you wanna give it a look. I know we still need a #run_once loop/poller thing for Celluloid::ZMQ (can't use CZTop::Poller, as it's only for reading). I've been thinking about creating a gem _cztop-manualloop. Or maybe build it directly into Celluloid::ZMQ. What's your opinion on it?

paddor commented 8 years ago

@digitalextremist I've started porting Celluloid::ZMQ to CZTop.

Do you know if its internal API is used outside of the project? Like, I assume and understand that methods like Celluloid::ZMQ::Socket#read and #write are expected to only raise IOError, so Celluloid knows the actor should just crash. But do you know if any other projects depend on methods like Socket#get/#set to access a socket's options by passing in integers such as ::ZMQ::RCVTIMEO?

paddor commented 8 years ago

I've finished porting the library code and the specs (WIP, I guess). When I run the specs, I get a wall of backtraces complaining about loose threads.

Celluloid::ZMQ::Socket
Runaway thread: ================ #<Celluloid::Thread:0x007f828b1e88f8@/Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:47 sleep>
Backtrace:
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `sleep'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `block in check'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/wait.rb:14:in `for'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:58:in `check'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:155:in `block in run'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/group.rb:66:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:152:in `run'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:131:in `block in start'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/celluloid-essentials-f0545ce47ed9/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor/system.rb:78:in `block in get_thread'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
Runaway thread: ================ #<Celluloid::Thread:0x007f828b1da140@/Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:47 sleep>
Backtrace:
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `sleep'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `block in check'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/wait.rb:14:in `for'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:58:in `check'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:155:in `block in run'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/group.rb:66:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:152:in `run'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:131:in `block in start'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/celluloid-essentials-f0545ce47ed9/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor/system.rb:78:in `block in get_thread'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
Runaway thread: ================ #<Celluloid::Thread:0x007f828b1c9110@/Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:47 sleep>
Backtrace:
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `sleep'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `block in check'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/wait.rb:14:in `for'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:58:in `check'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:155:in `block in run'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/group.rb:66:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:152:in `run'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:131:in `block in start'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/celluloid-essentials-f0545ce47ed9/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor/system.rb:78:in `block in get_thread'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
Runaway thread: ================ #<Celluloid::Thread:0x007f828b1a3c58@/Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:47 sleep>
Backtrace:
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `sleep'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:63:in `block in check'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/wait.rb:14:in `for'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/mailbox.rb:58:in `check'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:155:in `block in run'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/timers-41145ed260e4/lib/timers/group.rb:66:in `wait'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:152:in `run'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor.rb:131:in `block in start'
 ** /Users/paddor/.gem/ruby/2.3.0/bundler/gems/celluloid-essentials-f0545ce47ed9/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/actor/system.rb:78:in `block in get_thread'
 ** /Users/paddor/src/ruby/celluloid.git/lib/celluloid/group/spawner.rb:50:in `block in instantiate'

I don't know what that means. Can anybody help me?

digitalextremist commented 8 years ago

@paddor This is a known problem. For your purposes right now, circumvent that test if you need immediate feedback; otherwise, this is another thing on my immediate to-do list.

This is caused outside Celluloid::ZMQ in Celluloid itself; more precisely, in its addons to the test suite, brought in basically everywhere in the celluvoid.

I'm trying to recall which issue this problem is being addressed in, but can't at the moment. But like I said, it's known.

paddor commented 8 years ago

It's basically all of them :laughing: I had to change a spec file in celluloid to make it stop after the first, otherwise my scrollback buffer of 10000 lines wasn't enough to see the first one.

paddor commented 8 years ago

Okay, thanks for the info. Btw, where do we stand on breaking the API within Celluloid::ZMQ? Now or never? Not at all?

And does it really always have to be IOError if something goes wrong? CZTop raises appropriate exceptions, like ArgumentError for EINVAL, Interrupt for EINTR, SocketError for EHOSTUNREACH, or subclasses of SystemCallError for other errnos from ZMQ.
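A minimal sketch of the trade-off in question, using only Ruby's standard classes. Both helper names (exception_for_errno and with_io_error) are hypothetical and are not part of CZTop or Celluloid::ZMQ; the mapping simply mirrors the examples listed above.

```ruby
require "socket" # defines SocketError

# Hypothetical helper: choose an exception class for an errno value,
# following the mapping described in the comment above.
def exception_for_errno(errno, message = nil)
  case errno
  when Errno::EINVAL::Errno
    ArgumentError.new(message || "invalid argument")
  when Errno::EINTR::Errno
    Interrupt.new(message || "interrupted")
  when Errno::EHOSTUNREACH::Errno
    SocketError.new(message || "host unreachable")
  else
    # SystemCallError.new returns the matching Errno::* subclass
    # when the errno is known to the platform.
    SystemCallError.new(message, errno)
  end
end

# Hypothetical wrapper: keep an IOError-only contract at the boundary
# while preserving the specific error as the exception's cause.
def with_io_error
  yield
rescue ArgumentError, SocketError, SystemCallError => e
  raise IOError, "#{e.class}: #{e.message}"
end
```

With a wrapper like this, callers that only rescue IOError keep working, and the more specific exception is still reachable through IOError#cause. Interrupt is deliberately not wrapped, since it is not a StandardError.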

HoneyryderChuck commented 8 years ago

Problem has been solved here https://github.com/celluloid/celluloid/blob/master/spec/support/configure_rspec.rb#L51-L64, but only for celluloid-io. Try to see whether replacing boot with init in ZMQ group solves it.

digitalextremist commented 8 years ago

Thank you @TiagoCardoso1983

paddor commented 8 years ago

Any updates on this?

rfestag commented 8 years ago

I'd be interested in an update as well. I've tried a few times over the past year to use (C)ZMQ with Celluloid, and had a mixed bag of results. I generally stopped a while back because ZMQ 4.0 was installed on my OS by other dependencies, but none of the existing gems supported it. CZTop is the only one so far that seems to work out of the box (based on very limited testing, but that is light years ahead of others that complain about stack smashing or fail assertions all the time).

EDIT -- For the record, if there is something you haven't had time to look at, I'd be happy to take a look.

paddor commented 8 years ago

@rfestag If you have issues with CZTop, just let me know. Happy to get feedback. :)

paddor commented 8 years ago

CZTop::Poller is now implemented based on the zmq_poller_*() functions and thus also works with thread-safe sockets such as SERVER/CLIENT/RADIO/DISH. Furthermore, CZTop::Poller::Aggregated is the one that can be used in Celluloid::ZMQ, as it provides #readables and #writables (arrays of sockets) after just one call to #wait.
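The shape of such an aggregated poller can be sketched with plain Ruby IO objects. This is not CZTop's implementation: IO.select stands in for the zmq_poller_*() functions, and the class AggregatedPollerSketch with its add_reader/add_writer methods is hypothetical. Only #wait, #readables, and #writables are taken from the description above.

```ruby
# Hypothetical stand-in for an aggregated poller: one call to #wait
# fills #readables and #writables with every IO that is ready.
class AggregatedPollerSketch
  attr_reader :readables, :writables

  def initialize
    @read_set = []
    @write_set = []
    @readables = []
    @writables = []
  end

  def add_reader(io)
    @read_set << io unless @read_set.include?(io)
  end

  def add_writer(io)
    @write_set << io unless @write_set.include?(io)
  end

  # Waits up to +timeout+ seconds (nil blocks until something is ready).
  # Returns true if at least one registered IO is readable or writable.
  def wait(timeout = nil)
    ready = IO.select(@read_set, @write_set, nil, timeout)
    @readables, @writables = ready ? ready.first(2) : [[], []]
    !(@readables.empty? && @writables.empty?)
  end
end
```

A reactor's #run_once could then call #wait once and resume whichever tasks are waiting on the sockets listed in #readables and #writables.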

Has that Runaway thread issue been fixed?

rfestag commented 8 years ago

I noticed there is a cztop 0.3.0, but it doesn't appear to work with the version of czmq I have access to via AUR on my Manjaro (Arch derivative) system - 3.0.2-1. What version of czmq is necessary to use the latest cztop?

I also checked out your fork of celluloid-zmq a while back, and it doesn't look like it has been updated. I assume the reactor should use CZTop::Poller::Aggregated instead of CZTop::Poller? Or does that not matter?

As far as I can tell, it looks like the Runaway thread issue is still happening when I run the specs.

paddor commented 8 years ago

> What version of czmq is necessary to use the latest cztop?

@rfestag Because CZTop is pretty new, and supporting older versions was becoming impossible, I decided to drop support for ZMQ < 4.2 (soon to be released) and CZMQ 3.0.2 (current stable release, next release coming soon too, I guess). If you're on OSX and use Homebrew, you can install both using brew install zmq --with-libsodium --HEAD && brew install czmq --HEAD.

> I assume the reactor should use CZTop::Poller::Aggregated instead of CZTop::Poller?

Thanks for the heads up on celluloid-zmq. You are completely right. I just released CZTop 0.4.0, which adds some more compatibility on CZTop::Poller::Aggregated, and changed my branch to use that one (see here). I haven't tested this change though, since you said the Runaway thread issue is still present. 😞

chuckremes commented 8 years ago

Yes, let's use this issue to collaborate with @paddor and @digitalextremist on replacing ffi-rzmq with cztop in celluloid-zmq.

paddor commented 8 years ago

@chuckremes I don't know if you noticed, but I tried to port this repo to cztop before, back in April. I stopped because of the issue mentioned above. Maybe you can build on top of my work: https://github.com/celluloid/celluloid-zmq/compare/master...paddor:cztop

digitalextremist commented 8 years ago

@paddor does this issue actually persist? The original SIGSEGV? I'm sure between those of us on the thread, we can squish that in relatively short order. If it was the loose-threads piece, that ought to be resolved, as of the multiplex branch on Celluloid I'm still on.

paddor commented 8 years ago

@digitalextremist No idea about the SIGSEGV, since it's been ages. I actually couldn't remember it.

As for the loose threads, I'll try to rerun the test suite then. :)

digitalextremist commented 8 years ago

Great -- but remember, when in doubt, use the multiplex branch for the time being. Also, I invited you to our Slack lair just now.

digitalextremist commented 8 years ago

@paddor with jruby-1.7.25 and mri-2.3.1 ( with rbx-3.* currently having its own unrelated problems for now ) all the tests pass with libzmq3, after #58 was fixed.

paddor commented 8 years ago

@digitalextremist Thank you so much!