TvL2386 closed this issue 12 years ago.
FYI: when I have just 2 nodes, everything seems to work as I would expect. The info service works perfectly!
It looks like you're not giving each of the nodes a unique ID. Perhaps I can try to derive the node ID from the address if it isn't given.
It doesn't matter whether I give all nodes a unique ID or not. The result is the same.
If you don't specify an ID, each node's ID is the same as its host name.
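The node-ID suggestion can be illustrated with a small sketch. With the host-name default, three nodes started on one machine (all three below listen on 127.0.0.1) would share a single ID, while an ID derived from the `tcp://host:port` address stays distinct per node. `node_id_from_addr` is a hypothetical helper for illustration only, not part of DCell, and `Socket.gethostname` is assumed to stand in for however DCell looks up the host name. Note that in this thread, explicit unique IDs did not make the interruptions go away.

```ruby
require 'socket'
require 'uri'

# Hypothetical helper, not part of DCell: derive a node ID from the
# node's address, e.g. "tcp://127.0.0.1:2042" -> "127.0.0.1:2042".
def node_id_from_addr(addr)
  uri = URI.parse(addr)
  "#{uri.host}:#{uri.port}"
end

# The three addresses used by node1, node66 and node67 in this thread.
addrs = %w[tcp://127.0.0.1:2042 tcp://127.0.0.1:2066 tcp://127.0.0.1:2067]

# Host-name default: every node on this machine ends up with the same ID.
hostname_ids = addrs.map { Socket.gethostname }

# Address-derived IDs: distinct as long as each node listens on its own port.
addr_ids = addrs.map { |addr| node_id_from_addr(addr) }

puts hostname_ids.uniq.size # prints 1
puts addr_ids.uniq.size     # prints 3
```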
So you're really running into this problem even if every node has a unique node ID?
yep:
# node1
1.9.3p194 :001 > require 'dcell'
=> true
1.9.3p194 :002 > DCell.start :id => "node1", :addr => "tcp://127.0.0.1:2042",
1.9.3p194 :003 > :registry => {
1.9.3p194 :004 > :adapter => 'redis',
1.9.3p194 :005 > :host => '127.0.0.1',
1.9.3p194 :006 > :port => 6379
1.9.3p194 :007?> }
I, [2012-06-03T08:02:18.806961 #3410] INFO -- : Connected to node1
=> #<Celluloid::Supervisor(DCell::Group):0xd36cb0>
1.9.3p194 :008 > I, [2012-06-03T08:02:58.123416 #3410] INFO -- : Found node node66
I, [2012-06-03T08:03:03.128523 #3410] INFO -- : Connected to node66
I, [2012-06-03T08:03:28.393518 #3410] INFO -- : Found node node67
I, [2012-06-03T08:03:33.402549 #3410] INFO -- : Connected to node67
W, [2012-06-03T08:03:48.198882 #3410] WARN -- : Communication with node66 interrupted
I, [2012-06-03T08:03:48.211829 #3410] INFO -- : Connected to node66
W, [2012-06-03T08:04:03.222294 #3410] WARN -- : Communication with node66 interrupted
W, [2012-06-03T08:04:03.222624 #3410] WARN -- : Communication with node67 interrupted
I, [2012-06-03T08:04:03.240529 #3410] INFO -- : Connected to node66
I, [2012-06-03T08:04:03.240930 #3410] INFO -- : Connected to node67
W, [2012-06-03T08:04:23.481235 #3410] WARN -- : Communication with node66 interrupted
W, [2012-06-03T08:04:23.481465 #3410] WARN -- : Communication with node67 interrupted
I, [2012-06-03T08:04:28.295843 #3410] INFO -- : Connected to node66
I, [2012-06-03T08:04:28.296402 #3410] INFO -- : Connected to node67
W, [2012-06-03T08:04:38.496951 #3410] WARN -- : Communication with node67 interrupted
W, [2012-06-03T08:04:43.307204 #3410] WARN -- : Communication with node66 interrupted
# node66
1.9.3p194 :001 > require 'dcell'
=> true
1.9.3p194 :002 > DCell.start :id => "node66", :addr => "tcp://127.0.0.1:2066",
1.9.3p194 :003 > :directory => {
1.9.3p194 :004 > :id => 'node1',
1.9.3p194 :005 > :addr => 'tcp://127.0.0.1:2042'
1.9.3p194 :006?> }
I, [2012-06-03T08:02:53.112467 #3437] INFO -- : Connected to node1
I, [2012-06-03T08:02:53.113040 #3437] INFO -- : Connected to node66
=> #<Celluloid::Supervisor(DCell::Group):0xcf91a8>
1.9.3p194 :007 > W, [2012-06-03T08:03:38.891409 #3437] WARN -- : Communication with node1 interrupted
I, [2012-06-03T08:03:43.908094 #3437] INFO -- : Found node node67
I, [2012-06-03T08:03:43.908460 #3437] INFO -- : Connected to node1
I, [2012-06-03T08:03:48.431099 #3437] INFO -- : Connected to node67
W, [2012-06-03T08:03:58.432590 #3437] WARN -- : Communication with node67 interrupted
I, [2012-06-03T08:03:58.453965 #3437] INFO -- : Connected to node67
W, [2012-06-03T08:04:18.475492 #3437] WARN -- : Communication with node67 interrupted
I, [2012-06-03T08:04:18.484619 #3437] INFO -- : Connected to node67
W, [2012-06-03T08:04:48.996539 #3437] WARN -- : Communication with node1 interrupted
# node67
1.9.3-p194 :002 > require 'dcell'
=> true
1.9.3-p194 :003 > DCell.start :id => "node67", :addr => "tcp://127.0.0.1:2067",
1.9.3-p194 :004 > :directory => {
1.9.3-p194 :005 > :id => 'node1',
1.9.3-p194 :006 > :addr => 'tcp://127.0.0.1:2042'
1.9.3-p194 :007?> }
I, [2012-06-03T08:03:23.381680 #3465] INFO -- : Connected to node1
I, [2012-06-03T08:03:23.382468 #3465] INFO -- : Connected to node67
=> #<Celluloid::Supervisor(DCell::Group):0x158914c>
1.9.3-p194 :008 >
1.9.3-p194 :009 > I, [2012-06-03T08:03:33.896566 #3465] INFO -- : Found node node66
I, [2012-06-03T08:03:38.902664 #3465] INFO -- : Connected to node66
W, [2012-06-03T08:03:48.903876 #3465] WARN -- : Communication with node1 interrupted
W, [2012-06-03T08:03:48.904233 #3465] WARN -- : Communication with node66 interrupted
I, [2012-06-03T08:03:58.230268 #3465] INFO -- : Connected to node1
I, [2012-06-03T08:03:58.230809 #3465] INFO -- : Connected to node66
W, [2012-06-03T08:04:28.974255 #3465] WARN -- : Communication with node1 interrupted
W, [2012-06-03T08:04:33.287819 #3465] WARN -- : Communication with node66 interrupted
I, [2012-06-03T08:04:33.990289 #3465] INFO -- : Connected to node1
I, [2012-06-03T08:04:33.990845 #3465] INFO -- : Connected to node66
W, [2012-06-03T08:04:53.326536 #3465] WARN -- : Communication with node1 interrupted
I've also tried rbx-2.0.testing just for fun, and it behaves exactly the same.
As soon as there are more than 2 nodes, the communication interruptions start.
You're doing all of this from irb... there are known issues with irb where readline blocks every thread.
Can you try any of the following?
1) Disable readline by putting IRB.conf[:USE_READLINE] = false in your .irbrc
2) Put your code in Ruby scripts instead of using irb
3) Use JRuby, which doesn't have the readline-related problems
I've put the code in scripts (see https://gist.github.com/2864368).
Running them with ruby-1.9.3p194 gives the same result. Running them in separate irb sessions with --noreadline also gives the same result.
Running the three scripts as follows:
rvm use jruby-1.6.7
ruby --1.9 -rrubygems nodeX.rb
gives the same result...
Regards, Tom
For what it's worth, I'm seeing this as well, and I'm definitely not running my examples from irb. If you follow the README of http://github.com/knewter/skynet, you see this after a little while. Of course, that's not as simple as the example given in this issue. Still, +1
It doesn't matter whether you run it from irb or not, or whether you use ruby-1.9.3, jruby-1.6.7, or rubinius-2.0.0.testing... It's all the same.
I will investigate this further when I have time
I am also having this issue.
I have plans to make some pretty major changes to the way DCell works in general, and will probably be shifting back onto Zookeeper by default until the gossip protocol is more stable.
So what is the recommended way of getting the examples up and running? Zookeeper?
@therealjessesanford unfortunately Zookeeper is broken at the moment, so there's not a lot to do besides wait for Zookeeper support to be fixed or submit a patch :(
Any news on this?
No, sorry, I have mostly been spending my time working on Celluloid. I had planned to pick this up after Celluloid 0.12.0; however, there were enough bugs in that release that I really need to get Celluloid 0.12.1 out before I can take a look at DCell again.
No problem. Can you point me in the direction of some of the possible solutions you were thinking of? Maybe I'll take a swing at it.
In short: revert 9dc9245f904deccb
That said, your best bet for a first step would be to at least get DCell green on Celluloid 0.12.1 (unreleased; currently on the master branch of https://github.com/celluloid/celluloid).
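One way to build DCell against the unreleased Celluloid master is to point Bundler at the Git repository. This is a minimal Gemfile sketch, assuming it sits in a DCell checkout that has a dcell.gemspec declaring the celluloid dependency, and assuming the Git version satisfies that gemspec's version constraint; it is not taken from the DCell repository itself.

```ruby
# Gemfile (sketch): build DCell against Celluloid's unreleased master branch.
source 'https://rubygems.org'

# Pull in DCell itself and its runtime dependencies from dcell.gemspec
# (assumed to be present in the same directory).
gemspec

# Use the celluloid gem from its Git master branch instead of the released gem.
gem 'celluloid', :git => 'https://github.com/celluloid/celluloid.git', :branch => 'master'
```

After `bundle install`, running the specs through `bundle exec` (for example `bundle exec rspec`, assuming an RSpec suite) ensures the Git version of Celluloid is the one that gets loaded.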
OK, cool, that gives me somewhere to start.
I reverted 9dc9245 in e3115f28. This should put us back on stable ground. I'm calling this issue solved.
Yes, this works much better. Thank you.
Hi,
I'm running 3 Ubuntu 12.04 amd64 nodes. When I start DCell in an irb session, I get "communication interrupted" warnings. This does not seem good...
I'm running the following versions: