ezmobius / nanite

self assembling fabric of ruby daemons
Apache License 2.0

comparison of Array with Array failed (ArgumentError) #13

Closed taazza closed 15 years ago

taazza commented 15 years ago

Not sure if this is an amqp error or a nanite error, I have posted it on amqp as well.

/vendor/gems/gems/amqp-0.6.5/lib/amqp/buffer.rb:252:in `min': comparison of Array with Array failed (ArgumentError)
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:137:in `each'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:137:in `min'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:137:in `least_loaded'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:23:in `__send__'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/cluster.rb:23:in `targets_for'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/mapper.rb:198:in `send_request'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.10/lib/nanite/mapper.rb:191:in `request'
    from base_prog.rb:58:in `start'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `call'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `fire'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `call'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
    from base_prog.rb:41:in `start'
    from base_prog.rb:70

This happens a lot. And when it happens it continues to happen repeatedly every couple of minutes till a restart is done. Wondering if this has to do with rabbitmq/amqp or the state of the nanite.

Any thoughts would be greatly appreciated. Thanks!

roidrage commented 15 years ago

When did you update/install your Nanite gem? The current version on gemcutter.org is 0.4.12, and I've never seen that happen.

roidrage commented 15 years ago

On a second note, I'll see how I go with the AMQP 0.6.5 gem today, but still, I'd encourage you to update your Nanite installation.

taazza commented 15 years ago

We use gem bundler, and the current version of nanite on gemcutter is 0.4.1.10: http://gemcutter.org/gems/nanite Where are you seeing 0.4.12? Am I missing something here?

roidrage commented 15 years ago

The 0.4.1.2 version is right there in the list. Version 0.4.1.10 is not the official Nanite gem. I'm afraid it's the RightScale fork, and it's full of custom patches for the RightScale product and not properly tested from my point of view. Please install 0.4.1.2, and I'll talk to Ezra about how that version ended up on Gemcutter.

taazza commented 15 years ago

Aah... 0.4.1.2! I was looking for 0.4.1 [12] as you had mentioned earlier.

When someone installs nanite, 0.4.1.[10] gets selected by default. No worries, I will give this a shot and hopefully the problem disappears!

roidrage commented 15 years ago

I'll try to push an updated gem later today.

taazza commented 15 years ago

Thanks! Pls try and get the logging issue in as well ;) Your help and prompt responses have been very helpful! Thanks a bunch! Pls close both issues once you are done with the build & push.

I'm assuming the updated gem will be posted on gemcutter. Thanks again!

roidrage commented 15 years ago

The gem on gemcutter has been updated. Let me know if there are any problems.

taazza commented 15 years ago

No such luck. Tested it out with nanite-0.4.1.13, and after running for a few hours it runs into the same problem. Exception attached below:

/home/test/v_0.1/vendor/gems/gems/amqp-0.6.5/lib/amqp/buffer.rb:252:in `min': comparison of Array with Array failed (ArgumentError)
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:132:in `each'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:132:in `min'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:132:in `least_loaded'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:22:in `__send__'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:22:in `targets_for'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/mapper.rb:193:in `send_request'
    from /home/test/v_0.1/vendor/gems/gems/nanite-0.4.1.13/lib/nanite/mapper.rb:186:in `request'
    from tester.rb:58:in `start'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `call'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/em/timers.rb:51:in `fire'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `call'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
    from /home/test/v_0.1/vendor/gems/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
    from tester.rb:41:in `start'
    from tester.rb:70

Had to reboot the machine.

As for logging .. The mapper is all set, INFO issue has disappeared. But the agent still logs the request as INFO

[Sat, 21 Nov 2009 03:35:44 -0500] INFO: SEND [result] <9119b16dc7d01d87ea61e42753b6c0be>
[Sat, 21 Nov 2009 03:35:44 -0500] INFO: RECV [result] <9119b16dc7d01d87ea61e42753b6c0be>

leading to big log files. Pls reopen this issue. Thx

roidrage commented 15 years ago

Are you using Redis as state storage?

Somehow the status of an agent comes out as an array from the state storage. It would help me to find out what's going on if you could patch the cluster.rb at line 132 to output a[1] and b[1]. Otherwise it'd get hard for me to debug. I'll have a hard look at the data coming into the state store, but it'd be easier to figure out.

I'll look into the agent logging as well, I thought I got them all.
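
A minimal sketch of the kind of debug patch requested above, assuming the comparison at that line is a `min` over `[identity, state]` pairs ordered by `state[:status]`. The trace only shows that `min` is called inside `least_loaded`; the method name `least_loaded_debug`, the hash layout, and the logger setup here are assumptions for illustration, not Nanite's actual source.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the comparison in least_loaded. Candidates are
# assumed to be [identity, state] pairs where state[:status] is the load.
def least_loaded_debug(candidates, log)
  candidates.min do |a, b|
    # Log both sides before comparing so a bad status shows up in the log.
    log.info("least_loaded compare a[1]=#{a[1].inspect} b[1]=#{b[1].inspect}")
    a[1][:status] <=> b[1][:status]
  end
end

LOG_IO = StringIO.new
LOG = Logger.new(LOG_IO)

healthy = [['nanite-a', { status: 0.46 }], ['nanite-b', { status: 0.0 }]]
broken  = [['nanite-a', { status: 'no status' }], ['nanite-b', { status: 0.46 }]]

BEST = least_loaded_debug(healthy, LOG)

# Comparing a Float with a String via <=> returns nil, and min raises
# ArgumentError when its block returns nil.
ERROR = begin
  least_loaded_debug(broken, LOG)
  nil
rescue ArgumentError => e
  e
end
```

Because the block compares the status values but `min` reports the elements it was given, the exception mentions the `[identity, state]` arrays rather than the mismatched status values, which would be consistent with the message in the trace.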

taazza commented 15 years ago

Nope, not using Redis. Let me patch and rebuild the gem and test it out.

I will send you the logs soon. I don't understand why I have to restart the machine for the problem to disappear. Anyways, thanks for taking a look at the issue; we are out of bandwidth to contribute at the moment.

We will pitch in soon. Thanks for all your effort/help. Cheers!

taazza commented 15 years ago

I printed the candidates variable.

When you start the mapper and everything is fine, here is what gets printed:

INFO: [ARGUMENT_ERROR_PATCH] candidates -> nanite-SMEBARUTHI timestamp1258956831 tags status0.0 services/masala/process/thadka/process/lao/process/test/execute/vayudooth/process/khale/process/thadayam/process nanite-ROJA timestamp1258956827 tags status0.0 services/masala/process/thadka/process/lao/process/test/execute/vayudooth/process/khale /process/thadayam/process

And when things go wrong and the array comparison error pops up, this is what gets printed:

INFO: [ARGUMENT_ERROR_PATCH] candidates -> nanite-SMEBARUTHI timestamp1259006907 tags statusno status [THIS SEEMS TO BE THE ISSUE - no value instead [no status] gets printed]

services/masala/process/thadka/process/lao/process/test/execute/vayudooth/process/khale/process/thadayam/process nanite-ROJA timestamp1259006915 tags status0.46 services/masala/process/thadka/process/lao/process/test/execute/vayudooth/process/khale/process/thadayam/process

Hope this helps.

roidrage commented 15 years ago

Thanks, that does help. I'll look into it.

roidrage commented 14 years ago

Sorry for the delay on this one. The problem seems to be that your agent is incapable of executing the command `uptime` on the machine it's running on. What operating system is it, and what happens when you fire up a small Ruby script and just put `uptime` in it? Either way, the mapper needs to be fixed to not use the status value when it's just "no status".
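
One way such a mapper-side guard could look, as a sketch only: drop candidates whose status is not numeric before picking the minimum. The name `least_loaded_safe` and the `[identity, state]` layout are assumptions, not Nanite's actual code; the agent names and status values are the ones from the debug output earlier in this thread.

```ruby
# Hypothetical guard: only agents reporting a numeric load take part in
# the least-loaded selection; 'no status' entries are skipped.
def least_loaded_safe(candidates)
  usable = candidates.select { |_identity, state| state[:status].is_a?(Numeric) }
  # Returns nil when no agent has a usable status.
  usable.min_by { |_identity, state| state[:status] }
end

candidates = [
  ['nanite-SMEBARUTHI', { status: 'no status' }],
  ['nanite-ROJA',       { status: 0.46 }]
]

PICKED = least_loaded_safe(candidates)
puts PICKED.first if PICKED # prints "nanite-ROJA"
```

Skipping agents without a numeric status keeps the selection from raising, at the cost of never routing to an agent whose status check is failing. Whether such agents should instead be treated as a fallback is a design choice this sketch does not settle.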

taazza commented 14 years ago

Matt, we are on Ubuntu 8.04 (Hardy). When we re-fire the mapper, it runs for a while before it runs into the problem again.

This repeats till we reboot the system.

roidrage commented 14 years ago

Could you try overwriting the default status proc with a debug message, so I can see what the problem might be? It would be nice to fix the root cause of this. You'd need to change this in the agent's init.rb file, and then watch the log file when it happens again.

    status_proc = lambda do
      begin
        parse_uptime(`uptime`)
      rescue
        Nanite::Log.error($!)
        'no status'
      end
    end

Thanks!
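
For checking the same thing outside the agent, a small standalone script along these lines may help. It is a sketch under assumptions: `parse_load` is a hypothetical parser written for this example (not Nanite's `parse_uptime`), and `checked_status` takes the command runner as a lambda so the failure path can be exercised without a broken machine.

```ruby
# Hypothetical parser: average of the three load figures in uptime output.
def parse_load(output)
  match = output.match(/load averages?:\s*(.+)/)
  raise ArgumentError, "unexpected uptime output: #{output.inspect}" unless match
  loads = match[1].split(/[,\s]+/).map { |value| Float(value) }
  loads.sum / loads.size
end

# Runs the given command runner and reports why it failed instead of
# silently returning 'no status'.
def checked_status(runner)
  parse_load(runner.call)
rescue StandardError => e
  warn "status check failed: #{e.class}: #{e.message}"
  'no status'
end

sample = ' 03:35:44 up 12 days,  4:01,  2 users,  load average: 0.46, 0.40, 0.34'
GOOD = checked_status(-> { sample })

# Simulated failure for illustration only; the actual cause in this
# thread is not known.
BAD = checked_status(-> { raise Errno::EMFILE })

# On a real machine: checked_status(-> { `uptime` })
```

Running the last commented line on the affected machine, while the problem is occurring, would show the exception class and message behind the "no status" value.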