ezmobius / nanite

self assembling fabric of ruby daemons
Apache License 2.0

Seemingly random nil value for nanite_attributes in cluster.rb:169 #18

Open kingcu opened 14 years ago

kingcu commented 14 years ago

I am getting a seemingly random, intermittent error (about once a day, which works out to roughly once every 300 jobs) in cluster.rb at line 169, inside the block passed to nanites_for: one of the returned nanites has nil for its attributes. I cannot reproduce the issue manually. It fails when, inside that block, the nanite's attributes are passed to timed_out?.

    NoMethodError: undefined method `[]' for nil:NilClass
    /usr/local/lib/ruby/gems/1.8/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:162
    /usr/local/lib/ruby/gems/1.8/gems/nanite-0.4.1.13/lib/nanite/cluster.rb:169:in `nanites_providing'

I am not familiar enough with the code to tell whether a race condition is possible somewhere. I don't know where to go for further troubleshooting, but I have implemented a band-aid and am waiting to see if the issue crops up again.

I just added a check for nil before calling timed_out?:

    if nanite_attributes.nil? or timed_out?(nanite_attributes)
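For readers following along, here is a minimal standalone sketch of that kind of nil guard. It is not nanite's actual code: `AGENT_TIMEOUT`, the `:timestamp` key, the body of `timed_out?`, and the `live_nanites` helper are all assumptions made up for illustration.

```ruby
# Hypothetical stand-ins for illustration only; not taken from nanite's cluster.rb.
AGENT_TIMEOUT = 15 # seconds (assumed value)

# Assumes attributes is a Hash carrying a :timestamp in epoch seconds.
def timed_out?(attributes, now = Time.now.to_i)
  attributes[:timestamp] < now - AGENT_TIMEOUT
end

# Drops nanites whose attributes are missing (nil) or stale.
# The nil check runs first, so timed_out? never receives nil.
def live_nanites(nanites, now = Time.now.to_i)
  nanites.reject do |_identity, attributes|
    attributes.nil? || timed_out?(attributes, now)
  end
end

nanites = {
  "nanite-a" => { :timestamp => 100 },
  "nanite-b" => nil,                  # the intermittent nil case
  "nanite-c" => { :timestamp => 10 }  # stale
}
puts live_nanites(nanites, 110).keys.inspect
```

Note that a guard like this treats a nanite with nil attributes the same as a timed-out one, which hides the symptom without explaining why the attributes were nil.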

If this fixes the issue, I'll commit it to my branch here on GitHub. I'm just hesitant to band-aid a problem I don't understand!

kingcu commented 14 years ago

Ahh, just saw that I am intermittently also getting an error that seems related.

    NoMethodError: undefined method `to_i' for []:Array
    /usr/local/lib/ruby/gems/1.8/gems/nanite-0.4.1.13/lib/nanite/state.rb:54:in `[]'
    /usr/local/lib/ruby/gems/1.8/gems/nanite-0.4.1.13/lib/nanite/state.rb:41:in `call'
    /usr/local/lib/ruby/gems/1.8/gems/nanite-0.4.1.13/lib/nanite/state.rb:41:in `log_redis_error'
    /usr/local/lib/ruby/gems/1.8/gems/nanite-0.4.1.13/lib/nanite/state.rb:48:in `[]'
    /usr/local/lib/ruby/gems/1.8/gems/nanite-0.4.1.13/lib/nanite/state.rb:162:in `nanites_for'

My previously mentioned bandaid won't fix this issue, so I added some conditional logger statements, which should hopefully give me a better idea as to the state of the system when this happens...
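As a rough sketch of what such conditional logging could look like, the snippet below uses Ruby's standard `Logger` and only writes a line when the attributes are not the expected shape. The helper name `log_suspicious_nanite` and the assumption that well-formed attributes are a Hash are hypothetical, not taken from nanite.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: logs only when attributes are not a Hash
# (for example nil or an empty Array, as in the two traces above).
# Returns true if something was logged, false otherwise.
def log_suspicious_nanite(logger, identity, attributes)
  return false if attributes.is_a?(Hash)
  logger.warn("unexpected attributes for #{identity}: " \
              "#{attributes.inspect} (#{attributes.class})")
  true
end

# Usage: capture log output in memory for demonstration.
io = StringIO.new
logger = Logger.new(io)
log_suspicious_nanite(logger, "nanite-1", nil)
log_suspicious_nanite(logger, "nanite-2", [])
log_suspicious_nanite(logger, "nanite-3", { :timestamp => 100 }) # logs nothing
puts io.string
```

Recording both the value and its class makes it easier to tell a missing entry (nil) from an unexpected reply type (such as an Array) when the error next appears.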