DataDog / chef-handler-datadog

Get Chef stats & events directly into Datadog
https://www.datadoghq.com/
MIT License

Exception handler fails with NoMethodError #51

Closed: dwradcliffe closed 10 years ago

dwradcliffe commented 10 years ago

Looks like for some reason run_status.all_resources is nil. This is happening on several nodes in production.

Running handlers:
[2014-06-11T13:42:12+00:00] ERROR: Running exception handlers
[2014-06-11T13:42:12+00:00] ERROR: Report handler Chef::Handler::Datadog raised #<NoMethodError: undefined method `length' for nil:NilClass>
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-handler-datadog-0.4.0/lib/chef/handler/datadog.rb:157:in `emit_metrics_to_datadog'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-handler-datadog-0.4.0/lib/chef/handler/datadog.rb:26:in `report'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/handler.rb:226:in `run_report_unsafe'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/handler.rb:214:in `run_report_safely'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/handler.rb:118:in `block in run_exception_handlers'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/handler.rb:117:in `each'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/handler.rb:117:in `run_exception_handlers'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/handler.rb:127:in `block in <class:Handler>'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:133:in `call'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:133:in `block in run_failed'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:132:in `each'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:132:in `run_failed'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:447:in `rescue in do_run'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:459:in `do_run'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:213:in `block in run'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:207:in `fork'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/client.rb:207:in `run'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/application.rb:217:in `run_chef_client'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/application/client.rb:328:in `block in run_application'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/application/client.rb:317:in `loop'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/application/client.rb:317:in `run_application'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/lib/chef/application.rb:67:in `run'
[2014-06-11T13:42:12+00:00] ERROR: /opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.4/bin/chef-client:26:in `<top (required)>'
[2014-06-11T13:42:12+00:00] ERROR: /usr/bin/chef-client:23:in `load'
[2014-06-11T13:42:12+00:00] ERROR: /usr/bin/chef-client:23:in `<main>'
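
For readers following along, here is a minimal sketch of the failure mode described above. It does not use the handler's real code: `FakeRunStatus`, `count_resources`, and `try_count` are hypothetical names, and the assumption is only that the failing line calls `length` on `run_status.all_resources` while that value is nil.

```ruby
# Hypothetical stand-in for Chef's run status object; only the field named
# in the report above is modeled here.
FakeRunStatus = Struct.new(:all_resources)

# Calls .length unconditionally, which is what the backtrace suggests the
# handler does (an assumption, not the handler's actual source).
def count_resources(run_status)
  run_status.all_resources.length
end

# Wraps the call so the script keeps running and the error can be inspected.
def try_count(run_status)
  count_resources(run_status)
rescue NoMethodError => e
  "#{e.class}: #{e.message}"
end

ok = try_count(FakeRunStatus.new([:package, :service]))
failed = try_count(FakeRunStatus.new(nil))

puts ok      # a resource count when the list exists
puts failed  # a NoMethodError description when the list is nil
```

The exact wording of the error message differs between Ruby versions, so the sketch only relies on the exception class.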
miketheman commented 10 years ago

That's bad news. Any chance you can try this on a node with Chef 11.12.8, just as a safeguard? Is this widespread across all nodes?

dwradcliffe commented 10 years ago

Looks like it's working on 11.12.8! I didn't notice it had been released. Thanks!

miketheman commented 10 years ago

That's odd, I was unable to reproduce the problem using Chef 11.12.4 - could it have been a fluke?

dwradcliffe commented 10 years ago

It happened on about 15 nodes for at least 12 hours. All 11.12.4. If I have time I might be able to debug a bit.

miketheman commented 10 years ago

It'd be much appreciated. Details like OS distro, other Chef plugins, etc. would help.

dwradcliffe commented 10 years ago

CentOS 6.3. No other plugins that I can think of.

Currently I'm noticing it on a Recipe Compile Error Chef::Exceptions::RecipeNotFound.

I dug a little and I see that run_status.run_context is nil.

miketheman commented 10 years ago

That's odd - RecipeNotFound is usually something you'd see when a typo or other reference is introduced to a node's run list - by either one of your cookbooks, or one of your dependencies' cookbooks.

miketheman commented 10 years ago

@dwradcliffe were you ever able to pin this down to something reproducible?

dwradcliffe commented 10 years ago

I do keep seeing it. I think maybe it happens when the error is during the compile phase?

miketheman commented 10 years ago

Interesting. So if a node fails to compile, what metrics would you expect to be seen on Datadog's end? Should we test for a length and bail if nonexistent before here? I think if we reorder the metrics, and handle nil there, you'll still get elapsed time and the failure event, but I don't know if the event will contain anything useful, since we don't have any resources to report on.
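
A rough sketch of the reordering idea above, not the fix that eventually shipped: elapsed time is recorded first, and the resource counts are skipped when the resource list is nil. `StubRunStatus`, `build_metrics`, and the metric names are all hypothetical, chosen only for illustration.

```ruby
# Hypothetical stand-in for the run status object. The field names follow
# the discussion above; the real object has more to it.
StubRunStatus = Struct.new(:elapsed_time, :all_resources, :updated_resources)

# Builds a hash of metrics. The metric names are made up for this sketch.
def build_metrics(run_status)
  # Elapsed time goes first so it survives a compile-phase failure.
  metrics = { 'elapsed_time' => run_status.elapsed_time }

  resources = run_status.all_resources
  # Bail out early when the run never produced a resource list.
  return metrics if resources.nil?

  metrics['resources.total'] = resources.length
  # Array(nil) is [], so a missing updated list counts as zero.
  metrics['resources.updated'] = Array(run_status.updated_resources).length
  metrics
end

compile_failure = build_metrics(StubRunStatus.new(3.2, nil, nil))
converged = build_metrics(StubRunStatus.new(41.0, %w[a b c], %w[a]))

p compile_failure
p converged
```

With this ordering a compile failure still yields one metric instead of raising, which matches the "elapsed time and the failure event" outcome described above.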

miketheman commented 10 years ago

@dwradcliffe I've been able to repro in tests! I'm so excited, I can barely contain myself. :smiling_imp: I hope to have a fix for this shortly.

dwradcliffe commented 10 years ago

awesome!!

Elapsed time and failure event would be good, even if we don't have the resource list. Error message in the failure event would be nice. :)

miketheman commented 10 years ago

I think I may have to slate that as a next version feature, but I should be able to have the handler complete and provide a failure event of some sort.
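
As a possible shape for that feature, here is a small sketch of a failure event body that includes the error message even when no resource list exists. Everything in it is an assumption for illustration: `StubFailedRun`, `failure_event_text`, and the local `RecipeNotFound` class (a stand-in for `Chef::Exceptions::RecipeNotFound`) are hypothetical and not part of the handler.

```ruby
# Hypothetical stand-in for a failed run's status object.
StubFailedRun = Struct.new(:exception, :elapsed_time, :all_resources)

# Local stand-in for Chef::Exceptions::RecipeNotFound, so the sketch runs
# without Chef installed.
class RecipeNotFound < StandardError; end

# Builds plain-text event content from whatever the run status provides.
def failure_event_text(run_status)
  lines = ["Chef run failed after #{run_status.elapsed_time} seconds"]

  err = run_status.exception
  lines << "#{err.class}: #{err.message}" if err

  resources = run_status.all_resources
  lines << if resources.nil?
             'No resource list available'
           else
             "#{resources.length} resources in the run"
           end

  lines.join("\n")
end

error = RecipeNotFound.new('could not find recipe foo')
text = failure_event_text(StubFailedRun.new(error, 2.5, nil))
puts text
```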

anand2k12 commented 9 years ago

Hello,

Getting an error:

The Rackspace OpenStack alamo.iso installation is stuck at 67%.

Error:

ERROR: running exception handlers
ERROR: /usr/bin/chef-client:19:in `load'
ERROR: /usr/bin/chef-client:19:in `<main>'

The main installation is stopped at 95%, and the chef-client installation is stopped at 67%.

Error screenshots are attached to this post. Please help in resolving this.

[Screenshots: 90-rackspace, rackspace2]

miketheman commented 9 years ago

Hello @anand2k12,

It is typically more useful to open a new issue for your problem.

However, from the screen images you have sent, I can see that you are attempting to install this with Chef 10.12 - and we only support Chef 10.14 and above.
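
For anyone who wants to check this on a node before installing, a minimal sketch of the version comparison follows. It uses `Gem::Version` from RubyGems. The `MINIMUM_CHEF_VERSION` constant and the `supported_chef_version?` helper are hypothetical names, and the 10.14 floor is taken from the comment above rather than from the gem's metadata.

```ruby
require 'rubygems' # provides Gem::Version; already loaded on modern Ruby

# Minimum supported Chef version, per the comment above (an assumption
# about how the floor would be expressed, not the gem's own check).
MINIMUM_CHEF_VERSION = Gem::Version.new('10.14')

# Returns true when the given Chef version string meets the minimum.
def supported_chef_version?(version_string)
  Gem::Version.new(version_string) >= MINIMUM_CHEF_VERSION
end

%w[10.12.0 10.14.0 11.12.8].each do |version|
  status = supported_chef_version?(version) ? 'supported' : 'unsupported'
  puts "Chef #{version}: #{status}"
end
```

Comparing through `Gem::Version` rather than plain strings avoids mistakes such as treating "10.9" as newer than "10.14".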