eric / metriks

An experimental library to instrument ruby
MIT License
478 stars 51 forks source link

crash in exponentially_decaying_sample #13

Closed skade closed 12 years ago

skade commented 12 years ago

Hi,

I just got a sudden crash in exponentially_decaying_sample:

shared/bundle/ruby/1.9.1/gems/avl_tree-1.1.2/lib/red_black_tree.rb:518:in `delete': undefined method `value' for nil:NilClass (NoMethodError)
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/exponentially_decaying_sample.rb:82:in `block (2 levels) in rescale'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/exponentially_decaying_sample.rb:81:in `each'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/exponentially_decaying_sample.rb:81:in `block in rescale'
        from <internal:prelude>:10:in `synchronize'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/exponentially_decaying_sample.rb:78:in `rescale'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/exponentially_decaying_sample.rb:68:in `update'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/histogram.rb:38:in `update'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/timer.rb:34:in `update'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/timer.rb:17:in `stop'
        from shared/bundle/ruby/1.9.1/gems/metriks-0.9.7.3/lib/metriks/timer.rb:49:in `time'
        from releases/20120416124631/lib/custom_handler.rb:45:in `onmessage'
        from shared/bundle/ruby/1.9.1/bundler/gems/slanger-c07a3de1ea6e/lib/slanger/web_socket_server.rb:28:in `block (3 levels) in run'
        from shared/bundle/ruby/1.9.1/bundler/gems/em-websocket-b9d439cd3494/lib/em-websocket/connection.rb:19:in `call'
        from shared/bundle/ruby/1.9.1/bundler/gems/em-websocket-b9d439cd3494/lib/em-websocket/connection.rb:19:in `trigger_on_message'
        from shared/bundle/ruby/1.9.1/bundler/gems/em-websocket-b9d439cd3494/lib/em-websocket/message_processor_06.rb:45:in `message'
        from shared/bundle/ruby/1.9.1/bundler/gems/em-websocket-b9d439cd3494/lib/em-websocket/framing07.rb:106:in `process_data'
        from shared/bundle/ruby/1.9.1/bundler/gems/em-websocket-b9d439cd3494/lib/em-websocket/handler.rb:28:in `receive_data'
        from shared/bundle/ruby/1.9.1/bundler/gems/em-websocket-b9d439cd3494/lib/em-websocket/connection.rb:71:in `receive_data'
        from shared/bundle/ruby/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
        from shared/bundle/ruby/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
        from shared/bundle/ruby/1.9.1/bundler/gems/slanger-c07a3de1ea6e/bin/slanger:72:in `<top (required)>'
        from shared/bundle/ruby/1.9.1/bin/slanger:21:in `load'
        from shared/bundle/ruby/1.9.1/bin/slanger:21:in `<main>'

Is this a Metriks problem or a problem of the RBTree implementation?

eric commented 12 years ago

Thanks for reporting this. I've reported it in avl_tree and will try to get a fix out.

eric commented 12 years ago

@nahi: Do you have any ideas on this?

I just tried to create a failing test case for avl_tree and was unable to.

eric commented 12 years ago

I've made some progress on tracking down the problem and have posted the progress over here: https://github.com/nahi/avl_tree/issues/4

JustinLove commented 12 years ago

I have the updated avl_tree, but it's just gone to raising an explicit exception instead of an accidental one. I'm timing a call to HTTParty, which might be one of several in parallel; the error has happend on several different branches, but it's intermittent and I haven't attempted an investigation yet.

2012-05-19T12:54:00+00:00 app[web.1]: cannot compare NaN and 0.0 with <=>
2012-05-19T12:54:00+00:00 app[web.1]: Exception intercepted calling #<SimpleApi::PachaSourceCache:0x000000053352c0>
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/avl_tree-1.1.3/lib/red_black_tree.rb:498:in `[]='
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/avl_tree-1.1.3/lib/red_black_tree.rb:81:in `insert'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/exponentially_decaying_sample.rb:83:in `block (2 levels) in rescale'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/exponentially_decaying_sample.rb:81:in `each'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/exponentially_decaying_sample.rb:81:in `block in rescale'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/exponentially_decaying_sample.rb:78:in `rescale'
2012-05-19T12:54:00+00:00 app[web.1]: <internal:prelude>:10:in `synchronize'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/exponentially_decaying_sample.rb:68:in `update'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/histogram.rb:38:in `update'
2012-05-19T12:54:00+00:00 app[web.1]: /app/simple_api/lib/simple_api/json_api.rb:12:in `get'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/timer.rb:38:in `update'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/timer.rb:21:in `stop'
2012-05-19T12:54:00+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/metriks-0.9.8.3/lib/metriks/timer.rb:53:in `time'
eric commented 12 years ago

This is great. I'll be able to guard against this happening. Thanks for your report and trying it with the newer avl_tree!

eric commented 12 years ago

I've released 0.9.8.4 that tries to protect against these issues.

After you've run it for a bit, could you check your logs for instances of ExponentiallyDecayingSample? I've added additional warn's to try to get to the bottom of this.

JustinLove commented 12 years ago

Well, I had a run last week and nothing since. I hope to be doing more in the field use this weekend, but it doesn't seem to be something extremely common.

eric commented 12 years ago

Thanks for getting back to me. I'm really sorry you've run into this. It's been an elusive issue...

JustinLove commented 12 years ago

Saw it again in local dev. Assuming related to new code (timing a small loop doing distance calculations), but it's not reproducing so can't say for certain. Might be that the dev server was running all night this was the first hit this morning.

12:00:43 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644720.418286 start_time: 1340730043.565619
12:00:43 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644720.524134 start_time: 1340730043.880652
12:00:44 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.0444522 start_time: 1340730044.402072
12:00:44 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.047908 start_time: 1340730044.410858
12:00:44 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.048297 start_time: 1340730044.411241
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644720.854233 start_time: 1340730045.3883898
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.0631511 start_time: 1340730045.468469
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.063624 start_time: 1340730045.468841
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.065547 start_time: 1340730045.469302
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.066613 start_time: 1340730045.470614
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.066916 start_time: 1340730045.471047
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.14431 start_time: 1340730045.471334
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.1445959 start_time: 1340730045.471628
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.146909 start_time: 1340730045.4743838
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.1693969 start_time: 1340730045.656949
12:00:45 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644721.170006 start_time: 1340730045.657685
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.048685 start_time: 1340730046.183736
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.0719192 start_time: 1340730046.390934
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.0726368 start_time: 1340730046.391742
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.0729449 start_time: 1340730046.392065
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.0735638 start_time: 1340730046.392517
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.208087 start_time: 1340730046.393037
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.209429 start_time: 1340730046.394022
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.210151 start_time: 1340730046.394387
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.215893 start_time: 1340730046.394778
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.2162821 start_time: 1340730046.395139
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.216636 start_time: 1340730046.395626
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.2177548 start_time: 1340730046.396178
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.340192 start_time: 1340730046.396564
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.341127 start_time: 1340730046.396956
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.3420238 start_time: 1340730046.397805
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.3425212 start_time: 1340730046.398228
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.343015 start_time: 1340730046.398765
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.343951 start_time: 1340730046.399356
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.3445919 start_time: 1340730046.3997598
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.4289331 start_time: 1340730046.400347
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.4298801 start_time: 1340730046.4009001
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.430312 start_time: 1340730046.4013991
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.43275 start_time: 1340730046.40744
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.433106 start_time: 1340730046.416333
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.4333591 start_time: 1340730046.416807
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.433602 start_time: 1340730046.417182
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.527123 start_time: 1340730046.417443
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.5286798 start_time: 1340730046.4331179
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.543639 start_time: 1340730046.43359
12:00:46 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644722.5478332 start_time: 1340730046.435027
12:00:47 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644723.326541 start_time: 1340730047.025939
12:00:47 web.1     | ExponentiallyDecayingSample found a new_key of NaN. key: Infinity old_start_time: 1340644720.048455 start_time: 1340730047.52882
eric commented 12 years ago

Awesome. This will be really helpful in tracking down what's going on. Thanks!

nahi commented 12 years ago

Wow, ordering issues for NaN and Infinity? On Jun 27, 2012 2:13 AM, "Eric Lindvall" < reply@reply.github.com> wrote:

Awesome. This will be really helpful in tracking down what's going on. Thanks!


Reply to this email directly or view it on GitHub: https://github.com/eric/metriks/issues/13#issuecomment-6581800

JustinLove commented 12 years ago

Saw it again on the first hit in the morning, for a rails dev server left running overnight.

eric commented 12 years ago

Just released 0.9.9.1 to fix this, thanks to @lmarburger for the work.