eric / metriks

An experimental library to instrument ruby
MIT License
478 stars 51 forks source link

ArgumentError: comparison of Float with Float failed #6

Closed mperham closed 12 years ago

mperham commented 12 years ago

I have no idea what this error means. :-( We've monkeypatched the redis gem to time the redis operations we do, in this case a del operation timing update failed.

[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/exponentially_decaying_sample.rb:77:in `[]='
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/exponentially_decaying_sample.rb:77:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/exponentially_decaying_sample.rb:75:in `each'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/exponentially_decaying_sample.rb:75:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/exponentially_decaying_sample.rb:72:in `synchronize'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/exponentially_decaying_sample.rb:72:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/exponentially_decaying_sample.rb:62:in `update'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/histogram.rb:38:in `update'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/timer.rb:34:in `update'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/timer.rb:17:in `stop'
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/timer.rb:49:in `time'
config/initializers/redis.rb:15:in `call'
[GEM_ROOT]/gems/redis-2.2.2/lib/redis.rb:681:in `del'
/usr/local/rvm/rubies/ree-1.8.7-2012.02/lib/ruby/1.8/monitor.rb:242:in `synchronize'
[GEM_ROOT]/gems/redis-2.2.2/lib/redis.rb:680:in `del'
app/models/account.rb:14:in `expire_redis_cache'
eric commented 12 years ago

What version of ruby are you using?

mperham commented 12 years ago

Latest REE 1.8.7.

eric commented 12 years ago

Are you calling Timer#time() or Timer#update() to record the timing?

mperham commented 12 years ago

I don't have the code in front of me but the backtrace says:

[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/timer.rb:49:in `time'
eric commented 12 years ago

Oh, good point. Sorry I missed that.

I'm investigating.

mperham commented 12 years ago

Looks like it might be a NaN issue in your calcs:

http://www.stephensykes.com/blog_perm.html?152

[0.0, 1/0.0].sort
=> [0.0, Infinity]
[0.0, 0/0.0].sort
ArgumentError: comparison of Float with Float failed
Thanks to Evan Weaver for pointing this out to me. This is due to 0/0.0 returning the value NaN (as opposed to 1/0.0 which returns Infinity). And the comparison with NaN by sort gives the strange error message.
eric commented 12 years ago

Very interesting...

eric commented 12 years ago

Does this happen all the time or did it just happen once?

mperham commented 12 years ago

It's happened just once.

eric commented 12 years ago

I had a theory that rand on exponentially_decaying_sample.rb:41 was returning 0.0 and causing a problem:

        priority = weight(timestamp - @start_time) / rand

but I've tried mocking rand to return 0.0 and it didn't raise when the rescale happened.

Still looking...

eric commented 12 years ago

It isn't quite as elegant (I should fix that), but if you would like to use a Timer that isn't using the ExponentiallyDecayingSample until we can get to the bottom of it, you can create a timer with:

Metriks::Registry.default.add('redis.timer.name', Metriks::Timer.new(Metriks::Histogram.new_uniform))

and can access it with:

Metriks::Registry.default.get('redis.timer.name')

or

Metriks.get('redis.timer.name')
mperham commented 12 years ago

It just happened a second time. Is it possible that it's a thread-safety issue?

eric commented 12 years ago

I definitely don't see where the unsynchronized access could be happening...

I'm still brainstorming how this could be happening.

eric commented 12 years ago

I've just released 0.9.5 with a guard against a priority of NaN in 28996f6. Let me know if you have a chance to try it and if you get any of the warns.

Successfully registered gem: metriks (0.9.5)
mperham commented 12 years ago

Just happened again:

[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:83:in `[]='
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:83:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:81:in `each'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:81:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:78:in `synchronize'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:78:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:68:in `update'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/histogram.rb:38:in `update'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/timer.rb:34:in `update'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/timer.rb:17:in `stop'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/timer.rb:49:in `time'
config/initializers/redis.rb:15:in `call'
[GEM_ROOT]/gems/redis-2.2.2/lib/redis.rb:681:in `del'
evan commented 12 years ago

You can also get this message if spaceship returns nil:

irb(main):029:0> class Float
irb(main):030:1> def <=>(a)
irb(main):031:2> end
irb(main):032:1> end
=> nil
irb(main):033:0>  [1.0, 2.0].sort
ArgumentError: comparison of Float with Float failed
  from (irb):33:in `sort'
  from (irb):33
  from /opt/local/bin/irb:12:in `<main>'

Because of:

VALUE
rb_num_coerce_relop(VALUE x, VALUE y, ID func)
{
    VALUE c, x0 = x, y0 = y;

    if (!do_coerce(&x, &y, FALSE) ||
  NIL_P(c = rb_funcall(x, func, 1, y))) {
  rb_cmperr(x0, y0);
  return Qnil;      /* not reached */
    }
    return c;
}
eric commented 12 years ago

I'm going to investigate (benchmark) using avl_tree instead of rbtree to see if it clears up these issues.

I'll let you know how it progresses.

eric commented 12 years ago

I've just released 0.9.6 that uses avl_tree instead of rbtree.

Let me know if it works any better.

mperham commented 12 years ago

Now we see this error:

NoMethodError: undefined method `values' for #<RedBlackTree::Node::EmptyNode:0x4e35e78>
[GEM_ROOT]/gems/avl_tree-1.1.1/lib/red_black_tree.rb:491:in `values'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/exponentially_decaying_sample.rb:35:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/exponentially_decaying_sample.rb:34:in `synchronize'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/exponentially_decaying_sample.rb:34:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/histogram.rb:46:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/timer.rb:54:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/reporter/graphite.rb:97:in `write_metric'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/reporter/graphite.rb:71:in `write'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/registry.rb:58:in `each'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/registry.rb:58:in `each'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/reporter/graphite.rb:49:in `write'
eric commented 12 years ago

I'll see what the fix is in the avl_tree gem.

mperham commented 12 years ago

Any progress? We are seeing this every minute in Airbrake.

eric commented 12 years ago

I've created a pull request for the underlying bug here: https://github.com/nahi/avl_tree/pull/3

I just pushed 0.9.7.1 with a workaround until the avl_tree gem has been released with the fix.

mperham commented 12 years ago

We pushed 0971 to production and are still seeing the exact same error.

eric commented 12 years ago

Can you paste the stack trace?

mperham commented 12 years ago

Oops, my mistake. Passenger didn't shut down properly on one of our app servers.

eric commented 12 years ago

I just released v0.9.7.2 that depends on the fixed version of avl_tree. There should be no difference in functionality from v0.9.7.1.

Let me know if you run into any more issues.

eric commented 12 years ago

Any news on the latest version? Have the problems cleared up?

mperham commented 12 years ago

Yes, 0.9.7.1 is working great for us.

eric commented 12 years ago

Glad to hear it.