Closed mperham closed 12 years ago
What version of ruby are you using?
Latest REE 1.8.7.
Are you calling Timer#time()
or Timer#update()
to record the timing?
I don't have the code in front of me but the backtrace says:
[GEM_ROOT]/gems/metriks-0.9.3/lib/metriks/timer.rb:49:in `time'
Oh, good point. Sorry I missed that.
I'm investigating.
Looks like it might be a NaN issue in your calcs:
http://www.stephensykes.com/blog_perm.html?152
[0.0, 1/0.0].sort
=> [0.0, Infinity]
[0.0, 0/0.0].sort
ArgumentError: comparison of Float with Float failed
Thanks to Evan Weaver for pointing this out to me. This is due to 0/0.0 returning the value NaN (as opposed to 1/0.0 which returns Infinity). And the comparison with NaN by sort gives the strange error message.
Very interesting...
Does this happen all the time or did it just happen once?
It's happened just once.
I had a theory that rand
on exponentially_decaying_sample.rb:41
was returning 0.0
and causing a problem:
priority = weight(timestamp - @start_time) / rand
but I've tried mocking rand
to return 0.0
and it didn't raise when the rescale
happened.
Still looking...
It isn't quite as elegant (I should fix that), but if you would like to use a Timer that isn't using the ExponentiallyDecayingSample
until we can get to the bottom of it, you can create a timer with:
Metriks::Registry.default.add('redis.timer.name', Metriks::Timer.new(Metriks::Histogram.new_uniform))
and can access it with:
Metriks::Registry.default.get('redis.timer.name')
or
Metriks.get('redis.timer.name')
It just happened a second time. Is it possible that it's a thread-safety issue?
I definitely don't see where the unsynchronized access could be happening...
I'm still brainstorming how this could be happening.
I've just released 0.9.5 with a guard against a priority of NaN in 28996f6. Let me know if you have a chance to try it and if you get any of the warns.
Successfully registered gem: metriks (0.9.5)
Just happened again:
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:83:in `[]='
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:83:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:81:in `each'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:81:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:78:in `synchronize'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:78:in `rescale'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/exponentially_decaying_sample.rb:68:in `update'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/histogram.rb:38:in `update'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/timer.rb:34:in `update'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/timer.rb:17:in `stop'
[GEM_ROOT]/gems/metriks-0.9.5/lib/metriks/timer.rb:49:in `time'
config/initializers/redis.rb:15:in `call'
[GEM_ROOT]/gems/redis-2.2.2/lib/redis.rb:681:in `del'
You can also get this message if spaceship returns nil:
irb(main):029:0> class Float
irb(main):030:1> def <=>(a)
irb(main):031:2> end
irb(main):032:1> end
=> nil
irb(main):033:0> [1.0, 2.0].sort
ArgumentError: comparison of Float with Float failed
from (irb):33:in `sort'
from (irb):33
from /opt/local/bin/irb:12:in `<main>'
Because of:
VALUE
rb_num_coerce_relop(VALUE x, VALUE y, ID func)
{
VALUE c, x0 = x, y0 = y;
if (!do_coerce(&x, &y, FALSE) ||
NIL_P(c = rb_funcall(x, func, 1, y))) {
rb_cmperr(x0, y0);
return Qnil; /* not reached */
}
return c;
}
I'm going to investigate (benchmark) using avl_tree instead of rbtree to see if it clears up these issues.
I'll let you know how it progresses.
I've just released 0.9.6 that uses avl_tree
instead of rbtree
.
Let me know if it works any better.
Now we see this error:
NoMethodError: undefined method `values' for #<RedBlackTree::Node::EmptyNode:0x4e35e78>
[GEM_ROOT]/gems/avl_tree-1.1.1/lib/red_black_tree.rb:491:in `values'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/exponentially_decaying_sample.rb:35:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/exponentially_decaying_sample.rb:34:in `synchronize'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/exponentially_decaying_sample.rb:34:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/histogram.rb:46:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/timer.rb:54:in `snapshot'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/reporter/graphite.rb:97:in `write_metric'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/reporter/graphite.rb:71:in `write'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/registry.rb:58:in `each'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/registry.rb:58:in `each'
[GEM_ROOT]/gems/metriks-0.9.7/lib/metriks/reporter/graphite.rb:49:in `write'
I'll see what the fix is in the avl_tree gem.
Any progress? We are seeing this every minute in Airbrake.
I've created a pull request for the underlying bug here: https://github.com/nahi/avl_tree/pull/3
I just pushed 0.9.7.1 with a workaround until the avl_tree gem has been released with the fix.
We pushed 0971 to production and are still seeing the exact same error.
Can you paste the stack trace?
Oops, my mistake. Passenger didn't shut down properly on one of our app servers.
I just released v0.9.7.2 that depends on the fixed version of avl_tree. There should be no difference in functionality from v0.9.7.1.
Let me know if you run into any more issues.
Any news on the latest version? Have the problems cleared up?
Yes, 0.9.7.1 is working great for us.
Glad to hear it.
I have no idea what this error means. :-( We've monkeypatched the redis gem to time the redis operations we do, in this case a del operation timing update failed.