gemhome / fnordmetric

(possible new home for) FnordMetric is a redis/ruby-based realtime Event-Tracking app
0 stars 1 forks source link

Selecting "last 10" or "last 30" minutes in Timeseries gauge gives incorrect window of data #63

Open bf4 opened 10 years ago

bf4 commented 10 years ago

Issue by JangoSteve Sunday Aug 04, 2013 at 08:49 GMT Originally opened as https://github.com/paulasmuth/fnordmetric/issues/155


I think I figured out why going from one range to the next sometimes excludes data. I updated my FnordMetric::GaugeCalculations#fraction_values_in method with some debugging code as this:

  def fraction_values_in(range, _append=nil)
    Hash.new{ |h,k| h[k] = [0,0] }.tap do |vals|
      allticks = ticks_in(range, retention)
      puts "Ticks range: #{allticks.min} - #{allticks.max}"
      ticks_in(range, retention).each do |_tick|
        puts "retention key: #{retention_key(_tick, _append)}"
        sync_redis.hgetall(retention_key(_tick, _append)).each do |k, v|
          puts "k: #{k}, v: #{v}"
          kx = k.split("-")
          vals[kx.first.to_i][kx.last == "denominator" ? 1 : 0] += v.to_f
        end
      end
    end
  end

Now, when I click "last 10 minutes", I see this:

append: all_events
Ticks range: 1375602600 - 1375603200
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7

Then when I click "last 15 minutes", I see:

append: all_events
Ticks range: 1375602600 - 1375603800
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603800-all_events
k: 1375604160-numerator, v: 5

I may be wrong, but shouldn't going from last 10 minutes to last 15 minutes make the time range go back 5 more minutes, not forward?

Then I go up to "last 30 minutes", and I get:

append: all_events
Ticks range: 1375601400 - 1375603200
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375601400-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602000-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7

And then "last 45 minutes" and the same thing happens again:

append: all_events
Ticks range: 1375600800 - 1375603800
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375600800-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375601400-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602000-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603800-all_events
k: 1375604160-numerator, v: 5

In other words, it's 4:29, and...

selectinggives range
last 10 minutes4:00-4:10
last 15 minutes4:00-4:20
last 30 minutes3:40-4:10
last 45 minutes3:30-4:20

So, the problem is that "last 10 minutes" actually selects 10 minutes starting from 15 minutes ago, and "last 30 minutes" actually selects 30 minutes starting from 45 minutes ago, rather than selecting the last 10 or 30 minutes, respectively.

bf4 commented 10 years ago

Comment by JangoSteve Tuesday Aug 06, 2013 at 03:28 GMT


I finally figured out why the ranges are being selected as they are. It's from the GaugeCalculations#ticks_in method. It's doing something strange:

(((r.last-r.first)/_tick.to_f).ceil+1+overflow).times.map{ |n| tick_at(r.first + _tick*(n-1), _tick) }

The first part calculates the number of tick intervals contained in the current range, and the .ceil rounds up. Then it loops through that number of intervals from to call tick_at each interval within the range. But I think it's calling tick_at for the wrong values.

It should be calling tick_at for the range from r.first up to r.first + [number of tick intervals rounded up], but instead it's calling it from r.first - [1 tick interval] up to r.first + [one less than number of tick intervals rounded up].

This explains why it was working for the "last 15" and "last 45" minute intervals in my example above. Since tick was set to 60, then retention=600 (which is 10 minutes), meaning the selections in which retention divided unevenly would get rounded up by .ceil, causing ticks_in to loop one extra time, making up for the fact that it was starting the range loop one interval too early.

At first I thought it was intentionally starting at one interval less than the start of the range, to be sure we aren't leaving out values in the range. But this doesn't make since, because tick_at finds the closest tick at or below the value given, so there'd never be a need to go one interval lower, since tick_at will always find the floor to start with.

On the other hand, since we're starting from 0, we would need to add one to n in order to go up to the number of tick intervals in the loop (which explains why it's doing .ceil+1 before looping.

So, if we just change _tick*(n-1) to _tick*n, it solves the issue described in this ticket.

I'll see if I can create a test for this and submit a patch. Basically though, I had the intervals as such (with a tick of 60 seconds => retention of 600 seconds):

30 minutes:

actual range: 1375752990 to 1375754790 expected tick range: 1375752600 to 1375754400 calculated tick range: 1375752000 to 1375753800

45 minutes:

actual range: 1375752034 to 1375754734 expected tick range: 1375752000 to 1375754400 calculated tick range: 1375751400 to 1375754400

Making the above change fixes both of these cases, returning the expected range.