gemhome / fnordmetric

(possible new home for) FnordMetric is a redis/ruby-based realtime Event-Tracking app
0 stars 1 forks source link

[QUESTION] last 5 and last 20 empty #34

Open bf4 opened 10 years ago

bf4 commented 10 years ago

Issue by osallou Friday Nov 02, 2012 at 12:55 GMT Originally opened as https://github.com/paulasmuth/fnordmetric/issues/106


Hi, I am testing the tool and sent some data in my app. My timeseries graph show correct number of jobs, but I have:

Total (1 day) 3.5k+249.0 Last 5 min 0.000 Last 20 min 0.000 Last hour 3.5k+249.0

Why do I get Last 5min and 20min empty?

My timeseries is defined as:

Number of jobs, resolution 5 minutes

timeseries_gauge :number_of_jobs, :group => "jobs", :title => "Number of jobs", :key_nouns => ["job"], :series => [:jobs], :resolution => 5.minutes

Thanks

bf4 commented 10 years ago

Comment by JangoSteve Sunday Aug 04, 2013 at 00:39 GMT


I'm seeing the same behavior (see my post on Google Groups). For me, I'd sometimes see data for last 5, last 15, and last 45, but nothing in last 10, last 30, or last 1 hours. I have a feeling there is a bug in the timeseries code that loops through the resolution to group the events.

bf4 commented 10 years ago

Comment by JangoSteve Sunday Aug 04, 2013 at 02:04 GMT


FYI, I've been doing some more testing, and I'm starting to think it has to do with the flush_interval, though I'm still somewhat baffled as to what I'm seeing.

I was logging some data, and here's what I see for each of the different selected timeframes.

21:36

last 10 min: screen shot 2013-08-03 at 9 36 15 pm

last 15: screen shot 2013-08-03 at 9 36 32 pm

last 30: screen shot 2013-08-03 at 9 36 48 pm

last 45: screen shot 2013-08-03 at 9 36 58 pm

last hour: screen shot 2013-08-03 at 9 37 15 pm

Then, I wait a little longer and log some more data, and I get...

21:49

last 10 min: screen shot 2013-08-03 at 9 49 35 pm

last 15: screen shot 2013-08-03 at 9 49 45 pm

last 30: screen shot 2013-08-03 at 9 49 52 pm

last 45: screen shot 2013-08-03 at 9 50 02 pm

And then last hour, 1.5hr, etc. all show what you'd expect based on the "last 45" graph.

But then without logging anything else, I wait a couple minutes, and suddenly all the graphs are showing correctly...

21:50 (same data as 21:49)

last 10 min: screen shot 2013-08-03 at 9 50 57 pm

last 15: screen shot 2013-08-03 at 9 51 07 pm

last 30: screen shot 2013-08-03 at 9 51 18 pm

last 45: screen shot 2013-08-03 at 9 51 32 pm

And the rest all look right too. Because of the fact that the data seems to have corrected after waiting a couple minutes, I'm wondering if this is related to the flush interval (which is the default for this gauge), or the tick (which is 60 for this gauge).

bf4 commented 10 years ago

Comment by JangoSteve Sunday Aug 04, 2013 at 02:14 GMT


Of course, now I just checked the gauge after 10 more minutes, and it's completely blank :frowning:

22:11 (same data as before)

last 1 hour: screen shot 2013-08-03 at 10 11 19 pm

bf4 commented 10 years ago

Comment by JangoSteve Sunday Aug 04, 2013 at 07:59 GMT


I think I've figured out why the timeseries data goes to zero after a while.

In tracing it through, it looks like the timeseries data is being populated from the FnordMetric::GaugeCalculations#fraction_values_in method. If you print out the retention_key(_tick, _append) from that line, you can see the keyspace that it's looking up in Redis.

Those values in redis are incremented in the FnordMetric::GaugeModifiers#incr_fraction method. At the end of that method, it seems to be setting it to expire in Redis at retention, which is defined in FnordMetric::Gauge to be simply tick*10.

Sure enough, if I set the tick option in my timeseries gauge to be 1 second, then I can refresh the graph for 10 seconds and then the timeseries graph goes blank. Before, I had tick set to 60 seconds. So, that explains why my data disappeared after 10 minutes.

I have to say, at this point, I feel pretty stumped. I would have thought that the data should be set to expire according to FnordMetric::DEFAULT_OPTIONS[:event_data_ttl], which is 30 days be default.

I don't get what retention is supposed to be either, with it being tick*10.

bf4 commented 10 years ago

Comment by JangoSteve Sunday Aug 04, 2013 at 08:40 GMT


I think I've also figured out why going from one range to the next sometimes excludes data. I updated my FnordMetric::GaugeCalculations#fraction_values_in method with some debugging code as this:

  def fraction_values_in(range, _append=nil)
    Hash.new{ |h,k| h[k] = [0,0] }.tap do |vals|
      allticks = ticks_in(range, retention)
      puts "Ticks range: #{allticks.min} - #{allticks.max}"
      ticks_in(range, retention).each do |_tick|
        puts "retention key: #{retention_key(_tick, _append)}"
        sync_redis.hgetall(retention_key(_tick, _append)).each do |k, v|
          puts "k: #{k}, v: #{v}"
          kx = k.split("-")
          vals[kx.first.to_i][kx.last == "denominator" ? 1 : 0] += v.to_f
        end
      end
    end
  end

Now, when I click "last 10 minutes", I see this:

append: all_events
Ticks range: 1375602600 - 1375603200
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7

Then when I click "last 15 minutes", I see:

append: all_events
Ticks range: 1375602600 - 1375603800
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603800-all_events
k: 1375604160-numerator, v: 5

I may be wrong, but shouldn't going from last 10 minutes to last 15 minutes make the time range go back 5 more minutes, not forward?

Then I go up to "last 30 minutes", and I get:

append: all_events
Ticks range: 1375601400 - 1375603200
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375601400-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602000-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7

And then "last 45 minutes" and the same thing happens again:

append: all_events
Ticks range: 1375600800 - 1375603800
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375600800-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375601400-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602000-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375602600-all_events
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603200-all_events
k: 1375603680-numerator, v: 1
k: 1375603740-numerator, v: 7
retention key: fnordmetric-mysite-gauge-events_per_minute-60-1375603800-all_events
k: 1375604160-numerator, v: 5

In other words, it's 4:29, and...

selectinggives range
last 10 minutes4:00-4:10
last 15 minutes4:00-4:20
last 30 minutes3:40-4:10
last 45 minutes3:30-4:20

So, the problem is that "last 10 minutes" actually selects 10 minutes starting from 15 minutes ago, and "last 30 minutes" actually selects 30 minutes starting from 45 minutes ago, rather than selecting the last 10 or 30 minutes, respectively.

bf4 commented 10 years ago

Comment by JangoSteve Sunday Aug 04, 2013 at 08:49 GMT


Now that I've gone through all of this, I'm not actually sure this is the same issue you're having. Sorry for hi-jacking your issue if that's the case. I've opened two new issues in #155 and #156.