DataDog / dogstatsd-ruby

A Ruby client for DogStatsd
https://www.datadoghq.com/
MIT License

Modify #time to support a "count" option. #48

Open pnomolos opened 7 years ago

pnomolos commented 7 years ago

I have to time some bulk operations, and pushing to statsd on every iteration of the loop isn't viable. What would be nice is something like the following:

$statsd.time('accounts.activate', count: accounts.length) { process_accounts(accounts) }

Internally this would change time to something similar to the following:

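def time(stat, opts={})
  # Clamp the divisor to at least 1 so a missing or zero count never divides by zero.
  count = [opts.fetch(:count, 1), 1].max
  start = Time.now
  return yield
ensure
  # Report elapsed milliseconds per item rather than the total for the whole block.
  elapsed_ms = (Time.now - start) * 1000
  timing(stat, (elapsed_ms / count).round, opts)
end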

I can open a PR if this sounds like a good idea :)

degemer commented 7 years ago

Hello @pnomolos, thanks for the suggestion!

The main (and only) downside of this approach is that we lose granularity: dogstatsd aggregates time calls into a histogram (avg, median, percentiles), so we would be sending it pre-aggregated data.
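
To make the granularity concern concrete, here is a small sketch in plain Ruby. It does not involve dogstatsd at all; the per-item durations are made up for illustration, and `median` and `summarize` are hypothetical helpers. It compares the statistics recoverable from one sample per item with those recoverable from a single averaged sample.

```ruby
# Hypothetical per-item durations in milliseconds for one bulk job.
per_item_ms = [10, 10, 10, 170]

# Median of a list of numbers (mean of the two middle values for even sizes).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# The kind of summary a histogram-style aggregation could derive from samples.
def summarize(samples)
  { avg: samples.sum / samples.size.to_f, median: median(samples), max: samples.max }
end

# Statistics available when every item is reported as its own sample.
full = summarize(per_item_ms)

# Statistics available when only the per-item average is sent as one sample.
pre_aggregated = summarize([per_item_ms.sum / per_item_ms.size.to_f])

puts "per-item samples:    #{full}"
puts "one averaged sample: #{pre_aggregated}"
```

The average is the same in both cases, but the slow outlier (170 ms) and the true median (10 ms) are only visible when each item is its own sample.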

Would the batch function work for you? It should greatly reduce the number of pushes to statsd while keeping the full data.

What do you think?

pnomolos commented 7 years ago

@degemer Unfortunately that isn't viable in my use case. In the example above, process_accounts(accounts) is a call out to a third-party library, but the processing time should scale linearly with the number of accounts.

In my case I'm running about 600 of these jobs per day (with varying numbers of accounts per job) and I'm trying to get a baseline for the time it takes to do the jobs, normalized by the number of accounts that are being processed.

If I were able to add a call per account I definitely would, but that's not the case here :(
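
Until something like this exists in the client, the normalization can be done on the caller's side. The following is only a sketch under stated assumptions: `time_per_item` is a hypothetical helper, not part of dogstatsd-ruby, and it assumes only that the client responds to `timing(stat, ms)`. `RecordingClient` is a stand-in so the example runs without the gem, and the `sleep` stands in for `process_accounts(accounts)`.

```ruby
# Hypothetical helper: times a block and reports elapsed time divided by `count`.
# `client` is anything that responds to `timing(stat, ms)`.
def time_per_item(client, stat, count)
  # A monotonic clock is not affected by wall-clock adjustments.
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # Clamp the divisor to at least 1 so an empty batch never divides by zero.
  divisor = [count.to_i, 1].max
  yield
ensure
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
  client.timing(stat, (elapsed_ms / divisor).round)
end

# Stand-in client that records calls instead of sending UDP packets.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def timing(stat, ms)
    @calls << [stat, ms]
  end
end

client = RecordingClient.new
accounts = %w[a b c d]

result = time_per_item(client, 'accounts.activate.per_account', accounts.length) do
  sleep 0.02 # stands in for process_accounts(accounts)
  :done
end

puts "block returned #{result.inspect}, recorded #{client.calls.inspect}"
```

Because the value is reported in the `ensure` clause, a sample is still sent when the block raises, and the block's return value is passed through unchanged. Each job contributes one sample, so the resulting histogram describes the distribution of per-account time across jobs, not across individual accounts.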