BIG-3419 - Send NSQ cluster-wide metrics to statsd

dlecocq commented 8 years ago

Given an nsqlookupd instance, discover all producers and, for each producer, grab statistics about its topics and channels and aggregate them.
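As a rough illustration of the discovery step (not the code in this PR), the sketch below turns an nsqlookupd `/nodes` response into one nsqd stats URL per producer. It assumes each producer entry carries `broadcast_address` and `http_port`, and that older nsqlookupd versions wrap the payload in a `data` envelope; `producer_stats_urls` and the sample hosts are hypothetical names used only for this example.

```python
from urllib.parse import urlunsplit


def producer_stats_urls(nodes_response):
    """Build one nsqd stats URL per producer listed in a /nodes response."""
    # Assumption: older nsqlookupd versions wrap the payload in a "data" key.
    payload = nodes_response.get("data", nodes_response)
    urls = set()
    for producer in payload.get("producers", []):
        netloc = "{}:{}".format(producer["broadcast_address"], producer["http_port"])
        urls.add(urlunsplit(("http", netloc, "/stats", "format=json", "")))
    # Deduplicate and sort so the result is stable between runs.
    return sorted(urls)


# Hypothetical sample of what GET <nsqlookupd>/nodes might return.
sample = {
    "producers": [
        {"broadcast_address": "nsqd-one", "http_port": 4151},
        {"broadcast_address": "nsqd-two", "http_port": 4151},
    ]
}
print(producer_stats_urls(sample))
```

Each URL would then be fetched and its topic and channel depths fed into the aggregation.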

It reports metrics at several granularities. Specifically, metrics are reported for all unique and extant {host, topic} and {host, topic, channel} tuples, as well as aggregate metrics (total and max) for:

all channels across a topic on a host
all channels across a topic and all hosts
all channels across all topics and all hosts
a topic across all hosts
all topics across all hosts

For example, consider that we find the following metrics on the my-topic topic and channels channel-one and channel-two across two hosts:

host.nsqd-one.topic.my-topic.channel.channel-one.depth = 8
host.nsqd-one.topic.my-topic.channel.channel-two.depth = 7
host.nsqd-two.topic.my-topic.channel.channel-one.depth = 5

We would then also report:

# The max and total depth of a channel on this topic on this host
host.nsqd-one.topic.my-topic.channels.depth.max = 8
host.nsqd-one.topic.my-topic.channels.depth.total = 15
host.nsqd-two.topic.my-topic.channels.depth.max = 5
host.nsqd-two.topic.my-topic.channels.depth.total = 5

# The max and total depth of channels on this topic across all hosts
topic.my-topic.channels.depth.max = 8
topic.my-topic.channels.depth.total = 20

# The max and total depth of channels on all topics and all hosts
channels.depth.max = 8
channels.depth.total = 20
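The channel roll-up above could be sketched roughly as follows. This is a minimal illustration rather than the PR's actual implementation; `aggregate_channel_depths` and the shape of its input dictionary are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_channel_depths(depths):
    """Roll {(host, topic, channel): depth} up into max/total metrics."""
    groups = defaultdict(list)
    for (host, topic, _channel), depth in depths.items():
        # Every measurement contributes to three aggregation levels.
        groups["host.{}.topic.{}.channels.depth".format(host, topic)].append(depth)
        groups["topic.{}.channels.depth".format(topic)].append(depth)
        groups["channels.depth"].append(depth)
    metrics = {}
    for prefix, values in groups.items():
        metrics[prefix + ".max"] = max(values)
        metrics[prefix + ".total"] = sum(values)
    return metrics


# The three measurements from the example above.
depths = {
    ("nsqd-one", "my-topic", "channel-one"): 8,
    ("nsqd-one", "my-topic", "channel-two"): 7,
    ("nsqd-two", "my-topic", "channel-one"): 5,
}
for name, value in sorted(aggregate_channel_depths(depths).items()):
    print("{} = {}".format(name, value))
```

Grouping by metric prefix keeps the three levels in one pass, so adding another level only means appending to one more key.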

Similar aggregation happens at the topic level. Given these measurements:

host.nsqd-one.topic.topic-one.depth = 10
host.nsqd-two.topic.topic-one.depth = 3
host.nsqd-two.topic.topic-two.depth = 8

We would provide aggregates:

# The max and total depth of this topic across all hosts
topic.topic-one.depth.max = 10
topic.topic-one.depth.total = 13
topic.topic-two.depth.max = 8
topic.topic-two.depth.total = 8

# The max and total depth of all topics across all hosts
topics.depth.max = 10
topics.depth.total = 21
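For the topic level, a similar sketch can also show how the aggregates might be rendered as statsd gauge lines (`name:value|g`). The function names and the `nsq` prefix are hypothetical, chosen only for illustration; the PR itself may name and send metrics differently.

```python
def topic_aggregates(depths):
    """Roll {(host, topic): depth} up into per-topic and cluster-wide metrics."""
    per_topic = {}
    for (_host, topic), depth in depths.items():
        per_topic.setdefault(topic, []).append(depth)
    metrics = {}
    for topic, values in per_topic.items():
        metrics["topic.{}.depth.max".format(topic)] = max(values)
        metrics["topic.{}.depth.total".format(topic)] = sum(values)
    everything = list(depths.values())
    if everything:  # max() raises on an empty sequence
        metrics["topics.depth.max"] = max(everything)
        metrics["topics.depth.total"] = sum(everything)
    return metrics


def as_statsd_gauges(metrics, prefix="nsq"):
    """Format metrics as statsd gauge lines; the prefix is a made-up default."""
    return ["{}.{}:{}|g".format(prefix, name, value)
            for name, value in sorted(metrics.items())]


# The three measurements from the example above.
topic_depths = {
    ("nsqd-one", "topic-one"): 10,
    ("nsqd-two", "topic-one"): 3,
    ("nsqd-two", "topic-two"): 8,
}
for line in as_statsd_gauges(topic_aggregates(topic_depths)):
    print(line)
```

Gauges fit here because depth is a point-in-time value rather than a count of events.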

The main motivation for this aggregation is to know whether any topic or channel has too great a depth without having to know which one. This is particularly convenient for creating alarms when hosts and topics cannot be determined a priori.

@lindseyreno @b4hand @neilmb

tammybailey commented 8 years ago

Soooooo, with disclaimer: python n00b, LGTM.

mreiferson commented 8 years ago

@dlecocq I'm kind of confused - you realize that nsqd publishes statsd metrics out of the box, right?

http://nsq.io/components/nsqd.html#a-namestatsdstatsd--graphite-integrationa

dlecocq commented 8 years ago

Yep. But we don't have an intermediate layer for aggregation.

Our setup is such that we may be scaling nsqd instances up and down with some regularity, so we don't necessarily want to track at the host level, but we do want to track specific topics or all channels across the cluster.

I'd be very happy for a cleaner alternative.

dlecocq / nsq-py

BIG-3419 - Send NSQ cluster-wide metrics to statsd #41