HubSpot / BuckyServer

Node server that receives metric data over HTTP & forwards to your service of choice
http://github.hubspot.com/bucky
MIT License

multi-metric packets #12

Closed earthgecko closed 10 years ago

earthgecko commented 10 years ago

We currently push a lot of things to statsd with multi-metric packets:

some.metric.total:33|c\nsome.metric.other.total:2|c\nsome.metric.more.total:77|c\nsome.metric.summore.total:77|c

I have confirmed with tcpdump that multi-metric submissions are not making it through to statsd but single metrics are. Is there any formatting or other way to get multi-metric submissions through the BuckyServer app and forwarded on to statsd?

zackbloom commented 10 years ago

The default statsd collector just passes the body directly to the statsd client: https://github.com/HubSpot/BuckyServer/blob/master/modules/statsd.coffee

Should it be splitting the input first?

earthgecko commented 10 years ago

Now there is a question :)

However, going by the statsd tcpdump, maybe? I am not sure.

As I always say, "u know ur deep in it when u r using tcpdump - deepinit"


So actually, considering the looks of the tcpdump output, you may have to, because each metric appears on its own line there, and if you are debugging in tcpdump, u know ur ... :)

So tcpdump seems to show a new line for each \n

[me@here ~] cat /tmp/tcpdump.8125.log | grep "varnish.rpm.total" | tail
E...7/@.8.K^PUWE_..(.(......varnish.rpm.total:9882|c
E....P@.8..:PUVD_..(.8....n.varnish.rpm.total:9865|c
E....u@.9.L.U..N_..(........varnish.rpm.total:8720|c
E....c@.;..:...._..(.......@varnish.rpm.total:29309|c
.i_..(.8....u/varnish.rpm.total:29563|c
E.....@.;.."...E_..(........varnish.rpm.total:34810|c
.varnish.rpm.total:30836|c
varnish.rpm.total:30461|c
E....q@.;.z...9G_..(.-......varnish.rpm.total:34288|c
E...,@@.;......]_..(......~.varnish.rpm.total:30207|c
[me@here ~]

Considering these packets were sent via nc directly to statsd over UDP as multi-metric packets, each metric in a packet does appear on its own line in tcpdump.

[me@here ~]
[me@here ~] cat /tmp/tcpdump.8125.log | grep -B1 -A1 "varnish.rpm.total" | tail -n 20
--
17:00:10.673706 IP 123.124.125.9.56289 > a-statsd-server.8125: UDP, length 250
E.....@.;.."...E_..(........varnish.rpm.total:34810|c
varnish.rpm.weeds.someweed:5|c
--
E....L@.;..
.varnish.rpm.total:30836|c
varnish.rpm.weeds.someweed:2|c
--
E...b|@.;.      ...3._..(.}....A
varnish.rpm.total:30461|c
varnish.rpm.weeds.someweed:1|c
--
17:00:11.185090 IP 634.213.57.711.48429 > a-statsd-server.8125: UDP, length 250
E....q@.;.z...9G_..(.-......varnish.rpm.total:34288|c
varnish.rpm.weeds.someweed:1|c
--
17:00:11.330388 IP 123.124.125.18.59872 > a-statsd-server.8125: UDP, length 250
E...,@@.;......]_..(......~.varnish.rpm.total:30207|c
varnish.rpm.weeds.someweed:2|c
[me@here ~]

However, because I am not even seeing the data hit statsd from BuckyServer, I am thinking that BuckyServer is not sending it to statsd in the first place.

I have verified this with a one-to-one test.

multi-metric packet test

send multi-metric packet to Bucky
confirmed receipt in tcpdump
tcpdump confirms no data sent on to statsd

single metric test

send single metric packet to Bucky
confirmed receipt in tcpdump
tcpdump confirms data sent on to statsd

An easy workaround is to just send one request per metric, but we get loads of metrics, and splitting them into well-sized multi-metric packets really reduces the number of calls at scale, if that is possible.

earthgecko commented 10 years ago

statsd multi-metric packets with BuckyServer

statsd has the ability to receive multi-metric packets (https://github.com/etsy/statsd/blob/master/docs/metric_types.md#multi-metric-packets)

gorets:1|c\nglork:320|ms\ngaugor:333|g\nuniques:765|s

With some other clients such as nc, when you send this directly to statsd you send that string as is, with \n. That is not the case with BuckyServer, and this can be confusing.

With BuckyServer the UDP rules for multi-metric packets do not apply; there are no such limits via TCP.

BuckyServer gives you the ability to ship valuable statsd metrics more reliably longhaul over TCP, rather than longhaul over UDP. This does have a price: you lose non-blocking fire and forget. But it has a number of advantages too.

It is very important on the client side to be able to ship as many metrics per call as possible. With statsd the packet size is limited by the "total length of the payload within your network's MTU". There you may have to split your metrics into strings < max_payload_bytes and loop through them, hitting statsd with nc many times (fire and forget). With an HTTP client you can send all of those metrics to BuckyServer in one POST, with no \n
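As a rough illustration of the UDP-side splitting described above, here is a minimal sketch that groups metric lines into batches that stay under a byte limit. It is not taken from statsd or BuckyServer: the function name batch_metrics and the limit passed to it are assumptions for illustration, and the right limit depends on your network's MTU.

```shell
# batch_metrics MAX_BYTES   (hypothetical helper, not part of statsd)
# Reads one metric per line on stdin and prints the same lines grouped into
# batches separated by a blank line. Each batch, joined with newlines, stays
# at or under MAX_BYTES. A single metric longer than MAX_BYTES still gets a
# batch of its own. Lengths are counted with awk length(), which equals the
# byte count only for plain ASCII metric names.
batch_metrics() {
  awk -v max="$1" '
    BEGIN { len = 0 }
    NF == 0 { next }                      # ignore empty input lines
    {
      need = (len == 0) ? length($0) : len + 1 + length($0)
      if (len > 0 && need > max) {        # batch is full, start a new one
        printf "\n"
        len = 0
        need = length($0)
      }
      print
      len = need
    }'
}
```

Each blank-line-separated batch could then be sent to statsd as one UDP payload, while the whole unsplit list can go to BuckyServer in a single POST.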

This is very advantageous if you are sending lots of critical long namespace metrics to statsd, especially from distributed, longhaul geographic regions.

BuckyServer simply submits each metric to statsd; it does not submit them to statsd as multi-metric packets (so more local UDP connections, but reliable UDP).

With the HTTP client the POST data must not contain a literal \n but rather one metric per line.

Good POST data

POST_DATA="test.bucky.alive:1|g
test.bucky.does_multi_metric_packets:1|c"

This example below would be submitted to statsd, but it would NOT make it into statsd, meaning it would not be forwarded on to graphite, et al.

BAD POST data \n

BAD_POST_DATA="test.bucky.alive:1|g\ntest.bucky.does_multi_metric_packets:1|c"
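The difference between the two payloads is easy to miss by eye, so here is a small sketch that makes it visible. count_lines and fix_newlines are hypothetical helper names, not part of BuckyServer; the idea is only that a double-quoted shell string keeps \n as two literal characters, so the bad payload is really a single line.

```shell
GOOD="test.bucky.alive:1|g
test.bucky.does_multi_metric_packets:1|c"
BAD="test.bucky.alive:1|g\ntest.bucky.does_multi_metric_packets:1|c"

# Count the non-empty lines a payload really contains.
count_lines() {
  printf '%s\n' "$1" | grep -c .
}

# Replace each literal backslash-n pair with a real newline.
fix_newlines() {
  printf '%s\n' "$1" | sed 's/\\n/\
/g'
}
```

Here count_lines "$GOOD" reports 2 and count_lines "$BAD" reports 1, and running the bad payload through fix_newlines gives back the good one.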

HTTP client testing on BuckyServer and statsd

We can confirm that BuckyServer receives the POST with the multi-metric packet and that statsd receives all the metrics with tcpdump.

On the BuckyServer start tcpdump

STATSD_PORT=8125
BUCKYSERVERPORT="your.buckyserver.port"
tcpdump -i any port $STATSD_PORT -A > /tmp/tcpdump.$STATSD_PORT.log &
tcpdump -i any port $BUCKYSERVERPORT -A > /tmp/tcpdump.$BUCKYSERVERPORT.log &

With an HTTP client like wget (or curl), do not forget timeouts to keep it "non-blocking" (no timeout was added to the curl example)

BUCKYSERVER="your.buckyserver.ipaddress"
BUCKYSERVERPORT="your.buckyserver.port"
POST_DATA="test.bucky.alive:1|g
test.bucky.does_multi_metric_packets:1|c"
wget  --tries=1 --timeout=1 --dns-timeout=1 --post-data="$POST_DATA" --header="Content-Type: text/plain" http://$BUCKYSERVER:$BUCKYSERVERPORT/bucky/v1/send
# or with curl
# curl -X POST -H "Content-Type: text/plain" -d "$POST_DATA" http://$BUCKYSERVER:$BUCKYSERVERPORT/bucky/v1/send
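Since the curl line above has no timeout, a hedged sketch of the same POST with timeouts might look like this. post_metrics is a hypothetical helper name; --connect-timeout, --max-time and --fail are standard curl options, and the 1 and 2 second values are arbitrary choices.

```shell
# post_metrics HOST PORT DATA   (hypothetical helper)
# POST newline-separated metrics to BuckyServer and give up quickly, so a slow
# or unreachable server cannot stall the caller. --fail makes curl return a
# non-zero exit code on HTTP errors, and --data-binary sends DATA unmodified.
post_metrics() {
  curl --silent --show-error --fail \
       --connect-timeout 1 --max-time 2 \
       -X POST -H "Content-Type: text/plain" \
       --data-binary "$3" \
       "http://$1:$2/bucky/v1/send"
}
```

Usage would be along the lines of post_metrics "$BUCKYSERVER" "$BUCKYSERVERPORT" "$POST_DATA", checking $? afterwards.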

On the BuckyServer kill tcpdump

kill `pidof tcpdump`

We can see that BuckyServer got the POST and statsd got all the multi-metric packet metrics (albeit via more UDP connections), but you can now reliably forward statsd metrics longhaul via TCP.

cat /tmp/tcpdump.$BUCKYSERVERPORT.log | grep POST
cat /tmp/tcpdump.$STATSD_PORT.log | grep "bucky"

multi-metric stats shipped via tcp to statsd

[me@here ~] kill `pidof tcpdump`
11 packets captured
13 packets received by filter
0 packets dropped by kernel
[me@here ~] 144 packets captured
146 packets received by filter
0 packets dropped by kernel

[1]-  Done                    tcpdump -i any port 8125 -A > /tmp/tcpdump.8125.log
[2]+  Done                    tcpdump -i any port 8080 -A > /tmp/tcpdump.8080.log
[me@here ~] cat /tmp/tcpdump.8080.log | grep POST
..e.x..POST /bucky/v1/send HTTP/1.0
Access-Control-Allow-Methods: POST
[me@here ~]
[me@here ~] cat /tmp/tcpdump.8125.log | grep "bucky"
E.....@.@.<h.............o..test.bucky.alive:1|g
test.bucky.does_multi_metric_packets:1|c
[me@here ~]

The correct metrics and values are relayed to graphite and our valuable metrics have TCP transport

earthgecko commented 10 years ago

go BuckyServer you are my new favorite tool - reliable tcp transport for statsd metrics - thanks guys!

etsy's tcp server type is not quite there yet, but BuckyServer sure is, and the best thing is we can now test whether our valuable metrics were shipped longhaul rather than fired and forgotten :) Because we can test the exit code. AND another really sweet thing is, if for some reason BuckyServer failed, we can just fail over to nc'ing direct UDP to statsd. Makes me smile. Nice way to end the week.
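The exit-code check and UDP failover could be sketched roughly as follows. This is an illustration under assumptions, not a tested production script: send_with_fallback, BUCKY_URL, STATSD_HOST and the default host names and port 8080 are placeholders, and nc options differ between netcat implementations.

```shell
# Placeholder endpoints; override them in the environment.
BUCKY_URL="${BUCKY_URL:-http://your.buckyserver.ipaddress:8080/bucky/v1/send}"
STATSD_HOST="${STATSD_HOST:-your.statsd.host}"
STATSD_PORT="${STATSD_PORT:-8125}"

# send_with_fallback DATA   (hypothetical helper)
# Try BuckyServer over HTTP first. If curl exits non-zero (connection error,
# timeout, or HTTP error because of --fail), send the same metrics straight to
# statsd over UDP with nc instead.
send_with_fallback() {
  if curl --silent --fail --max-time 2 -H "Content-Type: text/plain" \
          --data-binary "$1" "$BUCKY_URL" >/dev/null; then
    echo "sent via BuckyServer"
  else
    # Fire and forget: no delivery check is possible on this path.
    printf '%s\n' "$1" | nc -u -w 1 "$STATSD_HOST" "$STATSD_PORT"
    echo "sent via UDP fallback"
  fi
}
```

On the fallback path the payload is subject to the UDP packet size limits again, so large metric lists would need splitting first.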