bitly / statsdaemon

an implementation of Etsy's statsd in Go
The Unlicense

Losing data when getting large UDP packets from munin to statsd #34

Closed jeffsilverman-1958 closed 9 years ago

jeffsilverman-1958 commented 9 years ago

I am converting an existing installation of munin to an installation that uses munin plugins with statsd reporting. To do this, I am using munin-statsd.pl to query the munin plugins on the $muninhost port 4949. munin-statsd.pl runs from cron once a minute and connects to a daemon listening on port TCP/4949 on the machine being monitored. The daemon has a list of plugins that it executes in sequence. I know from my existing munin installation that this daemon and the plugins work properly.

I do not see the data items in the web browser. I don't see any errors in /opt/graphite/storage/log/webapp/error.log

I ran wireshark and looked at the data going in to statsdaemon on port UDP/8125 and the data going out of statsdaemon to graphite on port TCP/2003. Coming in, I typically see 33500-byte UDP packets. (I didn't know that a UDP packet could be that big, but since the munin-statsd.pl program and statsdaemon are on the same machine, I assume that the network can handle it.) Going out, I see that statsdaemon makes a connection to port TCP/2003 every 10 seconds and then closes the TCP connection immediately 5 out of 6 times. On the 6th time, it transfers a 586-byte TCP packet, which is ACK'd, and then the connection closes normally.

I must be losing a great deal of information, because I am going from a UDP packet tens of thousands of bytes long to a TCP packet only a few hundred bytes long.

I can send you tcpdump packet captures if that would be helpful.

Looking over what I have written here, I speculate that there may be a design error in munin-statsd.pl, in that it sends a single large UDP packet instead of a bunch of small UDP packets. Does that make sense?

Thank you and have a nice weekend.

Jeff

mreiferson commented 9 years ago

@jeffsilverman-1958 statsdaemon will only accept a max UDP packet size of 512 bytes, which explains this behavior.

If you can refactor your script to break apart the metrics into smaller chunks, I expect that things will work fine.

P.S. We should probably document this better :smile: