bookingcom / nanotube

High-performance router for Graphite.
Apache License 2.0

write to a broken pipe #19

Closed: azhiltsov closed this issue 4 years ago.

azhiltsov commented 4 years ago

Submitting traffic to a low-traffic destination requires opening a connection. However, it seems that nanotube assumes the connection is already open, so it loses datapoints by writing to a broken pipe.

# netstat -antpu | grep 10.10.10.20

# telnet 127.0.0.1 2003
blah.blah 1 1581552000

logs:
Feb 13 16:56:16 relay-1002.com nanotube[14925]:
{"level":"warn","ts":1581609376.1698446,"caller":"target/host.go:125","msg":"error sending value to host. Reconnect and retry..","target":"10.10.10.20","port":2003,"error":"write tcp 10.10.10.10:50482->10.10.10.20:2003: write: broken pipe"}

# tcpdump -i any host 10.10.10.20
(empty: no packets to or from 10.10.10.20)
grzkv commented 4 years ago

Can you please explain this in a bit more detail?

So, the problem manifests as follows:

  1. Start Nanotube.
  2. Send a single point.
  3. The point is not sent because Nanotube assumes a connection is open when it is not. This happens while the remote host is available and accepts connections.

Right?

grzkv commented 4 years ago

Here's a test trying to replicate this. Everything works if Nanotube is started from scratch with the default config. See the screenshot below.

[Screenshot: nanotube_single_point]

A single point is sent and it arrives at both clusters.

The problem probably occurs only when Nanotube is in a specific state.

azhiltsov commented 4 years ago

If you run go-carbon on the destination 10.10.10.20, it will close the connection by itself after ~60 seconds, because it currently ignores the keep-alives we are sending. After that, retry sending a metric from nanotube and you should get a broken pipe, because nanotube does not notice that the connection was lost.

grzkv commented 4 years ago

Thanks, I will try that.