go-graphite / carbonzipper

proxy to transparently merge graphite carbon backends

weird outage this morning #52

Open blysik opened 6 years ago

blysik commented 6 years ago

I am running a slightly weird config while I try to migrate:

carbon-c-relay sends to two separate clusters:

a) 3 hosts in a carbon_ch hash, go-carbon/carbonserver
b) 4 hosts in a jump_fnva1 hash, go-carbon/carbonserver

carbonzipper is set up to read from all 7 backend hosts.

This morning, one of the jump_fnva1 machines went offline. At that time, I started to see really strange spikes in write timeouts from carbon-c-relay, and carbonzipper started reporting read timeouts as well.

See these graphs:

[Image: screen shot 2017-09-08 at 8 50 10 am]

As soon as I got the go-carbon host back up, things went back to normal. But this behavior is strange, and not what I expected.

Does anyone know what could cause this?

blysik commented 6 years ago

The behavior of the offline host was odd. It was responding to pings, but ssh wouldn't connect. I wonder if it was still listening on the go-carbon/carbonserver ports, or whether connections to it were simply hanging until a TCP timeout.

Could the dropped metrics and read timeouts essentially be a reflection of TCP timeouts?
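
One way to test this hypothesis next time would be to probe the backend ports directly and see whether a connect attempt is refused quickly or hangs until a timeout. The following is only a rough sketch, not anything taken from carbonzipper or carbon-c-relay: the `probe` helper and the 1-second timeout are assumptions made for illustration, and the host and port to probe would have to come from the actual backend configuration.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Try one TCP connect and classify the result.

    Returns (state, seconds), where state is one of:
      "open"    - the connection was accepted
      "refused" - the host answered with a reset (nothing listening)
      "timeout" - no answer within `timeout` seconds (hung or filtered host)
      "error"   - any other socket-level failure
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            state = "open"
    except ConnectionRefusedError:
        state = "refused"
    except socket.timeout:
        state = "timeout"
    except OSError:
        state = "error"
    return state, time.monotonic() - start
```

A "refused" result normally comes back almost immediately, so a client can move on to the next backend. A "timeout" result means every request to that host waits for the full timeout, which would be consistent with a host that still answers pings but no longer accepts or services connections.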