bradfitz / gomemcache

Go Memcached client library #golang
Apache License 2.0

Add a rate-limiter to dial() function? #115

Open bboreham opened 4 years ago

bboreham commented 4 years ago

Every few days, one of my servers issues a kernel log message:

TCP: request_sock_TCP: Possible SYN flooding on port 11211. Sending cookies.  Check SNMP counters.

Mostly processing continues after this, but sometimes the entire server is unresponsive for minutes.

In this environment we have 10 Go programs using gomemcache hitting one memcached server, and each Go program has 60 goroutines that will call through this library. So I expected a maximum of 600 connections at a time.

I have seen the SYN flooding message at the default memcached connection backlog of 1024, and also after I raised it to 4096.

From inspection of logs, packet traces, etc., I have formed the impression that some glitch in processing or the network causes timeout errors (at the default of 100ms), which then cause gomemcache to dial new connections. If each of the 60 goroutines redials after every 100ms timeout, that is up to 10 dials per second per goroutine, or 600 new connections dialed each second per process, and 6,000 per second across all 10 processes.

If the dial attempts are not being discarded on the other end of the wire, then I think it can quickly go over the backlog limit.
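As a rough check of that estimate, here is a small sketch of the arithmetic. It assumes the worst case: every goroutine redials immediately after each timeout and the server accepts nothing in the meantime. The function names are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// dialsPerSecond estimates the worst-case dial rate when every goroutine
// redials as soon as its previous attempt times out.
func dialsPerSecond(processes, goroutines, timeoutMs int) int {
	return processes * goroutines * 1000 / timeoutMs
}

// secondsToFill estimates how long a listen backlog lasts if the server
// accepts no connections while dials keep arriving at the given rate.
func secondsToFill(backlog, rate int) float64 {
	return float64(backlog) / float64(rate)
}

func main() {
	// 10 processes, 60 goroutines each, 100ms timeout (figures from this issue).
	rate := dialsPerSecond(10, 60, 100)
	fmt.Println("worst-case dials per second, all processes:", rate)

	for _, backlog := range []int{1024, 4096} {
		fmt.Printf("backlog %d fills in about %.2fs\n", backlog, secondsToFill(backlog, rate))
	}
}
```

Under these assumptions both backlog sizes are exhausted in well under a second, which would be consistent with raising the backlog to 4096 not making the message go away.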

I wondered if gomemcache should have a rate-limiter on dial()? I would prefer gomemcache to fail quickly rather than raising the timeout to slow it down. Any other insight would be valued.
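To make the idea concrete, here is a minimal sketch of a fail-fast rate limiter wrapped around a dial function. It is not gomemcache code: the `dialFunc` type, `dialLimiter`, and `errDialRateLimited` are hypothetical names, and the sketch assumes the library could be given a dial hook of this shape. It uses a simple token bucket and returns an error immediately instead of waiting when no token is available.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// dialFunc is a hypothetical hook type; gomemcache's internal dial() is not
// assumed to have this signature.
type dialFunc func(network, addr string) (net.Conn, error)

var errDialRateLimited = errors.New("dial rate limit exceeded")

// dialLimiter is a small token bucket: at most burst dials back to back,
// refilled at perSecond tokens per second.
type dialLimiter struct {
	mu        sync.Mutex
	tokens    float64
	burst     float64
	perSecond float64
	last      time.Time
}

func newDialLimiter(perSecond float64, burst int) *dialLimiter {
	return &dialLimiter{
		tokens:    float64(burst),
		burst:     float64(burst),
		perSecond: perSecond,
		last:      time.Now(),
	}
}

// allow reports whether a dial may proceed right now. It never blocks.
func (l *dialLimiter) allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	l.tokens += now.Sub(l.last).Seconds() * l.perSecond
	if l.tokens > l.burst {
		l.tokens = l.burst
	}
	l.last = now

	if l.tokens < 1 {
		return false
	}
	l.tokens--
	return true
}

// wrap returns a dialFunc that fails fast when the bucket is empty.
func (l *dialLimiter) wrap(next dialFunc) dialFunc {
	return func(network, addr string) (net.Conn, error) {
		if !l.allow() {
			return nil, errDialRateLimited
		}
		return next(network, addr)
	}
}

func main() {
	// Stand-in dialer so the example runs without a memcached server.
	fake := func(network, addr string) (net.Conn, error) { return nil, nil }

	dial := newDialLimiter(10, 2).wrap(fake)
	for i := 0; i < 4; i++ {
		_, err := dial("tcp", "127.0.0.1:11211")
		fmt.Println("attempt", i, "error:", err)
	}
}
```

Failing fast here matches the preference stated above: callers see an error at once and can treat it as a cache miss, rather than queueing behind a longer timeout.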

The only related issue I could see here is #108; interestingly we are both running the same system.

bboreham commented 4 years ago

#86 could be used to add a rate-limiter from the outside. Or a circuit-breaker, which is perhaps an even better idea.

bboreham commented 3 years ago

Update: I added a circuit-breaker using the code from #86, and the symptoms went away. I would like to see #86 merged.
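For readers unfamiliar with the pattern, a circuit-breaker around dialing might look roughly like the sketch below. This is not the code used above and not the API from #86; `dialFunc`, `breaker`, and `errCircuitOpen` are hypothetical names, and the threshold and cooldown values are arbitrary. After a run of consecutive dial failures the breaker stops dialing for a cooldown period and fails immediately instead.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// dialFunc is a hypothetical hook type, assumed for this sketch only.
type dialFunc func(network, addr string) (net.Conn, error)

var errCircuitOpen = errors.New("circuit open: not dialing")

// breaker opens after threshold consecutive failures and stays open
// for cooldown, during which dials are rejected without touching the network.
type breaker struct {
	mu        sync.Mutex
	failures  int
	threshold int
	cooldown  time.Duration
	openUntil time.Time
}

func newBreaker(threshold int, cooldown time.Duration) *breaker {
	return &breaker{threshold: threshold, cooldown: cooldown}
}

// allow reports whether the circuit is currently closed.
func (b *breaker) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return !time.Now().Before(b.openUntil)
}

// record updates the failure count with the outcome of one dial attempt.
func (b *breaker) record(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.failures = 0
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = time.Now().Add(b.cooldown)
		b.failures = 0
	}
}

// wrap returns a dialFunc guarded by the breaker.
func (b *breaker) wrap(next dialFunc) dialFunc {
	return func(network, addr string) (net.Conn, error) {
		if !b.allow() {
			return nil, errCircuitOpen
		}
		conn, err := next(network, addr)
		b.record(err)
		return conn, err
	}
}

func main() {
	// Stand-in dialer that always fails, simulating an unresponsive server.
	failing := func(network, addr string) (net.Conn, error) {
		return nil, errors.New("simulated dial timeout")
	}

	dial := newBreaker(3, time.Second).wrap(failing)
	for i := 0; i < 5; i++ {
		_, err := dial("tcp", "127.0.0.1:11211")
		fmt.Println("attempt", i, "error:", err)
	}
}
```

One simplification to note: when the cooldown ends, every waiting goroutine may dial at once. A fuller breaker would usually let a single probe through first (a half-open state) before closing the circuit again.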