awesomized / libmemcached

Resurrection of libmemcached
https://awesomized.github.io/libmemcached/
BSD 3-Clause "New" or "Revised" License
45 stars 26 forks source link

[patch included] libmemcached exhibits odd behaviour on TCP connection close #5

Closed m6w6 closed 4 years ago

m6w6 commented 4 years ago

Imported from Launchpad using lp2gh.


I created a libmemcached client connection to a memcached server over TCP. This connection was long-lived and infrequently used. At some point, my memcached server was restarted, which caused it to close the libmemcached connection with a TCP FIN, which was ACKed.

A short while later (when the memcached server was back up and running), I tried to send a request over the libmemcached client connection. libmemcached tried to use the previous connection that had been closed with a FIN, which resulted in a TCP RST from the other end. This failure caused libmemcached to set a backoff timeout on the connection, so that even though the TCP FIN had arrived several minutes ago, the memcached server was running again and the reconnection attempt succeeded, libmemcached wouldn't let me make any more requests on that connection for a short while.

I also noticed that the behaviour varied depending on what type of request I was making - if I tried to store data in memcached, the error code was "MEMCACHED_CONNECTION_FAILED" (which was useful as I could simply notice that and retry the request) but when retrieving data from memcached the error code was "MEMCACHED_NOT_FOUND" which was much less useful.

The attached patch (against the v1.0 trunk, though I was using libmemcached 1.0.10 when I saw this behaviour) fixes both these issues:

I'm afraid my editor has stripped trailing spaces in a couple of places - hopefully that's not too confusing.

I have permission from my employer to contribute this back under libmemcached's 3-clause BSD license.

m6w6 commented 4 years ago

m6w6 commented 4 years ago

https://bugs.launchpad.net/libmemcached/+bug/1251482 seems to be fixing a similar issue, but I think my fix has the advantage that if a server is genuinely down (e.g. the attempt to reconnect over TCP fails), backoff logic will still kick in and protect it, rather than being bypassed entirely.

m6w6 commented 4 years ago

Investigating

On Jan 19, 2014, at 3:14 PM, Rob Day

https://bugs.launchpad.net/libmemcached/+bug/1251482 seems to be fixing a similar issue, but I think my fix has the advantage that if a server is genuinely down (e.g. the attempt to reconnect over TCP fails), backoff logic will still kick in and protect it, rather than being bypassed entirely.

-- You received this bug notification because you are subscribed to libmemcached. https://bugs.launchpad.net/bugs/1270651

Title: [patch included] libmemcached exhibits odd behaviour on TCP connection close

Status in libmemcached - A C and C++ client library for memcached: New

Bug description: I created a libmemcached client connection to a memcached server over TCP. This connection was long-lived and infrequently used. At some point, my memcached server was restarted, which caused it to close the libmemcached connection with a TCP FIN, which was ACKed.

A short while later (when the memcached server was back up and running), I tried to send a request over the libmemcached client connection. libmemcached tried to use the previous connection that had been closed with a FIN, which resulted in a TCP RST from the other end. This failure caused libmemcached to set a backoff timeout on the connection, so that even though the TCP FIN had arrived several minutes ago, the memcached server was running again and the reconnection attempt succeeded, libmemcached wouldn't let me make any more requests on that connection for a short while.

I also noticed that the behaviour varied depending on what type of request I was making - if I tried to store data in memcached, the error code was "MEMCACHED_CONNECTION_FAILED" (which was useful as I could simply notice that and retry the request) but when retrieving data from memcached the error code was "MEMCACHED_NOT_FOUND" which was much less useful.

The attached patch (against the v1.0 trunk, though I was using libmemcached 1.0.10 when I saw this behaviour) fixes both these issues:

  • for TCP connections, an failure to send data will never set a backoff timeout on the server - only a failure to connect will do that. This protects against the case where a remote server drops the connection, but this is not noticed for a short while, during which the server recovers. The request to send data will still fail, but if the request is retried, libmemcached will successfully re-establish the connection and send the data rather than rejecting the request due to backoff logic. (The behaviour for UDP is unchanged.)

  • for get/gets requests, if there is no MEMCACHED_SUCCESS result and one or more servers returned a MEMCACHED_CONNECTION_FAILED error, the return code will be MEMCACHED_CONNECTION_FAILED rather than MEMCACHED_NOT_FOUND. This lets the client make informed decisions about whether to retry.

    I'm afraid my editor has stripped trailing spaces in a couple of places - hopefully that's not too confusing.

    I have permission from my employer to contribute this back under libmemcached's 3-clause BSD license.

To manage notifications about this bug go to: https://bugs.launchpad.net/libmemcached/+bug/1270651/+subscriptions

m6w6 commented 4 years ago

Patch got applied in 78ab72c934e80631b107026ff3bd7cabe305d38d