awesomized / libmemcached

Resurrection of libmemcached
https://awesomized.github.io/libmemcached/
BSD 3-Clause "New" or "Revised" License

More specific error messages when connect() fails #58

Closed: m6w6 closed this issue 4 years ago

m6w6 commented 4 years ago

Imported from Launchpad using lp2gh.


We've been logging intermittent connection failures for some time. It took a while to work out that they were probably caused by local (ephemeral) port exhaustion, which makes connect() fail with EADDRINUSE. We can easily reproduce these failures with libmemcached at realistic connection rates. On Linux, the error occurs once a single client host opens more than about 28232 connections to a single server within a 60-second period (the TIME_WAIT expiry). 28232 is the size of Linux's default ephemeral port range, 32768 to 60999.
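A rough sketch of where that limit comes from, assuming the client closes connections first (so its side holds TIME_WAIT) and that the client IP and the server IP:port are fixed, leaving only the local port free to vary in the connection 4-tuple. The helper names below are hypothetical; the defaults (32768 to 60999, 60 s) are Linux's, and the sketch falls back to them when `/proc/sys/net/ipv4/ip_local_port_range` cannot be read.

```c
#include <stdio.h>

/* Linux defaults for net.ipv4.ip_local_port_range (used as a fallback). */
#define DEFAULT_PORT_LOW  32768L
#define DEFAULT_PORT_HIGH 60999L
/* Linux keeps a closed TCP connection in TIME_WAIT for 60 seconds. */
#define TIME_WAIT_SECONDS 60

/* Number of ports in an inclusive ephemeral range. */
static long ephemeral_port_count(long low, long high)
{
    return high - low + 1;
}

/* Read the configured range; returns 0 if read from /proc, -1 if defaults were used. */
static int read_port_range(long *low, long *high)
{
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (f != NULL) {
        int n = fscanf(f, "%ld %ld", low, high);
        fclose(f);
        if (n == 2)
            return 0;
    }
    *low = DEFAULT_PORT_LOW;
    *high = DEFAULT_PORT_HIGH;
    return -1;
}

/* Sustained rate of new connections to ONE server that the port range can support
 * when every closed connection holds its local port for time_wait_seconds. */
static double max_connections_per_second(long ports, int time_wait_seconds)
{
    return (double)ports / time_wait_seconds;
}
```

With the defaults this works out to roughly 470 new connections per second to one server; beyond that, connect() starts failing until ports leave TIME_WAIT.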

In libmemcached's network_connect(), EADDRINUSE falls through to the "default" case, which just returns MEMCACHED_CONNECTION_FAILURE with no further detail. For logging purposes it would help to have more information. Either MEMCACHED_CONNECTION_FAILURE could be split into more specific return codes, or memcached_last_error_message() could be documented as a public API and populated with an errno-specific message whenever connect() fails.

Ideally every errno value would be reported this way, since EACCES, ENETUNREACH and ENOMEM are probably also possible.
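One possible shape for such an errno-specific message, as a standalone sketch rather than libmemcached code: `connect_errno_name()` and `format_connect_error()` are hypothetical helpers. Known connect() errors get their symbolic name, and any other errno still produces a message via `strerror()`, so nothing collapses into a bare "connection failure".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Symbolic names for errno values that connect() commonly returns; NULL otherwise. */
static const char *connect_errno_name(int err)
{
    switch (err) {
    case EADDRINUSE:    return "EADDRINUSE";
    case EADDRNOTAVAIL: return "EADDRNOTAVAIL";
    case EAGAIN:        return "EAGAIN";
    case EACCES:        return "EACCES";
    case ENETUNREACH:   return "ENETUNREACH";
    case EHOSTUNREACH:  return "EHOSTUNREACH";
    case ECONNREFUSED:  return "ECONNREFUSED";
    case ETIMEDOUT:     return "ETIMEDOUT";
    case ENOMEM:        return "ENOMEM";
    default:            return NULL;
    }
}

/* Write a log-friendly description of a failed connect() into buf.
 * Every errno is handled: unknown values fall back to the numeric code. */
static int format_connect_error(char *buf, size_t size,
                                const char *host, unsigned port, int err)
{
    const char *name = connect_errno_name(err);
    if (name != NULL)
        return snprintf(buf, size, "connect() to %s:%u failed: %s (%s)",
                        host, port, name, strerror(err));
    return snprintf(buf, size, "connect() to %s:%u failed: errno %d (%s)",
                    host, port, err, strerror(err));
}
```

A message like this is the kind of text that could be stored for memcached_last_error_message() to return, or logged alongside the MEMCACHED_CONNECTION_FAILURE code.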

Also, according to Linux's connect(2) man page, EAGAIN indicates "no more free local ports or insufficient entries in the routing cache", so poll() should probably not be called on the FD when connect() returns it.
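A minimal sketch of that rule for a non-blocking connect, assuming a plain POSIX socket (this is not libmemcached's network_connect()). The hypothetical helper `connect_nonblocking()` only polls when connect() reports EINPROGRESS; EAGAIN, EADDRINUSE and every other errno are returned immediately so the caller can report them.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>

/* Returns 0 on success, otherwise the errno value describing the failure. */
static int connect_nonblocking(int fd, const struct sockaddr *addr,
                               socklen_t len, int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return errno;

    if (connect(fd, addr, len) == 0)
        return 0;

    /* Only EINPROGRESS means a connection attempt is pending.
     * EAGAIN (no free local ports) and all other errors fail right away. */
    if (errno != EINPROGRESS)
        return errno;

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;
    do {
        /* Simplification: an EINTR restarts the full timeout. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);
    if (rc == -1)
        return errno;
    if (rc == 0)
        return ETIMEDOUT;

    /* The outcome of the pending connect is reported through SO_ERROR. */
    int so_error = 0;
    socklen_t so_len = sizeof so_error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &so_len) == -1)
        return errno;
    return so_error;
}
```

Returning the errno value rather than a generic failure code leaves the caller free to turn it into a specific log message.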