codership / glb

Galera Load Balancer - a simple TCP connection proxy and load-balancing library
GNU General Public License v2.0

Destination failover does not work on CentOS 6 #1

Closed: ayurchen closed this issue 10 years ago

ayurchen commented 10 years ago

If the first destination tried is unavailable, the connecting client hangs. The following patch seems to get it working:

--- a/src/glb_pool.c
+++ b/src/glb_pool.c
@@ -123,10 +123,10 @@ typedef enum pool_fd_ops
 {
 #ifdef USE_EPOLL
     POOL_FD_READ  = EPOLLIN,
-    POOL_FD_WRITE = EPOLLOUT,
+    POOL_FD_WRITE = EPOLLOUT | EPOLLERR,
 #else /* POLL */
     POOL_FD_READ  = POLLIN,
-    POOL_FD_WRITE = POLLOUT,
+    POOL_FD_WRITE = POLLOUT | POLLERR,
 #endif /* POLL */
     POOL_FD_RW    = POOL_FD_READ | POOL_FD_WRITE
 } pool_fd_ops_t;
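
For context, here is a minimal sketch of how a failed asynchronous connect is commonly detected with poll(): start a non-blocking connect(), wait for POLLOUT, then read the pending error with getsockopt(SO_ERROR). The helper name async_connect_status() is hypothetical and is not taken from glb_pool.c. Note that poll() reports POLLERR in revents whether or not it was requested in events, so the sketch checks SO_ERROR after any wakeup rather than relying on a particular flag.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of glb): attempts a non-blocking connect to
 * addr and returns 0 on success or an errno-style code on failure
 * (e.g. ECONNREFUSED, or ETIMEDOUT if nothing happens within timeout_ms). */
static int async_connect_status(const struct sockaddr_in *addr, int timeout_ms)
{
    assert(addr != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int e = errno;
        close(fd);
        return e;
    }

    int status = 0;
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0) {
        status = errno; /* may already be the final error, e.g. on loopback */
        if (status == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n < 0) {
                status = errno;
            } else if (n == 0) {
                status = ETIMEDOUT;
            } else {
                /* Woken by POLLOUT, POLLERR or POLLHUP: the socket-level
                 * error tells us whether the connect actually succeeded. */
                socklen_t len = sizeof(status);
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &status, &len) < 0)
                    status = errno;
            }
        }
    }

    close(fd);
    return status;
}
```

Called with the address of a port that has no listener, this helper is expected to return ECONNREFUSED (error 111 on Linux). A proxy would use that result to decide whether to retry against the next destination.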

With the patch applied, however, a debug build fails an assertion:

   INFO: glb_pool.c:400: Pool 0: added connection, (total pool connections: 1)
  DEBUG: glb_pool.c:727: pool_handle_write() to server: 0
   INFO: glb_pool.c:685: Async connection to 10.21.32.1:3305 failed: 111 (Connection refused)
   INFO: glb_pool.c:697: Reconnecting to 10.21.32.1:3304
   INFO: glb_listener.c:100: Accepted connection from 10.21.32.1:52057 to 10.21.32.1:3305

   INFO: glb_pool.c:400: Pool 0: added connection, (total pool connections: 42949672961)
glbd: glb_pool.c:733: pool_handle_write: Assertion `dst->end != POOL_END_INCOMPLETE' failed.
   INFO: glb_signal.c:42: Received signal 6. Terminating.
Aborted

Interestingly, this (the client hang, not the assertion failure) happens on CentOS but does not seem to happen on Ubuntu.

It looks like the process goes into a tight loop here:

Thread 4 (Thread 0x7f59c0114700 (LWP 19458)):
#0  0x00007f59c01fdf43 in epoll_wait () from /lib64/libc.so.6
#1  0x000000000040c901 in pool_fds_wait (pool=0x7f59c0a5c058) at glb_pool.c:260
#2  0x000000000040df81 in pool_thread (arg=0x7f59c0a5c058) at glb_pool.c:829
#3  0x00007f59c04af851 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f59c01fd94d in clone () from /lib64/libc.so.6

ayurchen commented 10 years ago

Fixed in https://github.com/codership/glb/commit/f917e39ef6c4d1226cb899a4d4929d42e019b285