cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

stability: requests sent via haproxy can return EOF if a node is down? #14071

Closed: petermattis closed this issue 7 years ago.

petermattis commented 7 years ago

In a 6-node cluster with one node down, when a client connects through haproxy after the cluster has started, I periodically see the SQL connection being closed. I'm speculating that this happens because haproxy tried to connect to the down node. This is using the following haproxy config:

global
  maxconn 4096
  pidfile /tmp/cockroach-loadbalancer.pid

defaults
    mode                tcp
    log                 global
    option              dontlognull
    retries             2
    timeout connect     10s
    timeout client      5s
    timeout server      2m
    maxconn             4096

listen psql
    bind 0.0.0.0:27183
    mode tcp
    balance roundrobin
    server cockroach1 cockroach-denim-0001:26257
    server cockroach2 cockroach-denim-0002:26257
    server cockroach3 cockroach-denim-0003:26257
    server cockroach4 cockroach-denim-0004:26257
    server cockroach5 cockroach-denim-0005:26257
    server cockroach6 cockroach-denim-0006:26257

When a client opens a new connection to haproxy, I would expect haproxy to retry the backend connection if no bytes have been sent on it yet. Perhaps I'm misunderstanding what is happening. When all 6 nodes are up, the problem does not occur. Are we missing some haproxy configuration?
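A minimal sketch of one way to get the retry behavior described above. It assumes haproxy 1.5 or later, and neither poster confirms this fix in the thread. As far as the haproxy documentation describes it, `retries` by itself repeats a failed connect against the same server. If that server is down, every attempt fails, and the client connection is then closed, which a client would likely see as an EOF. Adding `option redispatch` allows the last retry to go to a different server instead.

```haproxy
defaults
    mode                tcp
    log                 global
    option              dontlognull
    # Retry a failed backend connect up to 2 times (as in the original config).
    retries             2
    # Assumption, not in the original config: let the last retry be
    # redispatched to another server instead of failing the client connection.
    option              redispatch
    timeout connect     10s
    timeout client      5s
    timeout server      2m
    maxconn             4096
```

Redispatching only applies once a connect attempt actually fails. If the down host silently drops packets rather than refusing connections, each attempt can still wait up to `timeout connect` (10s here) before haproxy retries.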

mberhault commented 7 years ago

I assume this is due to the lack of health checks in the config. Without them, haproxy probably considers all nodes to be up.
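A minimal sketch of adding health checks to the `listen psql` section. It assumes a plain TCP connect probe is sufficient to detect a down node. Adding `check` to each `server` line makes haproxy probe that backend periodically and stop sending new connections to it once it is marked DOWN. The `inter`, `fall` and `rise` values shown are haproxy's defaults, written out only to make the detection timing explicit.

```haproxy
listen psql
    bind 0.0.0.0:27183
    mode tcp
    balance roundrobin
    # "check" enables periodic health checks; in tcp mode with no other
    # check option configured, each probe is a plain TCP connect.
    # inter/fall/rise are haproxy defaults: probe every 2s, mark the server
    # DOWN after 3 failed probes and UP again after 2 successful ones.
    server cockroach1 cockroach-denim-0001:26257 check inter 2s fall 3 rise 2
    server cockroach2 cockroach-denim-0002:26257 check inter 2s fall 3 rise 2
    server cockroach3 cockroach-denim-0003:26257 check inter 2s fall 3 rise 2
    server cockroach4 cockroach-denim-0004:26257 check inter 2s fall 3 rise 2
    server cockroach5 cockroach-denim-0005:26257 check inter 2s fall 3 rise 2
    server cockroach6 cockroach-denim-0006:26257 check inter 2s fall 3 rise 2
```

With these settings a node that goes down can still receive connections for roughly 6 seconds (3 probes × 2s) before it is marked DOWN. An HTTP probe against CockroachDB's HTTP health endpoint (`option httpchk` together with `check port 8080`) would also catch a node whose SQL port accepts connections but which is not ready. The exact endpoint path depends on the CockroachDB version, so treat that variant as an assumption to verify.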