cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

server: Drain operation failed; drained nodes do not accept SQL connections but are treated as active for other operations #130853

Open csgourav opened 5 days ago

csgourav commented 5 days ago

This issue started on drt-ldr on September 11 at 16:51, when a node-kill/sigkill/drain=true operation was started. The operation does the following two things:

1. Drain the node:
   ./cockroach node drain
2. Kill the cockroach process:
   kill -9 <crdb process id>

The operation failed in the middle of the drain and could not run its cleanup step, which left the server refusing SQL client connections. Datadog logs for the operation failure:

Sep 11 16:51:55.331 drt-ldr1-0003 drt-cockroachdb drain failed: some sessions did not respond to cancellation within 1s
Sep 11 17:27:13.272 drt-ldr1-0001 drt-cockroachdb drain failed: some sessions did not respond to cancellation within 1s

Two nodes, drt-ldr1-0001 and drt-ldr1-0003, were affected and did not accept SQL clients. Nodes drt-ldr1-0002, drt-ldr1-0004, and drt-ldr1-0005 are working and accepting client connections.
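For illustration, the operation's control flow can be sketched as a shell function. This is a hypothetical sketch, not the actual drt operation code: `CRDB_BIN`, `drain_and_kill`, and `restart_node` are made-up names, and `restart_node` is only a placeholder because this issue does not say what the cleanup step does. What it shows is that if cleanup is registered with `trap ... EXIT` before the drain starts, a drain failure such as "some sessions did not respond to cancellation within 1s" still triggers cleanup instead of leaving the node drained and untouched.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a node-kill/sigkill/drain=true style operation.
# NOT the real operation code. Assumptions:
#   - CRDB_BIN points at the cockroach binary; connection flags are passed
#     through as extra arguments to `node drain`;
#   - restart_node is a placeholder for the operation's cleanup step,
#     which the issue does not describe.

CRDB_BIN=${CRDB_BIN:-./cockroach}

restart_node() {
  # Placeholder cleanup: replace with whatever the real cleanup step does.
  echo "cleanup: restarting node"
}

# Usage: drain_and_kill <crdb process id> [extra flags for `node drain`]
drain_and_kill() {
  local pid=$1
  shift
  (
    # The EXIT trap fires when this subshell exits for any reason,
    # so cleanup runs even if the drain fails partway through.
    trap restart_node EXIT
    # Step 1: drain the node; on failure, exit with the drain's status
    # and skip the kill.
    "$CRDB_BIN" node drain "$@" || exit $?
    # Step 2: kill the cockroach process with SIGKILL.
    kill -9 "$pid"
  )
}
```

A call might look like `drain_and_kill "$(pgrep -x cockroach)" --host=<node address> --certs-dir=<certs dir>`. The explicit `|| exit $?` is used instead of `set -e` because `set -e` is silently ignored when the function is called from an `if` condition or an `||`/`&&` list, which would let the kill run after a failed drain.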

More details are in the Slack thread: [link]

Jira issue: CRDB-42266

blathers-crl[bot] commented 5 days ago

Hi @csgourav, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.