This issue started on drt-ldr on September 11 at 16:51, when a node-kill/sigkill/drain=true operation was started. The operation does the following two things:
1. Drain the node:
   ./cockroach node drain
2. Kill the cockroach process:
   kill -9 <crdb process id>
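For reference, the two steps above could be sketched roughly as follows. This is not the actual operation code. The `COCKROACH_BIN` variable, the `drain_then_kill` function name, and the choice to still send SIGKILL when the drain fails are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two steps; not the real operation implementation.

# Assumed variable: path to the cockroach binary (defaults to ./cockroach).
COCKROACH_BIN="${COCKROACH_BIN:-./cockroach}"

# drain_then_kill is a made-up helper name.
# Argument: the PID of the cockroach process to kill.
drain_then_kill() {
  local pid="$1"
  local drain_status=0

  # Step 1: drain the node. Record a failure instead of aborting, so the
  # second step still runs (an assumption, not confirmed behavior).
  "$COCKROACH_BIN" node drain || drain_status=$?
  if [ "$drain_status" -ne 0 ]; then
    echo "drain failed with status $drain_status" >&2
  fi

  # Step 2: kill the cockroach process with SIGKILL.
  kill -9 "$pid"

  # Report the drain outcome to the caller.
  return "$drain_status"
}
```

In this sketch a failed drain is reported but does not stop the kill step, so a node would not be left running in a partially drained state. Whether that is the right behavior for the real operation is an open question.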
The operation failed in the middle of the drain and could not run its cleanup step, which resulted in the server not accepting SQL client connections. Datadog logs for the operation failure:
Sep 11 16:51:55.331 drt-ldr1-0003 drt-cockroachdb drain failed: some sessions did not respond to cancellation within 1s
Sep 11 17:27:13.272 drt-ldr1-0001 drt-cockroachdb drain failed: some sessions did not respond to cancellation within 1s
Two nodes were affected, drt-ldr1-0001 and drt-ldr1-0003, and they were not accepting SQL clients. Nodes drt-ldr1-0002, drt-ldr1-0004, and drt-ldr1-0005 are working and accepting client connections.
More details in Slack thread [link]
Jira issue: CRDB-42266