feat: support retry on broken pipe

rad-pat commented 1 month ago

We are running into many issues calling Databend through the Python driver because we run Databend in Kubernetes on spot nodes. Spot nodes can be reclaimed at any time, and when that happens we get errors such as:

APIError: ResponseError with 1067: transport error, source: Some(tonic::transport::Error(Transport, hyper::Error(Io, Custom { kind: BrokenPipe, error: "stream closed because of a broken pipe" })))

Can we have an option to retry on such errors at the driver level? Possibly even passed in as a config param, e.g. databend://u:p@host/db?retry_on_broken_pipe=3
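To make the request concrete, here is a rough sketch of the behaviour as an application-level wrapper. Everything in it is an assumption for illustration: `retry_on_broken_pipe` is only the parameter proposed above and is not an existing driver option, `retries_from_dsn`, `is_broken_pipe` and `run_with_retry` are hypothetical helpers, and matching on the `BrokenPipe` text in the exception message is a guess at how the error surfaces. It would only be reasonable for queries that are safe to run twice (for example read-only ones).

```python
import time
from urllib.parse import urlparse, parse_qs


def retries_from_dsn(dsn: str, default: int = 0) -> int:
    """Read the proposed (hypothetical) retry_on_broken_pipe DSN parameter."""
    query = parse_qs(urlparse(dsn).query)
    values = query.get("retry_on_broken_pipe")
    return int(values[0]) if values else default


def is_broken_pipe(exc: Exception) -> bool:
    # Assumption: the transport error is visible in the exception message.
    return "BrokenPipe" in str(exc)


def run_with_retry(run_query, retries: int, backoff_s: float = 0.5):
    """Call run_query(), retrying up to `retries` times on broken-pipe errors."""
    attempt = 0
    while True:
        try:
            return run_query()
        except Exception as exc:
            # Re-raise anything that is not a broken pipe, or once retries run out.
            if attempt >= retries or not is_broken_pipe(exc):
                raise
            attempt += 1
            time.sleep(backoff_s * attempt)  # simple linear backoff
```

Usage would look something like `run_with_retry(lambda: cursor.execute(sql), retries_from_dsn(dsn))`, where `cursor` stands for whatever object the application already uses to run queries.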

everpcpc commented 1 month ago

This is not a problem between the server and the client, but an error between servers in the cluster. It's not safe to simply retry this error on the client side, since the client has no idea about the state of the current query.

Maybe we could retry at server level? cc @zhang2014

zhang2014 commented 1 month ago

When only network failures occur and the server is still available, it is safe to retry between nodes. However, if the instance has already been killed, it is not possible to retry.

databendlabs/bendsql #481