databendlabs / bendsql

Databend Native Client
Apache License 2.0

feat: support retry on broken pipe #481

Open rad-pat opened 5 days ago

rad-pat commented 5 days ago

We are running into many errors when calling Databend through the Python driver, because our Databend deployment runs in Kubernetes on spot nodes. Spot nodes can be reaped at any time, and when that happens we get errors such as:

APIError: ResponseError with 1067: transport error, source: Some(tonic::transport::Error(Transport, hyper::Error(Io, Custom { kind: BrokenPipe, error: "stream closed because of a broken pipe" })))

Could we have an option to retry on such errors at the driver level? Possibly even passed in as a config parameter, for example: databend://u:p@host/db?retry_on_broken_pipe=3
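
For illustration, here is a minimal caller-side sketch of the behaviour being requested. It is not part of bendsql or the Python driver: `retry_on_broken_pipe` is only the parameter name proposed above, and `retries_from_dsn`, `is_broken_pipe`, and `call_with_retry` are hypothetical helpers. It assumes the failure can be recognised from the exception text shown in the error above, and that the wrapped call is safe to repeat.

```python
import time
from urllib.parse import parse_qs, urlsplit


def retries_from_dsn(dsn: str, default: int = 0) -> int:
    """Read the proposed (hypothetical) retry_on_broken_pipe parameter from a DSN."""
    values = parse_qs(urlsplit(dsn).query).get("retry_on_broken_pipe")
    return int(values[0]) if values else default


def is_broken_pipe(exc: Exception) -> bool:
    # Assumption: the transport failure is only visible in the message text,
    # as in the APIError quoted above.
    return "brokenpipe" in str(exc).lower().replace(" ", "")


def call_with_retry(fn, retries, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying up to `retries` times on broken-pipe errors only."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            # Give up when retries are exhausted or the error is unrelated.
            if attempt >= retries or not is_broken_pipe(exc):
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff
            attempt += 1
```

A caller would wrap its own query function, e.g. `call_with_retry(lambda: run_query(sql), retries_from_dsn(dsn))`, where `run_query` stands in for whatever driver call is in use. Repeating a statement blindly is only reasonable for queries that can be re-executed without side effects.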

everpcpc commented 4 days ago

This is not a problem between the server and the client, but an error between servers in the cluster. It is not safe to simply retry on this error from the client, since we have no idea what state the current query is in.

Maybe we could retry at the server level? cc @zhang2014

zhang2014 commented 3 days ago

When only network failures occur and the server is still available, it is safe to retry between nodes. However, if the instance has already been killed, it is not possible to retry.