ceph / go-ceph

Go bindings for Ceph :octopus: :octopus: :octopus:
MIT License

Add timeout to Ceph GET API calls #900

Open karthik-us opened 1 year ago

karthik-us commented 1 year ago

This is to add the necessary changes in go-ceph to handle the ceph-csi issue https://github.com/ceph/ceph-csi/issues/3657.

Provide a way to configure a timeout for the Ceph Get API calls, so that commands do not get stuck when there is a problem between the Ceph cluster and the CSI driver (cluster health, slow ops, or a brief network connectivity problem).

For more info please refer to the ceph-csi issue.

phlogistonjohn commented 1 year ago

Can you be more specific about what APIs you mean? When I read "Get API calls" I think RGW (HTTP) APIs, but when I look at the linked issue it doesn't seem to be RGW specific.

The APIs that wrap C calls from Ceph do not support things like Go's contexts, so the typical methods for timing out in Go do not work. There are some timeout-related parameters in the Ceph configuration that you could apply to a rados connection. You'd probably need to experiment with them to see which, if any, work for your use case.

karthik-us commented 1 year ago

Hi @phlogistonjohn, the problem we are trying to solve is the CSI pod hanging when something is wrong in the Ceph cluster or there are network problems. In such cases, restarting the pod manually is the only fix available at the moment. So we are trying to add timeouts to such CSI calls (mainly the get calls). If it is possible to do that directly on rados, that would be great. Otherwise, we might need to write wrappers around the get calls to handle it. Some more context on this can be found here (a bit old though).

Thanks for your input on the timeout-related parameters in the Ceph configuration. Let me check whether they can be useful here.

yxxhero commented 4 months ago

Any updates?