If I am reading the code correctly, when Pulsar is configured with remote_transfer and client.transport.curl is used, transfers go through the get_file() and post_file() functions rather than the PycurlTransport class. Only PycurlTransport has a timeout option defined, and even that option probably does not work the way we would want it to.
I have a Pulsar server which appears to have had a network interruption at some point. AMQP recovered and subsequent jobs are handled fine, but ones that were preprocessing during the hiccup are just stuck and have written 0 bytes to disk in the many hours since. Preprocessing thread stacks are:
I would expect TCP to do something more reasonable here, so that part is a mystery to me. Either way, we could probably allow the low speed limit and low speed time options to be configured to mitigate these sorts of issues.
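To make the suggestion concrete, here is a rough sketch of what I have in mind. It is not Pulsar's actual code: the function names, parameter names, and default values are all made up for illustration. The only real pieces are the pycurl option constants LOW_SPEED_LIMIT, LOW_SPEED_TIME, and CONNECTTIMEOUT, which tell libcurl to abort a transfer that stays below a given rate for a given number of seconds.

```python
def stall_guard_options(low_speed_limit=1, low_speed_time=300, connect_timeout=30):
    """Build a mapping of pycurl option names to values.

    All parameter names and defaults are hypothetical; they only
    illustrate what a configurable setting might look like.
    """
    for name, value in (
        ("low_speed_limit", low_speed_limit),
        ("low_speed_time", low_speed_time),
        ("connect_timeout", connect_timeout),
    ):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    return {
        # Abort if the average rate stays below this many bytes/second...
        "LOW_SPEED_LIMIT": low_speed_limit,
        # ...for this many seconds in a row.
        "LOW_SPEED_TIME": low_speed_time,
        # Upper bound on the connection phase only, not the whole transfer.
        "CONNECTTIMEOUT": connect_timeout,
    }


def apply_options(curl, options):
    """Apply the mapping to a pycurl.Curl handle before calling perform()."""
    import pycurl  # imported lazily so the helper above works without pycurl

    for name, value in options.items():
        curl.setopt(getattr(pycurl, name), value)
```

Something like apply_options(c, stall_guard_options(low_speed_time=600)) could then be called on the handle used by get_file() and post_file(), assuming those functions build their own pycurl.Curl object. Unlike a total transfer timeout, this would not cut off large but healthy transfers; it would only fail the ones that have stopped making progress.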