dashbitco / broadway_cloud_pub_sub

A Broadway producer for Google Cloud Pub/Sub
Apache License 2.0

Make recv_timeout configurable for pull requests #75

Closed: greg-rychlewski closed this 2 years ago

greg-rychlewski commented 2 years ago

Hi,

Currently, recv_timeout is hard-coded to :infinity for pull requests. Would you consider accepting a PR that makes it configurable?
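To make the request concrete, here is a minimal Python sketch (not the library's Elixir code). It shows why an infinite receive timeout matters if the server silently drops a long-poll connection: with no timeout, a single stalled pull blocks the poll loop forever, whereas a finite timeout turns the stall into an error the loop can recover from. `fake_pull` and `poll` are hypothetical names, and the "hung connection" is simulated with an event that is never set.

```python
from __future__ import annotations

import asyncio


async def fake_pull(hang: bool) -> list[str]:
    """Hypothetical stand-in for one long-poll pull request.

    When `hang` is True it simulates a connection the server has silently
    forgotten: no response ever arrives.
    """
    if hang:
        await asyncio.Event().wait()  # never set, so this waits forever
    return ["msg-1"]


async def poll(pulls: list[bool], recv_timeout: float | None) -> list[str]:
    """Run one pull per entry in `pulls`, giving each at most `recv_timeout` seconds.

    recv_timeout=None mirrors :infinity: a hung pull blocks the loop forever.
    """
    received: list[str] = []
    for hang in pulls:
        try:
            received += await asyncio.wait_for(fake_pull(hang), timeout=recv_timeout)
        except asyncio.TimeoutError:
            # With a finite timeout the stalled request is cancelled and the
            # loop moves on to the next pull instead of waiting indefinitely.
            received.append("timeout")
    return received


if __name__ == "__main__":
    # Second pull hangs; with a 0.1 s timeout the loop still finishes.
    print(asyncio.run(poll([False, True, False], recv_timeout=0.1)))
```

Calling `poll([False, True, False], recv_timeout=None)` instead would never return, which matches the symptom described below: the pull count stops increasing and nothing reaches the processing stage.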

My reason for asking is that my team keeps hitting an issue in which the producer appears to stop polling, so messages pile up in the subscription without being processed. We know this is not an acknowledgement-deadline issue because our instrumentation shows that the number of pull requests stops increasing and no messages enter the processing stage.

In general, we have only seen this happen after a few hours of polling without receiving any new messages, and it happens very infrequently (less than once a week). Although this is highly speculative, we suspect that Google's side may be forgetting about our open connections without terminating them. One observation that points to this is that we saw the issue at exactly the same time on 2 different subscriptions to the same topic (not 2 clients polling the same subscription, but 2 completely different subscriptions). That could happen if Google sweeps open connections without terminating them properly.

There is also some evidence of the reverse, where the client terminates the pull request and Google doesn't register it (see here and here).

Fiddling with recv_timeout is pretty hacky, but at this point we are not sure whether anything else can alleviate the problem or give us more insight into what is happening. We wanted to try the return_immediately parameter, but Google has marked it as deprecated, so it doesn't seem like a long-term solution.

For reference, we are using version 0.7 of this library on one machine and 0.62 on the other. If this is a Google-side issue, though, it should affect the current main branch as well.

Thanks.

josevalim commented 2 years ago

PR is definitely welcome!