dashbitco / broadway_cloud_pub_sub

A Broadway producer for Google Cloud Pub/Sub
Apache License 2.0
70 stars 24 forks source link

Feature Discussion: Client-agnostic, automatic lease management #24

Closed mcrumm closed 3 years ago

mcrumm commented 5 years ago

Opening this issue to get more input on automatic lease management.

Google Cloud Pub/Sub clients that support Asynchronous Pull (most notably, Python) provide a lease management mechanism behind-the-scenes to keep messages alive while they are being processed.

In python, the Leaser class provides the logic to extend the message ack deadline if it has not yet been acknowledged: https://github.com/googleapis/google-cloud-python/blob/7e16c08581a059a6c8e40b0403f4af1ebfe0fea6/pubsub/google/cloud/pubsub_v1/subscriber/_protocol/leaser.py#L100

The Leaser is aided by a Histogram that, by default, tracks the 99th percentile of ack requests to determine how long to extend the lease for a given message: https://github.com/googleapis/google-cloud-python/blob/7e16c08581a059a6c8e40b0403f4af1ebfe0fea6/pubsub/google/cloud/pubsub_v1/subscriber/_protocol/histogram.py

Ideally BroadwayCloudPubSub would provide the same functionality, but we can do so in a client-agnostic way, meaning our implementation would work for Pull requests over both HTTP and gRPC, as well as for Streaming Pull requests (gRPC only), we were ever to decide to implement those.

To keep this client-agnostic, BroadwayCloudPubSub.Producer probably needs to become the Acknowledger so that the Client implementation doesn't need to know about the lease management functionality.

_Discussion originally started by @mcrumm in https://github.com/plataformatec/broadway_cloud_pub_sub/issues/17#issuecomment-520164755_

josevalim commented 5 years ago

This sounds very neat but I also assume it is mostly a corner case, so I am glad to sit and wait until someone has a use case for it. I assume the initial value for ack is configurable, right? If so, that should be good enough for the huge majority of cases.

mcrumm commented 5 years ago

@josevalim The only potential issue is that the maximum ack deadline you can set on a message is 10 minutes. You can continue to modify the deadline and effectively keep the message checked out forever, but only in 10min increments.

I vaguely recall you posting something re: GenStage consumers, saying that long running processes are perfectly acceptable there, but I suppose that "long" could be subjective in this instance.

Are any of the other connectors subject to similar restrictions on message processing time?

Either way, I thought I might take a pass at implementing something like this from outside the connector first, and see where it gets me. Ultimately that may lead to more insights into what customizing acknowledgements might look like.

mcrumm commented 3 years ago

Closing this as the need has never arisen, but may revisit in the future :)