GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Add a way to configure ackDeadlineSeconds #533

Closed jung-kim closed 7 years ago

jung-kim commented 7 years ago

fixes: #532

This does no validation of ackDeadlineSeconds; I believe the acceptable range is t >= 0, and any exception is left to be thrown when the random subscription is created.
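
A rough usage sketch of the option (the setter name and value here are illustrative, not copied from the diff; see the actual changes for the real API):

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

public class AckDeadlineExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setStreaming(true);

    Pipeline p = Pipeline.create(options);
    p.apply(PubsubIO.Read
        .named("ReadFromTopic")
        // Only a topic is given, so the reader creates a throwaway subscription.
        .topic("projects/my-project/topics/my-topic")
        // Illustrative setter for this PR's option; the value is not validated here,
        // so an out-of-range deadline would fail when the subscription is created.
        .ackDeadlineSeconds(120));
    p.run();
  }
}
```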

googlebot commented 7 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


jung-kim commented 7 years ago

I signed it!

googlebot commented 7 years ago

CLAs look good, thanks!

dhalperi commented 7 years ago

Hi @CodingTwinky -- this will not actually affect behavior when reading from Pub/Sub on Google Cloud Dataflow. Is this intended to change testing behavior using the InProcessPipelineRunner?

jung-kim commented 7 years ago

My intention was to add a way for the user to configure ackDeadlineSeconds when no subscription is defined and a random one is created by Dataflow. Am I missing something?

dhalperi commented 7 years ago

@codingtwinky, the code that creates the temporary subscription in the Java SDK is only used when running with the InProcessPipelineRunner. The DataflowPipelineRunner uses different code.

Can you say more about why you want to change the ack deadline? If you are seeing issues when running on the DataflowPipelineRunner that you believe are attributable to this parameter, you might want to reach out to the Dataflow support team: https://cloud.google.com/dataflow/support

jung-kim commented 7 years ago

The reason we were experimenting with various ack deadlines was to observe Dataflow's behavior during high-backlog situations, which we've been having a hard time with. We were able to test by creating the subscription manually, but it would be convenient if we could configure the ack deadline for the throwaway subscription that Dataflow creates at pipeline start.
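
For reference, the manual workaround looked roughly like this (a minimal sketch using the google-cloud-pubsub admin client, which is separate from this SDK; project, topic, and subscription names are placeholders):

```java
import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.ProjectTopicName;
import com.google.pubsub.v1.PushConfig;

public class CreateSubscriptionWithAckDeadline {
  public static void main(String[] args) throws Exception {
    ProjectTopicName topic = ProjectTopicName.of("my-project", "my-topic");
    ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "my-subscription");

    try (SubscriptionAdminClient client = SubscriptionAdminClient.create()) {
      // Pull subscription with the ack deadline we want to experiment with
      // (Pub/Sub accepts deadlines between 10 and 600 seconds).
      client.createSubscription(
          subscription, topic, PushConfig.getDefaultInstance(), /* ackDeadlineSeconds= */ 600);
    }
  }
}
```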

reuvenlax commented 7 years ago

Dataflow should automatically extend acks if processing takes longer than the deadline. What exactly were the effects you saw by changing the deadline on the subscription?

jung-kim commented 7 years ago

For our use case, once Dataflow fails to keep up with Pub/Sub's publish rate (~18k/sec), it gets into a weird "ack deadline passed" death spiral, and the only way to recover is for Pub/Sub's publish rate to go down.

Increasing the Pub/Sub ack deadline was an attempt to delay the point at which this death spiral starts.

Since I can manually create a Pub/Sub subscription with the desired configuration, I will close this issue, but it would be nice to be able to configure this.
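
For anyone finding this later, the pipeline side of that workaround looks roughly like this (a sketch against this SDK's PubsubIO; the subscription name is a placeholder): reading from a pre-created subscription skips the temporary-subscription path entirely, so whatever ack deadline was set at creation time applies.

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

public class ReadFromPreCreatedSubscription {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setStreaming(true);

    Pipeline p = Pipeline.create(options);
    // Read from the manually created subscription instead of the topic.
    p.apply(PubsubIO.Read
        .named("ReadFromManagedSubscription")
        .subscription("projects/my-project/subscriptions/my-subscription"));
    p.run();
  }
}
```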