GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

adds connection timeout to job submitter #439

Open sv3ndk opened 3 years ago

sv3ndk commented 3 years ago

Hi all,

While experimenting with the Flink operator, I occasionally saw the very first curl call made by the job submitter hang for a very long time before retrying, instead of the usual 5 seconds. The call does eventually time out, but the long pause needlessly lengthens the deployment.

Checking job manager to be ready. Will check success of 2 API calls for stable job submission.
curl -sS "http://fandom-analytics-metrics-computer-jobmanager:8081/jobs"

I believe this happens when the job submitter calls curl before the job manager has even started, in which case we get a connection timeout (after 120s) instead of an immediate "connection refused".

After adding --connect-timeout 5, all deployments continued to work successfully, and I never saw the hang described above again.
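
To illustrate the idea, here is a minimal sketch of a readiness check that bounds the connection phase of each curl call. It is not the operator's actual submitter script: the function name check_jobmanager, the RETRY_INTERVAL variable, and the default attempt count are hypothetical. Only the curl flags, the 5 second retry interval, and the "2 successful API calls" rule are taken from the description and log above.

```shell
# Hypothetical sketch, not the operator's real script.
# check_jobmanager URL [MAX_ATTEMPTS] [REQUIRED_SUCCESSES]
# Returns 0 once REQUIRED_SUCCESSES consecutive calls succeed, 1 otherwise.
check_jobmanager() {
  local url="$1"
  local max_attempts="${2:-60}"   # assumed default, for illustration only
  local required="${3:-2}"        # the log mentions 2 successful API calls
  local successes=0
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # --connect-timeout limits only the time spent establishing the
    # connection, so an unreachable job manager fails after 5 seconds
    # instead of hanging until the much longer default timeout.
    if curl -sS --connect-timeout 5 "$url" >/dev/null; then
      successes=$((successes + 1))
      if [ "$successes" -ge "$required" ]; then
        return 0
      fi
    else
      # Any failure resets the streak: we want consecutive successes.
      successes=0
    fi
    sleep "${RETRY_INTERVAL:-5}"
    attempt=$((attempt + 1))
  done
  return 1
}
```

With this shape, calling check_jobmanager "http://fandom-analytics-metrics-computer-jobmanager:8081/jobs" before the job manager is reachable costs at most about 10 seconds per attempt (5 seconds of connect timeout plus the 5 second pause) rather than a 120 second stall.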

Let me know your thoughts

google-cla[bot] commented 3 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



sv3ndk commented 3 years ago

@googlebot I signed it!

sv3ndk commented 3 years ago

@hongyegong, could you please take a quick look at this minor update and let me know what you think? Thanks