apache / openwhisk

Apache OpenWhisk is an open source serverless cloud platform
https://openwhisk.apache.org/
Apache License 2.0

fabric8 kubernetes client timeout precision #4812

Open tysonnorris opened 4 years ago

tysonnorris commented 4 years ago

Environment details:

Steps to reproduce the issue:

  1. Configure StandaloneOpenwhisk with an invalid docker image specified for a prewarm kind
  2. Set the kubernetes run timeout to 60s
  3. Start StandaloneOpenwhisk

e.g.

whisk {
  kubernetes {
    timeouts {
      run = 60 seconds
    }
  }
}

Provide the expected results and outputs:

A timeout error occurs for prewarm startup after the configured timeout (60s), plus some reasonable overhead for execution etc. For example, I expect the actual timeout to occur within 10s of the configured timeout.

Provide the actual results and outputs:

The timeout for kubernetes run is much longer than expected (30+ seconds longer), and the difference changes with the scale of the configured timeout.

Additional information you deem important:

The actual timeout appears to scale with the configured timeout, but in no case was I able to make it come close to the configured value.

I did take a look at the fabric8 client, and saw several past issues related to timeout handling. Looking at the code, I wasn't able to quickly determine what the problem is, but there are some Thread.sleep() calls buried within that I am concerned about.
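To illustrate why sleep-based waiting can produce an overshoot that grows with the configured timeout, here is a minimal sketch of one possible mechanism. It is not fabric8's code, and the root cause in the client has not been confirmed; the class `TimeoutDrift`, the `FakeClock` helper, and the 1000 ms sleep and 500 ms check cost are all hypothetical values chosen for illustration. The sketch compares a wait loop that only subtracts sleep time from its budget with one that checks against an absolute deadline, using a simulated clock so it runs instantly and deterministically.

```java
public class TimeoutDrift {

    /** Simulated clock (hypothetical helper) so no real sleeping is needed. */
    static final class FakeClock {
        private long nowMillis = 0;
        long now() { return nowMillis; }
        void advance(long millis) { nowMillis += millis; }
    }

    /**
     * Budget-counting wait: only the sleep interval is subtracted from the
     * remaining budget, so the time spent in each readiness check is never
     * accounted for. Returns the simulated elapsed time in milliseconds.
     */
    static long budgetCountingWait(FakeClock clock, long timeoutMillis,
                                   long sleepMillis, long checkCostMillis) {
        long start = clock.now();
        long remaining = timeoutMillis;
        while (remaining > 0) {
            clock.advance(checkCostMillis); // readiness check that never succeeds
            clock.advance(sleepMillis);     // stands in for Thread.sleep(sleepMillis)
            remaining -= sleepMillis;       // check cost is silently ignored
        }
        return clock.now() - start;
    }

    /**
     * Deadline-based wait: compares the clock against an absolute deadline,
     * so the overshoot is bounded by the cost of a single check.
     */
    static long deadlineWait(FakeClock clock, long timeoutMillis,
                             long sleepMillis, long checkCostMillis) {
        long start = clock.now();
        long deadline = start + timeoutMillis;
        while (clock.now() < deadline) {
            clock.advance(checkCostMillis); // readiness check that never succeeds
            long left = deadline - clock.now();
            if (left <= 0) {
                break;
            }
            clock.advance(Math.min(sleepMillis, left)); // never sleep past the deadline
        }
        return clock.now() - start;
    }

    public static void main(String[] args) {
        for (long timeout : new long[] {60_000, 120_000}) {
            long counted = budgetCountingWait(new FakeClock(), timeout, 1_000, 500);
            long deadline = deadlineWait(new FakeClock(), timeout, 1_000, 500);
            System.out.println("configured=" + timeout
                    + "ms budgetCounting=" + counted
                    + "ms deadline=" + deadline + "ms");
        }
    }
}
```

With these assumed numbers, the budget-counting loop takes 90,000 ms for a 60,000 ms timeout and 180,000 ms for a 120,000 ms timeout, so the overshoot grows with the configured value, while the deadline-based loop stops at the configured timeout in both cases. Whether anything like this happens inside the fabric8 client would need to be verified against its source.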

A couple of options I've considered:

For simplicity, I'm wondering if there is any reason not to use the "plain" kubernetes java client?

@dgrove-oss Do you have any thoughts on this?

dgrove-oss commented 4 years ago

I think a big part of the decision to use the fabric8 client was because the official java kubernetes client didn't exist yet. We could consider switching over. I haven't looked into the details yet, but it is probably worth investigating. We only perform a few basic operations, so it might not be too hard to switch.