cloud-native-toolkit / planning

This is the planning repo to manage the cross-project Epics and Issues, Tasks and Bugs

Timing issue when provisioning Tekton then immediately deploying tasks and pipelines #854

Open seansund opened 3 years ago

seansund commented 3 years ago

When Tekton is provisioned, the module waits until the CRD is available before declaring itself complete and Tekton ready. However, the Tekton deployment also creates a validating webhook that is invoked whenever a Tekton task or pipeline is applied, in order to validate its configuration.

There is a window of time where the operator is deployed and running and the CRDs are available, but the webhook has not yet finished initializing. If Tekton resources are applied during this window, an error is returned saying that the webhook endpoint is not responding.

Several options for how to address this were discussed:

  1. Wait for the rollout status of the webhook resource to complete
  2. Wait for an arbitrary amount of time after the CRD is available
  3. Deploy a job into the cluster to wait for the webhook endpoint to be available (it must be a job deployed into the cluster because the webhook endpoint is exposed via a service that is not accessible outside of the cluster)

We have decided on Option 3 as the solution moving forward. Option 1 requires extra permissions in the cluster to read and wait for the deployment in another namespace, and Option 2 is inefficient and may still result in the issue.

The webhook endpoint, taken from the log, is https://tekton-pipelines-webhook.openshift-pipelines.svc:443/defaulting?timeout=10s
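
As a rough illustration of Option 3, the script below is a minimal sketch of what a job running inside the cluster could execute. It is not the implemented check. The function names (`webhook_responds`, `wait_for_webhook`), the 300 second deadline, and the 5 second polling interval are all assumptions made for the example. It also assumes the job does not have the CA for the webhook certificate mounted, so TLS verification is turned off, and it treats any HTTP response (even an error status) as proof that the endpoint is up.

```python
import http.client
import ssl
import time
import urllib.error
import urllib.request

# URL reported in the log above; only reachable from inside the cluster.
WEBHOOK_URL = (
    "https://tekton-pipelines-webhook.openshift-pipelines.svc:443"
    "/defaulting?timeout=10s"
)


def webhook_responds(url, request_timeout=5.0):
    """Return True if anything answers at url, even with an HTTP error status."""
    # Assumption: the CA that signed the webhook certificate is not available
    # to this job, so verification is disabled for this liveness-only probe.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        urllib.request.urlopen(url, timeout=request_timeout, context=context)
        return True
    except urllib.error.HTTPError:
        # The server answered (a bare GET may well be rejected), so it is up.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a broken response.
        return False


def wait_for_webhook(url=WEBHOOK_URL, deadline_seconds=300.0,
                     interval_seconds=5.0, probe=webhook_responds,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until probe(url) succeeds; return False once the deadline passes."""
    start = clock()
    while True:
        if probe(url):
            return True
        if clock() - start >= deadline_seconds:
            return False
        sleep(interval_seconds)
```

A job entrypoint could call `sys.exit(0 if wait_for_webhook() else 1)` so that the job only completes successfully once the endpoint answers. The `probe`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a cluster.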

csantanapr commented 3 years ago

@seansund implemented a check for the webhook