devfile / devworkspace-operator

Apache License 2.0

Webhook Server Certificate #1157

Open bdwyertech opened 1 year ago

bdwyertech commented 1 year ago

Description

Seems like the webhook server is not getting restarted when cert-manager issues a new certificate. I would expect the devworkspace-controller-manager to do this, or for the webhook server to see that the cert has been rolled.

ERROR: Job failed (system failure): prepare environment: setting up trapping scripts on emptyDir: Internal error occurred: failed calling webhook "validate-exec.devworkspace-controller.svc": failed to call webhook: Post "https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

Perhaps I just have something misconfigured, but when this cert expires, it causes issues for other non-devworkspace pods. Killing the webhook server and letting a new pod come up resolves the issue.

I am using the following Flux config to deploy the manifests under deploy/deployment/kubernetes/objects/

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: devworkspace-operator
  namespace: devworkspace-controller
spec:
  interval: 1h
  url: https://github.com/devfile/devworkspace-operator
  ref:
    tag: v0.22.0
  ignore: |
    # exclude all
    /*
    # include k8s deploy objects directory
    !/deploy/deployment/kubernetes/objects/
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: devworkspace-operator
  namespace: devworkspace-controller
spec:
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: devworkspace-operator
  path: ./deploy/deployment/kubernetes/objects
  prune: true

jsnouffer commented 10 months ago

I also have been encountering this issue when cert-manager renews the webhook certificate, requiring a restart of the pod. It would be nice to get this addressed.

AObuchow commented 9 months ago

@bdwyertech @jsnouffer Thank you for reporting and following up on this issue. I apologize it went unattended for so long. I have been caught up with other priorities but wanted to let you know this issue's priority will be assessed, and it will hopefully be worked on in the near future.

dennisbalsam99 commented 1 month ago

I have also run into this issue lately when our certs expired. I have explored the repo for custom solutions, but no luck so far.

AObuchow commented 1 month ago

I still have to look into this further, but if I understand correctly, cert-manager will create a new certificate object on the cluster, correct?

If so, maybe we could somehow:

dennisbalsam99 commented 1 month ago

Yes, precisely: a new certificate object will be created, as well as a secret containing the cert and key. This secret is then attached to the DWO as a volume mount and can be read from here

But it seems like a good solution to have the DWO watch for updates to that secret and update the deployment accordingly, as mentioned.
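
For readers unfamiliar with the flow described above, here is a minimal sketch of a cert-manager Certificate and the secret it writes. The resource names (`example-webhook-cert`, `example-webhook-tls`, `example-issuer`) are placeholders assumed for illustration and are not taken from the DevWorkspace Operator manifests; only the namespace and DNS name come from the error message in this issue.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-webhook-cert          # hypothetical name
  namespace: devworkspace-controller
spec:
  # cert-manager stores the issued cert and key in this secret
  # and rewrites it on every renewal.
  secretName: example-webhook-tls     # hypothetical name
  dnsNames:
    - devworkspace-webhookserver.devworkspace-controller.svc
  issuerRef:
    kind: Issuer
    name: example-issuer              # hypothetical name
```

When the secret is mounted into a pod as a volume, the kubelet eventually updates the files on disk after a renewal, but a server that loaded the cert only at startup keeps serving the old one until it reloads or restarts, which matches the behavior reported here.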

AObuchow commented 1 month ago

@dennisbalsam99 Thank you for the follow-up, it's really appreciated :)

How are you installing DevWorkspace Operator by the way? Using chectl? Or using the Makefile scripts from the DevWorkspace Operator repo (or something else)?

AObuchow commented 1 month ago

Based on the discussion in https://github.com/eclipse-che/che/issues/23184, we should hopefully be able to use the 'cert-manager.io/inject-ca-from' annotation to resolve this issue in a much more graceful manner than my original idea.
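
As a rough illustration of the annotation mentioned above, here is a sketch of a ValidatingWebhookConfiguration that asks cert-manager's CA injector to fill in `caBundle` from a Certificate. The webhook name, service, namespace, path, and port come from the error message in this issue; the configuration name, the Certificate name `example-webhook-cert`, and the `rules` section are assumptions made for illustration and may not match what the operator actually creates.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-devworkspace-webhooks   # hypothetical name
  annotations:
    # Format: <namespace>/<Certificate name>; the Certificate name is hypothetical.
    cert-manager.io/inject-ca-from: devworkspace-controller/example-webhook-cert
webhooks:
  - name: validate-exec.devworkspace-controller.svc
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      # caBundle is intentionally omitted; the CA injector populates it.
      service:
        name: devworkspace-webhookserver
        namespace: devworkspace-controller
        path: /validate
        port: 443
    rules:                               # assumed rule, for illustration only
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CONNECT"]
        resources: ["pods/exec"]
```

This relies on the cert-manager cainjector component running in the cluster. It keeps the API server's trusted CA in sync with the Certificate, which targets the "certificate signed by unknown authority" error; whether the webhook server also needs to reload its own serving cert is a separate question.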