Open greglanthier opened 4 years ago
Digging a bit deeper, maybe the following:
should be something like this?
if !crtReady {
c := ba.GetStatus().NewCondition()
c.SetType(common.StatusConditionTypeReconciled)
c.SetStatus(corev1.ConditionFalse)
c.SetReason("CertificateNotReady")
c.SetMessage("Waiting for service certificate to be generated")
ba.GetStatus().SetCondition(c)
rtObj := ba.(runtime.Object)
r.UpdateStatus(rtObj)
return reconcile.Result{Requeue: true}, nil
}
Once, the certificate is generated and updated, the reconcile loop should be triggered through a watch we set on Certificate CRD.
This avoids unnecessary reconciles if certificate is not ready yet or it is invalid. Unfortunately if it is not ready the Certificate CRD does not provide the good message.
You could verify that Certificate is Ready and if not you can check CertificateRequest object.
However, maybe there is condition in which Certificate update event is not caught
So in my reproducer the certificate is ready, but there is no follow up reconcile requests being queued in the runtime-component-operator
.
i.e.: after applying the yaml in my original question I get a certificate that looks like this:
$ kubectl get certificates.cert-manager.io ldap-svc-crt -o json |jq .status
{
"conditions": [
{
"lastTransitionTime": "2020-06-09T15:40:56Z",
"message": "Certificate is up to date and has not expired",
"reason": "Ready",
"status": "True",
"type": "Ready"
}
],
"notAfter": "2021-06-09T15:40:56Z"
}
Maybe the watch you linked to isn't firing? Maybe there is no owner reference being set on the RuntimeComponent
itself at this point?
I'll see if I can enable some extra logging in the operator in my testing cluster.
ah - wait a moment...
My certificate is using apiVersion: cert-manager.io/v1alpha3
.
apiVersion: cert-manager.io/v1alpha3
kind: Certificate
metadata:
...
spec:
...
status:
conditions:
- lastTransitionTime: "2020-06-09T15:40:56Z"
message: Certificate is up to date and has not expired
reason: Ready
status: "True"
type: Ready
notAfter: "2021-06-09T15:40:56Z"
I bet that's the problem...
I'm using cert-manager-v0.15.1. Maybe if I downgrade to an earlier cert-manager?
I tried to find a version of cert-manager I could install via OLM but I didn't have any luck :(
That could be the case, I have not tested with new version, I think last one I've tried is 0.13 Also wondering if you do trigger additional reconcile (change RC CR) will it get correct status
Oh if I reapply the RuntimeComponent
yaml a 2nd time everything works - by which I mean the deployment & service is created and the ready certificate is injected into the workload containers.
Sorry - I should have mentioned that in my original question.
I wonder if given this there is some other way to requeue reconcile attempts (on a less aggressive interval) to account for these cases?
I'm thinking of the case where runtime-compoent-operator
creates a cert-manager.io/v1alpha2
version of Certificate
but that is possibly upgraded silently by an upgrade-webhook in cert-manger (not saying thats happening here, but symptomatically thats similar to what I'm seeing).
FYI: Rolling back to cert-manager 0.13.1 corrected the behaviour I originally asked about (thanks for the hints @arturdzm!).
As long as an operator that supports api version cert-manager.io/v1alpha2
resources is in the cluster then runtime-component-operator
runs as expected.
This issue can probably be closed, but a new issue should likely be logged to support cert-manager.io/v1alpha3
and/or call out explicitly in the readme that there is a dependency on specific versions of cert-manager
.
It was pretty easy to grab the latest cert-manager chart and ignore the apiVersions the CRs in the samples.
thanks @greglanthier
@arturdzm - if you can please add the API requirement to the readme, that would be great.
In the near future we can evaluate if we would upgrade the support to the v1alpha3
API or if there's another approach / framework that we should support.
Type of question
I want to make sure I understand how Runtime Component instances should be reconciled in a cluster that is hosting a cluster-wide cert-manager instance.
In clusters where
cert-manager
takes a moment to issue a certificate I've observed that aRuntimeComponent
that includes acertificate: {}
stanza like the one described here is sometimes never updated. This results in no deployments or services being created.Question
What did you do?
I deployed
runtime-component-operator
using OLM andcert-manager
using helm to an IKS cluster.Once both operators were running I applied the following YAML to my cluster:
but no deployments or services for the ldap
RuntimeComponent
were created. I observed the following status updates in theRuntimeComponent
:What did you expect to see?
I was expecting the
runtime-component-operator
to continue to attempt to reconcile theRuntimeComponent
until the self signed certificate was issued bycert-manager
(i.e.: the equivalent ofreturn reconcile.Result{Requeue: true }
).What did you see instead? Under which circumstances?
It looks like
runtime-component-operator
failed to reconcile the self-signed certificate and gave up trying to create the corresponding deployment & service.This happened in bigger clusters with many namespaces and more busy workloads.
Environment
I produced this problem on an IBM Kubernetes Service cluster running 1.18.2.
I ran the same test using
kind
v0.8.1 with akindest/node:v1.18.2
-based cluster and the self-signed certificate was reconciled with the ldapRuntimeComponent
. My hunch is the IKS cluster was busier with more namespaces to scan, and socert-manager
-issued certificates take longer to appear than in a simplekind
cluster.Additional context
I briefly dug into the reconcile code in
runtimecomponent_controller.go
and observed the following lines:https://github.com/application-stacks/runtime-component-operator/blob/c7eaf97c155eaf0a9cd3be9e0b1fff1fdc1fab16/pkg/controller/runtimecomponent/runtimecomponent_controller.go#L443-L455
Is it possible that line 454 is a typo and this block should read like this:
Have I misunderstood what
runtime-component-operator
intends to do in this case?