@cameronkerrnz can you paste the output of oc describe pod/image-pruner-1617667200-gr7q6 -n openshift-image-registry?
It should have an event which tells us why the error happened for this pruner pod. You can also manually delete that pod and see if a new one is created and works as expected: oc delete pod/image-pruner-1617667200-gr7q6 -n openshift-image-registry
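To confirm whether the Job recreates the pruner pod after the delete, something along these lines should work; it relies on the job-name label that appears in the describe output below, which Jobs normally apply to the pods they create:
PS> oc get pods -n openshift-image-registry -l job-name=image-pruner-1617667200
PS> oc logs -n openshift-image-registry -l job-name=image-pruner-1617667200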
PS> oc describe pod/image-pruner-1617667200-gr7q6 -n openshift-image-registry
Name: image-pruner-1617667200-gr7q6
Namespace: openshift-image-registry
Priority: 0
Node: crc-rsppg-master-0/192.168.126.11
Start Time: Tue, 06 Apr 2021 12:13:59 +1200
Labels: controller-uid=ab13eb6d-9f08-4a29-8c01-d881378642dd
job-name=image-pruner-1617667200
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "",
"interface": "eth0",
"ips": [
"10.217.0.63"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "",
"interface": "eth0",
"ips": [
"10.217.0.63"
],
"default": true,
"dns": {}
}]
openshift.io/scc: restricted
Status: Failed
IP: 10.217.0.63
IPs:
IP: 10.217.0.63
Controlled By: Job/image-pruner-1617667200
Containers:
image-pruner:
Container ID: cri-o://597e972955d090f224e453218cd7550d0c0c7b463f187d5bb2b10d5f5fbce41f
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:50ca35f6d1b839a790d657d298db7f32844ece8da14f99f2b55e5b6d5642fd2a
Image ID: image-registry.openshift-image-registry.svc:5000/openshift/cli@sha256:50ca35f6d1b839a790d657d298db7f32844ece8da14f99f2b55e5b6d5642fd2a
Port: <none>
Host Port: <none>
Command:
oc
Args:
adm
prune
images
--confirm=true
--certificate-authority=/var/run/configmaps/serviceca/service-ca.crt
--keep-tag-revisions=3
--keep-younger-than=60m
--ignore-invalid-refs=true
--loglevel=1
--prune-registry=true
--registry-url=https://image-registry.openshift-image-registry.svc:5000
State: Terminated
Reason: Error
Message: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
Exit Code: 1
Started: Tue, 06 Apr 2021 12:14:17 +1200
Finished: Tue, 06 Apr 2021 12:14:18 +1200
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 256Mi
Environment: <none>
Mounts:
/var/run/configmaps/serviceca from serviceca (ro)
/var/run/secrets/kubernetes.io/serviceaccount from pruner-token-n2x2s (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
serviceca:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: serviceca
Optional: false
pruner-token-n2x2s:
Type: Secret (a volume populated by a Secret)
SecretName: pruner-token-n2x2s
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
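The terminated-state message above (ServiceUnavailable ... get buildconfigs.build.openshift.io) suggests the aggregated openshift-apiserver was not answering yet when the pruner ran. Assuming the standard aggregated API service name for the build API, a quick way to check both is:
PS> oc get apiservice v1.build.openshift.io
PS> oc get co openshift-apiserver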
After deleting the pod as requested, the cluster still shows as degraded, but I'm not sure how it determines that.
Deleting the Job (which the CronJob created) does seem to be sufficient to clear the controller's degraded state:
PS> oc delete job image-pruner-1617667200 -n openshift-image-registry
job.batch "image-pruner-1617667200" deleted
PS> oc get co image-registry
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
image-registry 4.7.2 True False False 4d7h
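On how the degraded state is determined: as far as I can tell, the registry operator surfaces the last pruner Job failure through the conditions on the image-registry ClusterOperator, so the reason should be readable with nothing more than a describe (or the raw YAML):
PS> oc describe co image-registry
PS> oc get co image-registry -o yaml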
Thanks, Cameron
@cameronkerrnz Thanks, I will take a closer look into this as soon as I am able to reproduce it 👍🏼
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
General information
Did you run crc setup before starting it (Yes/No)? Yes
CRC version
CRC status
CRC config
Host Operating System
Steps to reproduce
crc start (will take at least 10 minutes to complete)
Expected
Working cluster with an image registry I can be confident is in a workable state.
Actual
I do appear to have a working cluster, but it took a long time to start, and the degraded state appears to be due to issues with the image registry (specifically the image-pruner CronJob).
Logs
Before gathering the logs, try the following to see if it fixes your issue
Yes, it still happens. Here are some diagnostics.
I suspect this is due to the image-pruner CronJob firing before the cluster is stable.
If this is a CronJob, can I just work around this issue by deleting the Job and Pod that it created? Would that be enough to get the cluster out of the 'degraded' state? I haven't worked with CronJob objects in Kubernetes yet.
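Deleting the failed Job (as shown earlier in the thread) does clear the degraded condition. If the pruner keeps tripping over an unstable cluster on every crc start, a sketch of a further workaround, assuming the ImagePruner custom resource in this release exposes a spec.suspend field (worth confirming with oc explain first), is to suspend the pruner and re-enable it once the cluster has settled:
PS> oc explain imagepruner.spec.suspend
PS> oc patch imagepruner/cluster --type merge -p '{"spec":{"suspend":true}}'
PS> oc patch imagepruner/cluster --type merge -p '{"spec":{"suspend":false}}'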