goharbor / harbor-operator

Kubernetes operator for Harbor service components
Apache License 2.0

Too many replica set are created for registry-ctl component #131

Closed steven-zou closed 3 years ago

steven-zou commented 4 years ago

Run make sample.

Then check the resources: you'll find more than one registryctl ReplicaSet, most of them with 0 desired and 0 current pods.

replicaset.apps/sample-harbor-registryctl-589b8bb8df   1         1         1       2m16s
replicaset.apps/sample-harbor-registryctl-5bbf7d4889   0         0         0       2m36s
replicaset.apps/sample-harbor-registryctl-644bd8fdbf   0         0         0       2m47s
replicaset.apps/sample-harbor-registryctl-644d9549b6   0         0         0       2m50s
replicaset.apps/sample-harbor-registryctl-687bbd59cd   0         0         0       2m34s
replicaset.apps/sample-harbor-registryctl-6988fbbfbd   0         0         0       2m42s
replicaset.apps/sample-harbor-registryctl-7464d974bb   0         0         0       2m45s
replicaset.apps/sample-harbor-registryctl-74c6d4f8cc   0         0         0       2m35s
replicaset.apps/sample-harbor-registryctl-7569fb84f4   0         0         0       2m52s
replicaset.apps/sample-harbor-registryctl-79cc6ccbf    0         0         0       2m46s
replicaset.apps/sample-harbor-registryctl-c58b8b4d8    0         0         0       2m56s
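
To put a number on a listing like the one above, here is a minimal sketch that splits ReplicaSets into active and scaled-to-zero ones. It assumes the plain column layout of kubectl get rs shown in this thread (name, desired, current, ready, age); the function name split_replicasets is made up for illustration, and only three of the rows are reproduced in the sample.

```python
# Three rows copied from the listing above; the column layout is assumed to be
# NAME, DESIRED, CURRENT, READY, AGE as printed by `kubectl get rs`.
SAMPLE = """\
replicaset.apps/sample-harbor-registryctl-589b8bb8df   1         1         1       2m16s
replicaset.apps/sample-harbor-registryctl-5bbf7d4889   0         0         0       2m36s
replicaset.apps/sample-harbor-registryctl-644bd8fdbf   0         0         0       2m47s
"""


def split_replicasets(text, prefix):
    """Return (active, idle) name lists for ReplicaSets whose name contains prefix."""
    active, idle = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, header rows, and unrelated resources.
        if len(fields) < 5 or prefix not in fields[0]:
            continue
        name, desired = fields[0], fields[1]
        if not desired.isdigit():
            continue
        (active if int(desired) > 0 else idle).append(name)
    return active, idle


active, idle = split_replicasets(SAMPLE, "sample-harbor-registryctl-")
print(len(active), "active,", len(idle), "scaled to zero")
```

Each scaled-to-zero entry corresponds to an older Deployment revision, so a growing idle count means the pod template keeps changing.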

Deployment is not ready:

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sample-harbor-registryctl   0/1     1            0           2d3h
steven-zou commented 4 years ago

One day later, there are even more ReplicaSets:

replicaset.apps/sample-harbor-registryctl-54fc7f564    0         0         0       2d2h
replicaset.apps/sample-harbor-registryctl-5656d8d6f4   0         0         0       2d2h
replicaset.apps/sample-harbor-registryctl-647f499c85   0         0         0       31h
replicaset.apps/sample-harbor-registryctl-64b77c84fc   0         0         0       8h
replicaset.apps/sample-harbor-registryctl-678597bc6b   1         1         0       112m
replicaset.apps/sample-harbor-registryctl-687bc446c9   0         0         0       27h
replicaset.apps/sample-harbor-registryctl-6b8ccdcb9    1         1         0       163m
replicaset.apps/sample-harbor-registryctl-6dbc6ff85b   0         0         0       21h
replicaset.apps/sample-harbor-registryctl-6fb4fdd49    0         0         0       2d2h
replicaset.apps/sample-harbor-registryctl-745d586f66   0         0         0       21h
replicaset.apps/sample-harbor-registryctl-75d69f5d9f   0         0         0       12h
replicaset.apps/sample-harbor-registryctl-77485bd8     0         0         0       18h
replicaset.apps/sample-harbor-registryctl-775b7cdb86   0         0         0       163m
replicaset.apps/sample-harbor-registryctl-78cfc44d9    0         0         0       27h
replicaset.apps/sample-harbor-registryctl-7b9f4bcf86   0         0         0       11h
replicaset.apps/sample-harbor-registryctl-7bbddf4c58   0         0         0       21h
replicaset.apps/sample-harbor-registryctl-7bcdc65d99   0         0         0       18h
replicaset.apps/sample-harbor-registryctl-7bdb4f9b6    0         0         0       31h
replicaset.apps/sample-harbor-registryctl-7df56bc846   0         0         0       12h
replicaset.apps/sample-harbor-registryctl-9446d65      0         0         0       7h47m
replicaset.apps/sample-harbor-registryctl-997d8b5b4    0         0         0       29h
steven-zou commented 4 years ago
~$ kubectl rollout history deployment.apps/sample-harbor-registryctl
deployment.apps/sample-harbor-registryctl 
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         <none>
4         <none>
5         <none>
6         <none>
7         <none>
8         <none>
9         <none>
10        <none>
11        <none>
12        <none>
13        <none>
14        <none>
15        <none>
16        <none>
17        <none>
18        <none>
19        <none>
20        <none>
21        <none>
22        <none>
steven-zou commented 4 years ago

Declare the new state of the Pods by updating the PodTemplateSpec of the Deployment. A new ReplicaSet is created and the Deployment manages moving the Pods from the old ReplicaSet to the new one at a controlled rate. Each new ReplicaSet updates the revision of the Deployment.
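
The behaviour quoted above can be illustrated with a small sketch: any change to the pod template, including a single annotation value, produces a different template fingerprint and therefore a new ReplicaSet. This is only an illustration under stated assumptions. Kubernetes derives the real pod-template-hash with its own hashing scheme, and SHA-256 is used here purely as a stand-in; the helper names are hypothetical, while the annotation key, its two values, and the image name are taken from elsewhere in this thread.

```python
import copy
import hashlib
import json


def template_fingerprint(template):
    """Stand-in for pod-template-hash: a short digest of the whole pod template."""
    canonical = json.dumps(template, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:10]


def with_annotation(template, key, value):
    """Return a deep copy of the template with one annotation set."""
    updated = copy.deepcopy(template)
    updated["metadata"]["annotations"][key] = value
    return updated


KEY = "sample-harbor.default.registry.registryctl.goharbor.io/version"
base = {
    "metadata": {"annotations": {KEY: "4860044"}},
    "spec": {
        "containers": [
            {"name": "registryctl", "image": "goharbor/harbor-registryctl:v2.0.0"}
        ]
    },
}
bumped = with_annotation(base, KEY, "5640281")

# Same containers, different annotation value: the fingerprints differ,
# which is the condition under which a Deployment rolls out a new ReplicaSet.
print(template_fingerprint(base), template_fingerprint(bumped))
```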

steven-zou commented 4 years ago

I got the rollout history details with the command kubectl rollout history deployment.apps/sample-harbor-registryctl --revision=xx and compared the revisions. The main changes are in the labels and annotations, which are related to checksum values:

revision=1

Pod Template:
  Labels:
        pod-template-hash=54fc7f564
  Annotations:
        sample-harbor.default.registry.registryctl.goharbor.io/version: 4860044

revision=22

Pod Template:
  Labels:
        pod-template-hash=5fb75b98db
  Annotations:
        sample-harbor.default.registry.registryctl.goharbor.io/version: 5640281
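
The comparison above can be automated with a short script. The sketch below parses the two excerpts exactly as they appear in this comment and lists the entries that differ. It assumes the trimmed layout shown here (one label or annotation per line under its heading), which may not match raw kubectl output in every version; parse_pod_template and changed_entries are hypothetical helper names.

```python
REV_1 = """\
Pod Template:
  Labels:
        pod-template-hash=54fc7f564
  Annotations:
        sample-harbor.default.registry.registryctl.goharbor.io/version: 4860044
"""

REV_22 = """\
Pod Template:
  Labels:
        pod-template-hash=5fb75b98db
  Annotations:
        sample-harbor.default.registry.registryctl.goharbor.io/version: 5640281
"""


def parse_pod_template(text):
    """Parse the Labels/Annotations part of a rollout-history excerpt into a dict."""
    result = {"Labels": {}, "Annotations": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("Labels:", "Annotations:"):
            section = line[:-1]
        elif section == "Labels" and "=" in line:
            key, value = line.split("=", 1)
            result[section][key] = value
        elif section == "Annotations" and ": " in line:
            key, value = line.split(": ", 1)
            result[section][key] = value
    return result


def changed_entries(old, new):
    """Return (section, key, before, after) tuples for every differing entry."""
    changes = []
    for section in ("Labels", "Annotations"):
        for key in sorted(set(old[section]) | set(new[section])):
            before, after = old[section].get(key), new[section].get(key)
            if before != after:
                changes.append((section, key, before, after))
    return changes


for change in changed_entries(parse_pod_template(REV_1), parse_pod_template(REV_22)):
    print(change)
```

The pod-template-hash label is expected to differ between revisions; the annotation is the change worth tracing, since the label is derived from the template.
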
steven-zou commented 4 years ago

@holyhope

The annotation sample-harbor.default.registry.registryctl.goharbor.io/version on the registryctl pod is very different from the annotations on other component pods, and changes to this annotation may be the root cause. I checked the code but did not find where this annotation value is set. Could you please provide some clues?

For example, similar annotations of the core deployment:

sample-harbor-core.default.secret.core.goharbor.io/version: "4859926"
sample-harbor-core.default.configmap.core.goharbor.io/version: "4859925"

steven-zou commented 4 years ago

It seems the registry is a dependent resource of registryctl.

steven-zou commented 4 years ago

The number of registryctl ReplicaSets is still increasing:

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/sample-harbor-core-76f967d77f          1         1         0       3d2h
replicaset.apps/sample-harbor-jobservice-6b4f89bb96    1         1         0       3d2h
replicaset.apps/sample-harbor-jobservice-79b486c54d    1         1         0       3d2h
replicaset.apps/sample-harbor-portal-bbc6c9            1         1         0       3d2h
replicaset.apps/sample-harbor-registry-688fc75c75      1         1         0       3d2h
replicaset.apps/sample-harbor-registryctl-54fc7f564    0         0         0       3d2h
replicaset.apps/sample-harbor-registryctl-5656d8d6f4   0         0         0       3d2h
replicaset.apps/sample-harbor-registryctl-5b67d698b7   1         1         0       5h35m
replicaset.apps/sample-harbor-registryctl-5fb75b98db   0         0         0       23h
replicaset.apps/sample-harbor-registryctl-5fff5cfc69   0         0         0       10h
replicaset.apps/sample-harbor-registryctl-647f499c85   0         0         0       2d7h
replicaset.apps/sample-harbor-registryctl-64b77c84fc   0         0         0       32h
replicaset.apps/sample-harbor-registryctl-64ff4599b6   0         0         0       17h
replicaset.apps/sample-harbor-registryctl-678597bc6b   0         0         0       25h
replicaset.apps/sample-harbor-registryctl-687bc446c9   0         0         0       2d3h
replicaset.apps/sample-harbor-registryctl-68dfc69756   1         1         0       6h25m
replicaset.apps/sample-harbor-registryctl-6b4965dbbc   0         0         0       21h
replicaset.apps/sample-harbor-registryctl-6b8ccdcb9    0         0         0       26h
replicaset.apps/sample-harbor-registryctl-6dbc6ff85b   0         0         0       45h
replicaset.apps/sample-harbor-registryctl-6fb4fdd49    0         0         0       3d2h
replicaset.apps/sample-harbor-registryctl-745d586f66   0         0         0       45h
replicaset.apps/sample-harbor-registryctl-75d69f5d9f   0         0         0       36h
replicaset.apps/sample-harbor-registryctl-75dd5f9b94   0         0         0       10h
replicaset.apps/sample-harbor-registryctl-76767cb449   0         0         0       10h
replicaset.apps/sample-harbor-registryctl-77485bd8     0         0         0       42h
replicaset.apps/sample-harbor-registryctl-775b7cdb86   0         0         0       26h
replicaset.apps/sample-harbor-registryctl-777cfd4b9c   0         0         0       16h
replicaset.apps/sample-harbor-registryctl-78cfc44d9    0         0         0       2d3h
replicaset.apps/sample-harbor-registryctl-7b9f4bcf86   0         0         0       35h
replicaset.apps/sample-harbor-registryctl-7bbddf4c58   0         0         0       45h
replicaset.apps/sample-harbor-registryctl-7bcdc65d99   0         0         0       42h
replicaset.apps/sample-harbor-registryctl-7bdb4f9b6    0         0         0       2d7h
replicaset.apps/sample-harbor-registryctl-7cd5b9bbd4   0         0         0       7h35m
replicaset.apps/sample-harbor-registryctl-7df56bc846   0         0         0       36h
replicaset.apps/sample-harbor-registryctl-85fd447d8b   0         0         0       14h
replicaset.apps/sample-harbor-registryctl-8666688874   0         0         0       17h
replicaset.apps/sample-harbor-registryctl-9446d65      0         0         0       31h
replicaset.apps/sample-harbor-registryctl-997d8b5b4    0         0         0       2d5h
replicaset.apps/sample-harbor-registryctl-c569ddc64    0         0         0       10h
replicaset.apps/sample-harbor-registryctl-cbb96c8bc    0         0         0       7h35m

The desired pods are not successfully created:

pod/sample-harbor-core-76f967d77f-vzlvq          0/1     ContainerCreating   0          3d2h
pod/sample-harbor-jobservice-6b4f89bb96-q7t77    0/1     ContainerCreating   0          3d2h
pod/sample-harbor-jobservice-79b486c54d-rb88q    0/1     ContainerCreating   0          3d2h
pod/sample-harbor-portal-bbc6c9-n7p5b            0/1     ContainerCreating   0          3d2h
pod/sample-harbor-registry-688fc75c75-wtrrh      0/1     ContainerCreating   0          3d2h
pod/sample-harbor-registryctl-5b67d698b7-xwc2x   0/1     ContainerCreating   0          5h35m
pod/sample-harbor-registryctl-68dfc69756-689lm   0/1     ContainerCreating   0          6h25m
holyhope commented 4 years ago

I see that issue on my side too. I am working on a test suite with better scenarios and increased coverage.

steven-zou commented 4 years ago

I found some logs:

2020-10-30T09:54:41.400Z ERROR controller-runtime.controller Reconciler error {"controller": "registrycontroller", "request": "default/sample-harbor", "error": "cannot set status to error: cannot set conditions to error: apply apps/v1, Kind=Deployment (default/sample-harbor-registryctl): check: cannot get apps/v1, Kind=Deployment default/sample-harbor-registryctl: Deployment.apps \"sample-harbor-registryctl\" not found: apply apps/v1, Kind=Deployment (default/sample-harbor-registryctl): check: cannot get apps/v1, Kind=Deployment default/sample-harbor-registryctl: Deployment.apps \"sample-harbor-registryctl\" not found", "errorVerbose": "Deployment.apps \"sample-harbor-registryctl\" not found\ncannot get apps/v1, Kind=Deployment default/sample-harbor-registryctl\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).ensureResourceReady\n\t/home/steven/code/harbor-operator/pkg/controller/ready.go:36\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).applyAndCheck\n\t/home/steven/code/harbor-operator/pkg/controller/common.go:138\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).ProcessFunc.func1\n\t/home/steven/code/harbor-operator/pkg/controller/resource.go:115\ngithub.com/goharbor/harbor-operator/pkg/graph.(resourceManager).Run.func1\n\t/home/steven/code/harbor-operator/pkg/graph/runner.go:42\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/home/steven/code/harbor-operator/vendor/golang.org/x/sync/errgroup/errgroup.go:57\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\ncheck\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).applyAndCheck\n\t/home/steven/code/harbor-operator/pkg/controller/common.go:140\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).ProcessFunc.func1\n\t/home/steven/code/harbor-operator/pkg/controller/resource.go:115\ngithub.com/goharbor/harbor-operator/pkg/graph.(resourceManager).Run.func1\n\t/home/steven/code/harbor-operator/pkg/graph/runner.go:42\ngolang.org/x/syn
c/errgroup.(Group).Go.func1\n\t/home/steven/code/harbor-operator/vendor/golang.org/x/sync/errgroup/errgroup.go:57\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\napply apps/v1, Kind=Deployment (default/sample-harbor-registryctl)\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).ProcessFunc.func1\n\t/home/steven/code/harbor-operator/pkg/controller/resource.go:117\ngithub.com/goharbor/harbor-operator/pkg/graph.(resourceManager).Run.func1\n\t/home/steven/code/harbor-operator/pkg/graph/runner.go:42\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/home/steven/code/harbor-operator/vendor/golang.org/x/sync/errgroup/errgroup.go:57\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\ncannot set status to error: cannot set conditions to error: apply apps/v1, Kind=Deployment (default/sample-harbor-registryctl): check: cannot get apps/v1, Kind=Deployment default/sample-harbor-registryctl: Deployment.apps \"sample-harbor-registryctl\" not found\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).HandleError\n\t/home/steven/code/harbor-operator/pkg/controller/errors.go:50\ngithub.com/goharbor/harbor-operator/pkg/controller.(Controller).Reconcile\n\t/home/steven/code/harbor-operator/pkg/controller/common.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/home/steven/code/harbor-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:245\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\t/home/steven/code/harbor-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:221\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).worker\n\t/home/steven/code/harbor-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:200\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 github.com/go-logr/zapr.(zapLogger).Error 
/home/steven/code/harbor-operator/vendor/github.com/go-logr/zapr/zapr.go:128 sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler /home/steven/code/harbor-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:247 sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem /home/steven/code/harbor-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:221 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker /home/steven/code/harbor-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:200 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 /home/steven/code/harbor-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 k8s.io/apimachinery/pkg/util/wait.BackoffUntil /home/steven/code/harbor-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 k8s.io/apimachinery/pkg/util/wait.JitterUntil /home/steven/code/harbor-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 k8s.io/apimachinery/pkg/util/wait.Until /home/steven/code/harbor-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90

steven-zou commented 4 years ago

@holyhope

Do you have any clue about this issue? And do you know where this annotation sample-harbor.default.registry.registryctl.goharbor.io/version: 5640281 is set?

Each label/annotation change in the Deployment's pod template causes a new ReplicaSet to be created.

From the logs posted above, it seems that some outdated changes are applied to the registrycontroller component.

steven-zou commented 3 years ago

@glitchcrab

Any progress on this issue?

sguyennet commented 3 years ago

Hi Steven, I did some digging yesterday, but I'm still not sure what the root cause of this issue is. It seems related to the internal certificate, which cannot be mounted. Here is what I saw in the Kubernetes events:

31s         Normal    Scheduled                pod/sample-harbor-registryctl-656677c-rkdss        Successfully assigned default/sample-harbor-registryctl-656677c-rkdss to node-67c9b53b-8e68-4f3d-976a-7499a466ca1f
32s         Normal    SuccessfulCreate         replicaset/sample-harbor-registryctl-656677c       Created pod: sample-harbor-registryctl-656677c-rkdss
30s         Normal    Scheduled                pod/sample-harbor-registryctl-565485c695-c5l6t     Successfully assigned default/sample-harbor-registryctl-565485c695-c5l6t to node-67c9b53b-8e68-4f3d-976a-7499a466ca1f
31s         Normal    ScalingReplicaSet        deployment/sample-harbor-registryctl               Scaled up replica set sample-harbor-registryctl-565485c695 to 1
31s         Normal    SuccessfulCreate         replicaset/sample-harbor-registryctl-565485c695    Created pod: sample-harbor-registryctl-565485c695-c5l6t
32s         Normal    ScalingReplicaSet        deployment/sample-harbor-registryctl               Scaled up replica set sample-harbor-registryctl-656677c to 1
30s         Warning   FailedMount              pod/sample-harbor-registryctl-656677c-rkdss        MountVolume.SetUp failed for volume "internal-certificates" : failed to sync secret cache: timed out waiting for the condition
15s         Normal    Pulling                  pod/sample-harbor-registryctl-565485c695-c5l6t     Pulling image "goharbor/harbor-registryctl:v2.0.0"
15s         Normal    Pulling                  pod/sample-harbor-registryctl-656677c-rkdss        Pulling image "goharbor/harbor-registryctl:v2.0.0"
13s         Normal    Created                  pod/sample-harbor-registryctl-656677c-rkdss        Created container registryctl
13s         Normal    Pulled                   pod/sample-harbor-registryctl-656677c-rkdss        Successfully pulled image "goharbor/harbor-registryctl:v2.0.0" in 2.436909146s
12s         Normal    Pulled                   pod/sample-harbor-registryctl-565485c695-c5l6t     Successfully pulled image "goharbor/harbor-registryctl:v2.0.0" in 3.452909536s
12s         Normal    Started                  pod/sample-harbor-registryctl-656677c-rkdss        Started container registryctl
11s         Normal    Started                  pod/sample-harbor-registryctl-565485c695-c5l6t     Started container registryctl
11s         Normal    Created                  pod/sample-harbor-registryctl-565485c695-c5l6t     Created container registryctl
6s          Normal    ScalingReplicaSet        deployment/sample-harbor-registryctl               Scaled up replica set sample-harbor-registryctl-5b78575b9c to 1
6s          Normal    SuccessfulCreate         replicaset/sample-harbor-registryctl-5b78575b9c    Created pod: sample-harbor-registryctl-5b78575b9c-jbwpr
5s          Normal    Scheduled                pod/sample-harbor-registryctl-5b78575b9c-jbwpr     Successfully assigned default/sample-harbor-registryctl-5b78575b9c-jbwpr to node-67c9b53b-8e68-4f3d-976a-7499a466ca1f
6s          Normal    SuccessfulDelete         replicaset/sample-harbor-registryctl-565485c695    Deleted pod: sample-harbor-registryctl-565485c695-c5l6t
6s          Normal    ScalingReplicaSet        deployment/sample-harbor-registryctl               Scaled down replica set sample-harbor-registryctl-565485c695 to 0
6s          Normal    Killing                  pod/sample-harbor-registryctl-565485c695-c5l6t     Stopping container registryctl
4s          Normal    Pulling                  pod/sample-harbor-registryctl-5b78575b9c-jbwpr     Pulling image "goharbor/harbor-registryctl:v2.0.0"
3s          Normal    Started                  pod/sample-harbor-registryctl-5b78575b9c-jbwpr     Started container registryctl
3s          Normal    Created                  pod/sample-harbor-registryctl-5b78575b9c-jbwpr     Created container registryctl
3s          Normal    Pulled                   pod/sample-harbor-registryctl-5b78575b9c-jbwpr     Successfully pulled image "goharbor/harbor-registryctl:v2.0.0" in 1.272386998s

and this is the list of replicasets when I reproduced the issue:

NAME                                    DESIRED   CURRENT   READY   AGE
sample-harbor-registryctl-565485c695    0         0         0       3m34s
sample-harbor-registryctl-5b78575b9c    1         1         1       3m10s
sample-harbor-registryctl-656677c       0         0         0       3m35s
sguyennet commented 3 years ago

Hi Steven,

After more digging, I found out that registryctl is redeployed because the registry custom resource is modified. The default.registry.checksum.goharbor.io/sample-harbor value changes inside the registrycontroller custom resource at the same time as the new ReplicaSet is created.

I also found that the resourceVersion is modified inside the registry custom resource, but I don't know why yet.

sguyennet commented 3 years ago

Hi Steven,

We identified the root cause of the issue. The operator watches the secret of the registry internal certificate. When the secret is created, it is created empty and then populated with the certificate. For the operator the secret exists, so it deploys the registry and the registryctl. When the certificate is inserted into the secret, the operator detects the modification and changes the resource version of the registry, which triggers a redeploy of the registryctl because the checksum of the registry has changed.
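
The chain of events described above can be modelled with a toy simulation. It is a sketch of the reasoning only, not the operator's code: ToyCluster, checksum, and the string resource versions are all made up for illustration, and the only assumption carried over from the thread is that a change to the secret changes the registry checksum, which in turn changes the registryctl pod template.

```python
import hashlib


def checksum(*parts):
    """Short digest used as a stand-in for the operator's checksum values."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:8]


class ToyCluster:
    """Tiny in-memory model; not the operator's real data structures."""

    def __init__(self):
        # Distinct registryctl pod-template fingerprints seen so far.
        # Each one stands for a ReplicaSet created by the Deployment.
        self.replicasets = []

    def reconcile(self, secret_resource_version):
        # The registry checksum depends on the watched secret's version.
        registry_checksum = checksum("registry", secret_resource_version)
        # The registryctl pod template carries a value derived from the registry.
        template = checksum("registryctl", registry_checksum)
        if template not in self.replicasets:
            self.replicasets.append(template)
        return template


cluster = ToyCluster()
cluster.reconcile("1")  # secret exists but is still empty
cluster.reconcile("2")  # certificate written into the secret
cluster.reconcile("2")  # nothing changed, so no new ReplicaSet
print(len(cluster.replicasets))
```

In this model every write to the secret adds one ReplicaSet, which matches the pattern of one extra revision per secret update.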

I will modify the operator code in charge of checking whether the secret exists.
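
One possible shape for such a check is sketched below: treat the secret as ready only once it holds non-empty certificate data, rather than as soon as the object exists. This is an assumption about the direction of the fix, not the change that was actually made. The function name secret_is_ready is hypothetical, and the key names tls.crt and tls.key are the conventional keys of a Kubernetes TLS secret, assumed here to apply to the internal certificate.

```python
def secret_is_ready(secret, required_keys=("tls.crt", "tls.key")):
    """Return True only when every required key holds non-empty data.

    `secret` is a plain dict shaped like a Kubernetes Secret manifest.
    The default key names are an assumption, not taken from the operator.
    """
    data = secret.get("data") or {}
    return all(data.get(key) for key in required_keys)


# Freshly created secret: the object exists but carries no data yet.
empty_secret = {"metadata": {"name": "internal-certificates"}, "data": {}}

# Same secret after the certificate has been written (placeholder values).
populated_secret = {
    "metadata": {"name": "internal-certificates"},
    "data": {"tls.crt": "PLACEHOLDER-CERT", "tls.key": "PLACEHOLDER-KEY"},
}

print(secret_is_ready(empty_secret), secret_is_ready(populated_secret))
```

With a check like this, the registry and registryctl would be deployed once, after the certificate is present, instead of once for the empty secret and again for the populated one.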

steven-zou commented 3 years ago

@sguyennet

Any progress on this bug?

steven-zou commented 3 years ago

@sguyennet @holyhope

PING! Any updates about the fix to this issue?

sguyennet commented 3 years ago

Hi @steven-zou, the issue with the ReplicaSets is fixed, but we modified the way the objects are created and updated in Kubernetes. This introduced other bugs, which we are currently working to solve.