argoproj-labs / argocd-operator

A Kubernetes operator for managing Argo CD clusters.
https://argocd-operator.readthedocs.io
Apache License 2.0

argocd-operator-controller-manager-service pod selector not unique to argo #1087

Open fuerer opened 12 months ago

fuerer commented 12 months ago

Describe the bug

The service argocd-operator-controller-manager-service has the pod selector control-plane=controller-manager. In our OpenShift environment we have two additional operators (amq/otel) whose operator pods carry the same label. As a result, the service points to the amq, argocd and otel pods.

To Reproduce

Steps to reproduce the behavior:

  1. Install all 3 operators (amq broker/argocd/otel) in the OLM suggested namespace (openshift-operators) from the operator hub.
  2. List the pods that match the service's selector:

oc get pods -n openshift-operators -l control-plane=controller-manager
NAME                                                         READY   STATUS    RESTARTS   AGE
amq-broker-controller-manager-674dcf8866-bp5qd               1/1     Running   0          77m
argocd-operator-controller-manager-5887d66dc6-cghxz          1/1     Running   0          6d4h
opentelemetry-operator-controller-manager-5b64497db6-hg9qj   2/2     Running   0          64m

Expected behavior

A more selective pod selector, for example: app.kubernetes.io/name=argocd-operator,control-plane=controller-manager
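A minimal sketch of the suggested selector, written as a fragment of the Service spec. The `app.kubernetes.io/name` label is only the suggestion from this report; it is an assumption that the operator pod carries (or would be given) that label. All entries in a Service selector must match, so a pod with only `control-plane: controller-manager` would no longer be selected.

```yaml
# Fragment of argocd-operator-controller-manager-service (sketch, not the shipped manifest)
spec:
  selector:
    # existing, generic label shared with other operators
    control-plane: controller-manager
    # suggested additional label; assumed to be present on the operator pod template as well
    app.kubernetes.io/name: argocd-operator
```

The pod template of the operator Deployment would need the same label, otherwise the Service would select no pods at all.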

Additional information

ArgoCD Operator 0.8.0

keskad commented 11 months ago

Hi, I have experienced exactly the same issue on OpenShift, using the latest ArgoCD Operator v0.8.0. I think the issue started appearing after the upgrade to this version.

How it looks in ArgoCD (when trying to synchronize kind: ArgoCD):

error when retrieving current configuration of: Resource: "argoproj.io/v1alpha1, Resource=argocds", GroupVersionKind: "argoproj.io/v1alpha1, Kind=ArgoCD" Name: "xxxx", Namespace: "xxxx" from server for: "xyz": conversion webhook for argoproj.io/v1beta1, Kind=ArgoCD failed: Post "https://argocd-operator-controller-manager-service.xxx.svc:443/convert?timeout=30s": dial tcp xxx.xxx.y.zzz:9443: connect: connection refused

When I run oc get pods -n openshift-operators -l control-plane=controller-manager I get a very similar result: pods of other operators are listed as well.

The Service has the label selector control-plane=controller-manager, which is too generic.

keskad commented 11 months ago

I'm not sure if I found the problematic place: https://github.com/argoproj-labs/argocd-operator/blob/v0.8.0/deploy/olm-catalog/argocd-operator/0.8.0/argocd-operator.v0.8.0.clusterserviceversion.yaml#L1726

It seems that in v0.9.0 the issue is probably resolved: https://github.com/argoproj-labs/argocd-operator/blob/c238af601bb59097ae446f9727807e77b259f04b/deploy/olm-catalog/argocd-operator/0.9.0/argocd-operator.v0.9.0.clusterserviceversion.yaml#L1726

So I guess that the next upgrade to v0.9.0 will fix it and make manual workarounds unnecessary.

fuerer commented 7 months ago

As version 0.9.0 has been released which fixed this issue, we can close this case.

keskad commented 7 months ago

I see that v0.9.0 disappeared?

In the meantime, somebody ran an upgrade via the Red Hat Operator Hub and it ended with:

install strategy failed: Deployment.apps "argocd-operator-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"control-plane":"argocd-operator"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
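For context, `spec.selector` on an `apps/v1` Deployment cannot be changed after creation, so an upgrade that ships a different selector cannot update the existing Deployment in place. A sketch of the mismatch follows. Only the new value appears in the error above; that the Deployment already on the cluster still had `control-plane: controller-manager` is an assumption based on the selector reported earlier in this thread.

```yaml
# Deployment already on the cluster (assumed old selector, not shown in the error)
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
---
# Deployment the upgrade tried to apply (value taken from the error message)
spec:
  selector:
    matchLabels:
      control-plane: argocd-operator
```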
Elyytscha commented 2 months ago

@svghadi can we reopen this? we have the same issue with 0.12.0

I think the issue is here: https://github.com/argoproj-labs/argocd-operator/blob/c93f2fed1fa343c47497f24b602ba9b830b1561e/deploy/olm-catalog/argocd-operator/0.12.0/argocd-operator.v0.12.0.clusterserviceversion.yaml#L1876-L1883. My guess is that OLM creates the Service for the operator from the ClusterServiceVersion, which is the only place where I can find those (missing) labels.

the issue is simply

  selector:
    control-plane: controller-manager

is not guaranteed to be unique

svghadi commented 2 months ago

The label was changed to argocd-operator for one release and then changed back to controller-manager. I need to check with the other maintainers to understand the reason.

Elyytscha commented 2 months ago

I would add more labels than just one, so this can't happen again.

Our workaround for now is to install the argocd operator in its own namespace via a dedicated OperatorGroup, where only this operator is installed:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: argocd-operator
  namespace: argocd-operator
spec:
  upgradeStrategy: Default
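With this workaround, the operator itself would be installed by a Subscription created in the same namespace as the OperatorGroup above. A sketch, where `channel`, `source`, `sourceNamespace` and the package name are assumptions (values often seen for a community catalog on OpenShift) and should be checked against the catalog actually in use:

```yaml
# Sketch only: every value under spec is assumed, none is taken from this thread.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: argocd-operator
  namespace: argocd-operator        # same namespace as the OperatorGroup
spec:
  name: argocd-operator             # package name in the catalog (assumed)
  channel: alpha                    # assumed
  source: community-operators       # assumed
  sourceNamespace: openshift-marketplace  # assumed
```

Because the OperatorGroup sets neither `targetNamespaces` nor `selector`, it targets all namespaces, so the operator keeps watching the whole cluster. Only its own pod and Service move to a namespace where no other operator pod carries the `control-plane=controller-manager` label.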