SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

Openshift SeldonDeployment not managed by correct Operator pod when deploying multiple namespace scoped Operator Installs #3963

Open strangiato opened 2 years ago

strangiato commented 2 years ago

Describe the bug

When installing Seldon as a namespaced operator in multiple namespaces, the SeldonDeployment objects deployed in the second namespace will be managed and deployed by the operator pod running in the first namespace. If the first version of the operator is uninstalled, any SeldonDeployment objects created in the second namespace where the original operator is installed will fail with a webhook error pointing to the non-existent service in the original namespace.
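One way to check which namespace the admission webhooks actually point at is to list the service referenced by each webhook configuration. The following is only a sketch: the function names are hypothetical, and it assumes the operator's webhook configurations contain "seldon" in their names and that the caller can read cluster-scoped webhook configurations.

```shell
# Print "<configuration name> <service namespace>/<service name> ..." for every
# validating and mutating webhook configuration in the cluster.
# Webhooks that use a URL instead of a service print an empty "/" entry.
list_webhook_services() {
  oc get validatingwebhookconfigurations,mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{.metadata.name}{range .webhooks[*]}{"\t"}{.clientConfig.service.namespace}/{.clientConfig.service.name}{end}{"\n"}{end}'
}

# Keep only the lines that mention "seldon".
# Assumption: the operator's webhook configurations have "seldon" in their names.
list_seldon_webhook_services() {
  list_webhook_services | grep -i seldon
}

# Usage (against a live cluster):
#   list_seldon_webhook_services
```

If the output shows a service in seldon-test-1 after that operator has been uninstalled, that would be consistent with the webhook error described above.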

To reproduce

  1. oc new-project seldon-test-1
  2. oc new-project seldon-test-2
  3. Install Seldon via the OperatorHub console using a namespace-scoped install in the seldon-test-1 namespace, or create the following YAML objects:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: seldon-test-1
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: seldon-test-1
  namespace: seldon-test-1
spec:
  targetNamespaces:
    - seldon-test-1
  4. Repeat step 3 for seldon-test-2, or create the following objects:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: seldon-test-2
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: seldon-test-2
  namespace: seldon-test-2
spec:
  targetNamespaces:
    - seldon-test-2
  5. Follow the logs for the operator deployed in seldon-test-1:

oc logs $(oc get pod -l control-plane=seldon-controller-manager -o name -n seldon-test-1) --follow -n seldon-test-1

  6. Create a SeldonDeployment in seldon-test-2:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
    app.kubernetes.io/instance: seldon1
    app.kubernetes.io/name: seldon
    app.kubernetes.io/version: v0.5
  name: seldon-model
  namespace: seldon-test-2
spec:
  name: test-deployment
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - image: 'seldonio/mock_classifier:1.6.0'
                name: classifier
      graph:
        children: []
        name: classifier
        type: MODEL
      name: example
      replicas: 1
status: {}

Expected behaviour

The SeldonDeployment created in seldon-test-2 should be managed by the operator deployed in seldon-test-2, not by the operator deployed in seldon-test-1. Instead, the logs of the operator in seldon-test-1 show it deploying the new resource, while the operator in seldon-test-2 shows no activity.
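To compare the two operators side by side, the log command from the reproduction steps can be run against both namespaces. This is a rough sketch: the function names are hypothetical, and it assumes the operator logs mention the SeldonDeployment by name (seldon-model here), which may not hold for every operator version.

```shell
# Print the last N log lines (default 50) of the operator pod in a namespace,
# using the same label selector as the reproduction steps.
operator_logs() {
  op_ns="$1"
  op_pod="$(oc get pod -l control-plane=seldon-controller-manager -o name -n "$op_ns")"
  oc logs "$op_pod" --tail="${2:-50}" -n "$op_ns"
}

# For each test namespace, count how many recent operator log lines mention
# the given resource name.
compare_operator_logs() {
  name="$1"
  for ns in seldon-test-1 seldon-test-2; do
    count="$(operator_logs "$ns" 200 | grep -c "$name" || true)"
    echo "$ns: $count log lines mention $name"
  done
}

# Usage (against a live cluster):
#   compare_operator_logs seldon-model
```

A non-zero count for seldon-test-1 together with a zero count for seldon-test-2 would match the behaviour reported here.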

Environment

Model Details

Using the default example model

RafalSkolasinski commented 2 years ago

Hi @strangiato,

Thanks for bringing this to our attention. Is the namespace installation something you are actively looking to use?

strangiato commented 2 years ago

Hi Rafal, yes this is the default deployment strategy when Seldon is deployed from OpenDataHub on OpenShift.

RafalSkolasinski commented 2 years ago

Interesting. Is it possible to deploy on OpenDataHub using the "All namespaces on the cluster" option in the meantime?

strangiato commented 2 years ago

The ODH operator itself is generally deployed as a cluster-scoped operator, but when a user chooses to deploy Seldon, ODH deploys it as a namespace-scoped operator in that specific user's namespace.

strangiato commented 2 years ago

For the sake of documentation, I created a corresponding issue for the ODH project here:

https://issues.redhat.com/browse/ODH-608

RafalSkolasinski commented 2 years ago

So users of ODH cannot install the Seldon Operator cluster-wide, then. In the meantime, could cluster administrators install both ODH and the Seldon Operator (available in all namespaces), so that ODH users only need to create SeldonDeployments?

strangiato commented 2 years ago

Yeah, that was the workaround I ended up implementing as an immediate resolution of the issue for my specific use case.
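For anyone wanting to try the same workaround from the command line, a cluster-wide install could look roughly like the sketch below. It reuses the Subscription from the reproduction steps and only changes the namespace to openshift-operators, which on OpenShift already has a global OperatorGroup covering all namespaces, so no OperatorGroup is created. The function name is hypothetical, and the channel and catalog source are assumed to be the same as in the namespace-scoped install.

```shell
# Emit a Subscription that installs the operator in the openshift-operators
# namespace (all-namespaces mode) instead of a single project.
# Assumption: same package, channel and catalog source as the
# namespace-scoped Subscriptions in the reproduction steps.
cluster_wide_subscription() {
  cat <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
}

# Usage (against a live cluster, as a cluster administrator):
#   cluster_wide_subscription | oc apply -f -
```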

ukclivecox commented 2 years ago

@strangiato Is this still an issue for you, or is the workaround OK?

strangiato commented 2 years ago

The workaround is fine for now, but I would still consider this a bug and a potential security vulnerability for anyone installing in namespaced mode.

CloudMarc commented 1 year ago

We are seeing this on Seldon Core Operator 1.16.0 on GKE 1.24 and 1.25 in namespaced scope. Exactly as originally stated,

If the first version of the operator is uninstalled, any SeldonDeployment objects created in the second namespace where the original operator is installed will fail with a webhook error pointing to the non-existent service in the original namespace.