canonical / kserve-operators

Charmed KServe

Controller not sending manifests to resource dispatcher #262

Open misohu opened 1 month ago

misohu commented 1 month ago

Bug Description

After deploying CKF 1.9/stable together with resource-dispatcher 2.0/stable, the expected manifests for secrets and service-accounts are missing from the relation data between kserve-controller and resource-dispatcher. The same relations work between mlflow and resource-dispatcher.

To Reproduce

  1. Deploy CKF 1.9/stable
  2. Deploy resource-dispatcher 2.0/stable
  3. Relate kserve-controller to resource-dispatcher:
    juju relate kserve-controller:secrets resource-dispatcher:secrets
    juju relate kserve-controller:service-accounts resource-dispatcher:service-accounts
  4. Check the relation data on the resource-dispatcher unit:
    juju show-unit resource-dispatcher/0
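The check in step 4 can also be scripted. The following is a minimal sketch, assuming the output of `juju show-unit resource-dispatcher/0 --format json` uses the same keys as the YAML shown under Relevant Log Output (`relation-info`, `application-data`, `related-units`). The helper name `empty_relations` is hypothetical, and the sample data is a trimmed copy of the output from this report with the manifest bodies shortened.

```python
import json


def empty_relations(show_unit, unit):
    """Return (relation-id, endpoint, related units) for each relation on
    `unit` whose application-data is empty."""
    found = []
    for rel in show_unit.get(unit, {}).get("relation-info", []):
        if not rel.get("application-data"):
            related = sorted(rel.get("related-units", {}))
            found.append((rel["relation-id"], rel["endpoint"], related))
    return found


# Trimmed copy of the relation-info from this report (manifests shortened).
# Assumption: the JSON output mirrors the YAML key names.
sample = json.loads("""
{
  "resource-dispatcher/0": {
    "relation-info": [
      {"relation-id": 62, "endpoint": "secrets",
       "application-data": {"kubernetes_manifests": "[...]"},
       "related-units": {"mlflow-server/0": {"in-scope": true}}},
      {"relation-id": 67, "endpoint": "secrets",
       "application-data": {},
       "related-units": {"kserve-controller/0": {"in-scope": true}}},
      {"relation-id": 66, "endpoint": "service-accounts",
       "application-data": {},
       "related-units": {"kserve-controller/0": {"in-scope": true}}}
    ]
  }
}
""")

for rel_id, endpoint, units in empty_relations(sample, "resource-dispatcher/0"):
    print(f"relation {rel_id} ({endpoint}) from {units} has no application data")
```

On a live model, the same function could be fed the parsed JSON from the `juju show-unit` command instead of the inline sample.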

Environment

microk8s 1.29.5, juju 3.4, ckf 1.9/stable, resource-dispatcher 2.0/stable

Relevant Log Output

resource-dispatcher/0:
  opened-ports: []
  charm: ch:amd64/focal/resource-dispatcher-182
  leader: true
  life: alive
  relation-info:
  - relation-id: 63
    endpoint: pod-defaults
    related-endpoint: pod-defaults
    application-data:
      kubernetes_manifests: '[{"apiVersion": "kubeflow.org/v1alpha1", "kind": "PodDefault",
        "metadata": {"name": "mlflow-server-access-minio"}, "spec": {"desc": "Allow
        access to Minio", "selector": {"matchLabels": {"access-minio": "true"}}, "env":
        [{"name": "AWS_ACCESS_KEY_ID", "valueFrom": {"secretKeyRef": {"name": "mlflow-server-minio-artifact",
        "key": "AWS_ACCESS_KEY_ID", "optional": false}}}, {"name": "AWS_SECRET_ACCESS_KEY",
        "valueFrom": {"secretKeyRef": {"name": "mlflow-server-minio-artifact", "key":
        "AWS_SECRET_ACCESS_KEY", "optional": false}}}, {"name": "MINIO_ENDPOINT_URL",
        "value": "http://mlflow-minio.kubeflow:9000"}]}}, {"apiVersion": "kubeflow.org/v1alpha1",
        "kind": "PodDefault", "metadata": {"name": "mlflow-server-minio"}, "spec":
        {"desc": "Allow access to MLFlow", "env": [{"name": "MLFLOW_S3_ENDPOINT_URL",
        "value": "http://mlflow-minio.kubeflow:9000"}, {"name": "MLFLOW_TRACKING_URI",
        "value": "http://mlflow-server.kubeflow.svc.cluster.local:5000"}], "selector":
        {"matchLabels": {"mlflow-server-minio": "true"}}}}]'
    related-units:
      mlflow-server/0:
        in-scope: true
        data:
          egress-subnets: 10.152.183.141/32
          ingress-address: 10.152.183.141
          private-address: 10.152.183.141
  - relation-id: 62
    endpoint: secrets
    related-endpoint: secrets
    application-data:
      kubernetes_manifests: '[{"apiVersion": "v1", "kind": "Secret", "metadata": {"name":
        "mlflow-server-minio-artifact"}, "stringData": {"AWS_ACCESS_KEY_ID": "minio",
        "AWS_SECRET_ACCESS_KEY": "5ZGBN5Y5ZUF1XN2ZZLJJZOLV4GIB30"}}, {"apiVersion":
        "v1", "kind": "Secret", "metadata": {"name": "mlflow-server-seldon-rclone-secret"},
        "stringData": {"RCLONE_CONFIG_S3_TYPE": "s3", "RCLONE_CONFIG_S3_PROVIDER":
        "minio", "RCLONE_CONFIG_S3_ENV_AUTH": "false", "RCLONE_CONFIG_S3_ACCESS_KEY_ID":
        "minio", "RCLONE_CONFIG_S3_SECRET_ACCESS_KEY": "5ZGBN5Y5ZUF1XN2ZZLJJZOLV4GIB30",
        "RCLONE_CONFIG_S3_ENDPOINT": "http://mlflow-minio.kubeflow:9000"}}]'
    related-units:
      mlflow-server/0:
        in-scope: true
        data:
          egress-subnets: 10.152.183.141/32
          ingress-address: 10.152.183.141
          private-address: 10.152.183.141
  - relation-id: 67
    endpoint: secrets
    related-endpoint: secrets
    application-data: {}
    related-units:
      kserve-controller/0:
        in-scope: true
        data:
          egress-subnets: 10.152.183.219/32
          ingress-address: 10.152.183.219
          private-address: 10.152.183.219
  - relation-id: 66
    endpoint: service-accounts
    related-endpoint: service-accounts
    application-data: {}
    related-units:
      kserve-controller/0:
        in-scope: true
        data:
          egress-subnets: 10.152.183.219/32
          ingress-address: 10.152.183.219
          private-address: 10.152.183.219
  provider-id: resource-dispatcher-0
  address: 10.1.100.230

Additional Context

No response

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6114.

This message was autogenerated

misohu commented 1 month ago

I have rerun the kserve-controller integration tests locally, which include a test of the integration with resource-dispatcher, and I can clearly see that the relation data is filled with manifests. I also tried with resource-dispatcher 2.0/stable, and there were still no problems. I also deployed kserve from the 1.9 bundle on my local machine, again with no problems. Finally, I tried changing minio, since that is the charm sending the credentials, but still saw no problem.

It looks like the problem only occurs when the bundle is deployed as a whole.

misohu commented 1 month ago

The problem was a missing relation between mlflow-minio and kserve-controller. This relation was also missing from the docs, which is why we missed it.
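A quick way to confirm this kind of root cause is to list which applications a unit is actually related to. The sketch below assumes the `juju show-unit kserve-controller/0` output has the same `relation-info` / `related-units` layout as the resource-dispatcher output above. The helper `related_applications` and both data samples are hypothetical and hand-written for illustration. The endpoint names for the mlflow-minio relation are not given in this thread, so the sample uses a placeholder.

```python
def related_applications(show_unit, unit):
    """Collect application names from the related-units keys of every
    relation on `unit` (for example "mlflow-minio/0" -> "mlflow-minio")."""
    apps = set()
    for rel in show_unit.get(unit, {}).get("relation-info", []):
        for related_unit in rel.get("related-units", {}):
            apps.add(related_unit.rsplit("/", 1)[0])
    return apps


# Hypothetical data, not captured from a real deployment.
before = {"kserve-controller/0": {"relation-info": [
    {"relation-id": 67, "endpoint": "secrets",
     "related-units": {"resource-dispatcher/0": {}}},
]}}
after = {"kserve-controller/0": {"relation-info": [
    {"relation-id": 67, "endpoint": "secrets",
     "related-units": {"resource-dispatcher/0": {}}},
    # Placeholder endpoint name; the real one is not stated in this issue.
    {"relation-id": 70, "endpoint": "<endpoint>",
     "related-units": {"mlflow-minio/0": {}}},
]}}

for label, data in (("before", before), ("after", after)):
    apps = related_applications(data, "kserve-controller/0")
    print(label, "mlflow-minio related:", "mlflow-minio" in apps)
```

Note that `related-units` only lists units currently present in a relation, so this check reports a relation as missing if the remote application has no units yet.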