Closed AlexanderSing closed 5 months ago
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5306.
This message was autogenerated
Hey @AlexanderSing, thanks for raising this!
We have it on our roadmap to provide clear instructions for the 1.8 installation in an airgapped environment and to verify it. For now we are waiting for the upstream 1.8.1 release, since it includes changes in KFP that are needed to make the launcher and driver images configurable: https://github.com/kubeflow/pipelines/blob/2.0.5/CHANGELOG.md and https://github.com/kubeflow/pipelines/pull/10269
Once those are available, we'll also work on the other aspects of the charms that are needed for them to work in an airgapped environment.
Here is the current output when trying to use only a bundle (no scripts):
Executing changes:
- upload charm /home/ubuntu/charms/kubeflow/unversioned/admission-webhook for series focal with architecture=amd64
- deploy application admission-webhook with 1 unit on focal
added resource oci-image
- upload charm /home/ubuntu/charms/kubeflow/unversioned/argo-controller for series focal with architecture=amd64
- deploy application argo-controller with 1 unit on focal
added resource oci-image
- upload charm /home/ubuntu/charms/kubeflow/unversioned/dex-auth for series focal with architecture=amd64
- deploy application dex-auth with 1 unit on focal
added resource oci-image
- upload charm /home/ubuntu/charms/kubeflow/unversioned/envoy for series focal with architecture=amd64
- deploy application envoy with 1 unit on focal
ERROR cannot deploy bundle: series "kubernetes" is not supported, supported series are: focal
Below is the script I am currently using to work around not being able to use a bundle, specifically to avoid hardcoding the versions of the OCI images:
OCI_REGISTRY=10.10.11.39:32000
IMAGES=~/k8s/images_kubeflow.txt
img(){ echo "$OCI_REGISTRY/$(cat $IMAGES | grep $1 | tail -n1)"; }
kfn=kubeflownotebookswg
juju deploy --trust --debug ./admission-webhook admission-webhook --resource oci-image=$(img $kfn/poddefaults-webhook)
juju deploy --trust --debug ./argo-controller argo-controller --resource oci-image=$(img argoproj/workflow-controller) --config executor-image=$(img argoproj/argoexec)
juju deploy --trust --debug ./dex-auth dex-auth --resource oci-image=$(img charmedkubeflow/dex)
juju deploy --trust --debug ./envoy envoy --resource oci-image=$(img ml-pipeline/metadata-envoy)
juju deploy --trust --debug ./istio-gateway istio-ingressgateway --config kind=ingress --config proxy-image=$(img istio/proxyv2)
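# Extract the image tag. The reference is reversed so that the first ':' found is the tag separator, not the registry port.
tmp=$(img istio/proxyv2 | rev)
version=$(echo ${tmp/:/ } | awk '{print $1}' | rev)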
juju deploy --trust --debug ./istio-pilot istio-pilot --config default-gateway=kubeflow-gateway --config image-configuration="pilot-image: 'pilot'
global-tag: '$version'
global-hub: '$OCI_REGISTRY/docker.io/istio'
global-proxy-image: 'proxyv2'
global-proxy-init-image: 'proxyv2'
grpc-bootstrap-init: 'busybox:1.28'
"
juju deploy --trust --debug ./jupyter-controller jupyter-controller --resource oci-image=$(img $kfn/notebook-controller)
juju deploy --trust --debug ./jupyter-ui jupyter-ui --resource oci-image=$(img $kfn/jupyter-web-app) \
--config jupyter-images="['$(img $kfn/jupyter-scipy)','$(img $kfn/jupyter-pytorch-full)','$(img $kfn/jupyter-pytorch-cuda-full)','$(img $kfn/jupyter-tensorflow-full)','$(img $kfn/jupyter-tensorflow-cuda-full)']" \
--config rstudio-images="['$(img $kfn/rstudio-tidyverse)']" \
--config vscode-images="['$(img $kfn/codeserver-python)']"
juju deploy --trust --debug ./katib-controller katib-controller --resource oci-image=$(img kubeflowkatib/katib-controller) \
--config custom_images="default_trial_template: '$(img kubeflowkatib/mxnet-mnist)'
early_stopping__medianstop: '$(img kubeflowkatib/earlystopping-medianstop)'
enas_cpu_template: '$(img kubeflowkatib/enas-cnn-cifar10-cpu)'
metrics_collector_sidecar__stdout: '$(img kubeflowkatib/file-metrics-collector)'
metrics_collector_sidecar__file: '$(img kubeflowkatib/file-metrics-collector)'
metrics_collector_sidecar__tensorflow_event: '$(img kubeflowkatib/tfevent-metrics-collector)'
pytorch_job_template__master: '$(img kubeflowkatib/pytorch-mnist-cpu)'
pytorch_job_template__worker: '$(img kubeflowkatib/pytorch-mnist-cpu)'
suggestion__random: '$(img kubeflowkatib/suggestion-hyperopt)'
suggestion__tpe: '$(img kubeflowkatib/suggestion-hyperopt)'
suggestion__grid: '$(img kubeflowkatib/suggestion-optuna)'
suggestion__hyperband: '$(img kubeflowkatib/suggestion-hyperband)'
suggestion__bayesianoptimization: '$(img kubeflowkatib/suggestion-skopt)'
suggestion__cmaes: '$(img kubeflowkatib/suggestion-goptuna)'
suggestion__sobol: '$(img kubeflowkatib/suggestion-goptuna)'
suggestion__multivariate_tpe: '$(img kubeflowkatib/suggestion-optuna)'
suggestion__enas: '$(img kubeflowkatib/suggestion-enas)'
suggestion__darts: '$(img kubeflowkatib/suggestion-darts)'
suggestion__pbt: '$(img kubeflowkatib/suggestion-pbt)'
"
juju deploy --trust --debug ./mysql-k8s katib-db --constraints="mem=2G" --resource mysql-image=$(img canonical/charmed-mysql)
juju deploy --trust --debug ./katib-db-manager katib-db-manager --resource oci-image=$(img kubeflowkatib/katib-db-manager)
juju deploy --trust --debug ./katib-ui katib-ui --resource oci-image=$(img kubeflowkatib/katib-ui)
juju deploy --trust --debug ./kfp-api kfp-api --resource oci-image=$(img charmedkubeflow/api-server)
juju deploy --trust --debug ./mysql-k8s kfp-db --constraints="mem=2G" --resource mysql-image=$(img canonical/charmed-mysql)
juju deploy --trust --debug ./kfp-metadata-writer kfp-metadata-writer --resource oci-image=$(img gcr.io/ml-pipeline/metadata-writer)
juju deploy --trust --debug ./kfp-persistence kfp-persistence --resource oci-image=$(img charmedkubeflow/persistenceagent)
juju deploy --trust --debug ./kfp-profile-controller kfp-profile-controller --resource oci-image=$(img python:3.7)
juju deploy --trust --debug ./kfp-schedwf kfp-schedwf --resource oci-image=$(img charmedkubeflow/scheduledworkflow)
juju deploy --trust --debug ./kfp-ui kfp-ui --resource ml-pipeline-ui=$(img charmedkubeflow/frontend)
juju deploy --trust --debug ./kfp-viewer kfp-viewer --resource kfp-viewer-image=$(img charmedkubeflow/viewer-crd-controller)
juju deploy --trust --debug ./kfp-viz kfp-viz --resource oci-image=$(img charmedkubeflow/visualization-server)
juju deploy --trust --debug ./knative-eventing knative-eventing --config namespace=knative-eventing
juju deploy --trust --debug ./knative-operator knative-operator --resource knative-operator-image=$(img gcr.io/knative-releases/knative.dev/operator/cmd/operator) --resource knative-operator-webhook-image=$(img gcr.io/knative-releases/knative.dev/operator/cmd/webhook) --config otel-collector-image=$(img otel/opentelemetry-collector)
juju deploy --trust --debug ./knative-serving knative-serving --config namespace=knative-serving --config istio.gateway.namespace=kubeflow --config istio.gateway.name=kubeflow-gateway \
--config custom_images="activator: $(img gcr.io/knative-releases/knative.dev/serving/cmd/activator | sed 's%@.*%%g')
autoscaler: $(img gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler | sed 's%@.*%%g')
controller: $(img gcr.io/knative-releases/knative.dev/serving/cmd/controller | sed 's%@.*%%g')
webhook: $(img gcr.io/knative-releases/knative.dev/serving/cmd/webhook | sed 's%@.*%%g')
autoscaler-hpa: $(img gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler-hpa | sed 's%@.*%%g')
net-istio-controller/controller: $(img gcr.io/knative-releases/knative.dev/net-istio/cmd/controller | sed 's%@.*%%g')
net-istio-webhook/webhook: $(img gcr.io/knative-releases/knative.dev/net-istio/cmd/webhook | sed 's%@.*%%g')
queue-proxy: $(img gcr.io/knative-releases/knative.dev/serving/cmd/queue | sed 's%@.*%%g')
"
juju deploy --trust --debug ./kserve-controller kserve-controller --resource kserve-controller-image=$(img kserve/kserve-controller) --resource kube-rbac-proxy-image=$(img gcr.io/kubebuilder/kube-rbac-proxy) --config custom_images="configmap__agent: $(img kserve/agent)
configmap__batcher: $(img kserve/agent)
configmap__explainers__alibi: $(img kserve/alibi-explainer)
configmap__explainers__art: $(img kserve/art-explainer)
configmap__logger: $(img kserve/agent)
configmap__router: $(img kserve/router)
configmap__storageInitializer: $(img kserve/storage-initializer)
serving_runtimes__lgbserver: $(img kserve/lgbserver)
serving_runtimes__kserve_mlserver: $(img docker.io/seldonio/mlserver)
serving_runtimes__paddleserver: $(img kserve/paddleserver)
serving_runtimes__pmmlserver: $(img kserve/pmmlserver)
serving_runtimes__sklearnserver: $(img kserve/sklearnserver)
serving_runtimes__tensorflow_serving: $(img tensorflow/serving)
serving_runtimes__torchserve: $(img pytorch/torchserve-kfs)
serving_runtimes__tritonserver: $(img nvcr.io/nvidia/tritonserver)
serving_runtimes__xgbserver: $(img kserve/xgbserver)
"
juju deploy --trust --debug ./kubeflow-dashboard kubeflow-dashboard --resource oci-image=$(img kubeflownotebookswg/centraldashboard)
juju deploy --trust --debug ./kubeflow-profiles kubeflow-profiles --resource profile-image=$(img kubeflownotebookswg/profile-controller) --resource kfam-image=$(img kubeflownotebookswg/kfam)
juju deploy --trust --debug ./kubeflow-roles kubeflow-roles
juju deploy --trust --debug ./kubeflow-volumes kubeflow-volumes --resource oci-image=$(img kubeflownotebookswg/volumes-web-app)
juju deploy --trust --debug ./metacontroller-operator metacontroller-operator --config metacontroller-image=$(img metacontrollerio/metacontroller)
juju deploy --trust --debug ./mlmd mlmd --resource oci-image=$(img gcr.io/tfx-oss-public/ml_metadata_store_server)
# FIXME Needs tweaks and use version without restrictive license
juju deploy --trust --debug ./minio minio --resource oci-image=$(img minio/minio)
juju deploy --trust --debug ./oidc-gatekeeper oidc-gatekeeper --resource oci-image=$(img charmedkubeflow/oidc-authservice)
juju deploy --trust --debug ./pvcviewer-operator pvcviewer-operator --series=focal --resource oci-image=$(img docker.io/kubeflownotebookswg/pvcviewer-controller) --resource oci-image-proxy=$(img kubebuilder/kube-rbac-proxy)
juju deploy --trust --debug ./seldon-core seldon-controller-manager --resource oci-image=$(img charmedkubeflow/seldon-core-operator) \
--config executor-container-image-and-version=$(img docker.io/seldonio/seldon-core-executor) \
--config custom_images="configmap__predictor__tensorflow__tensorflow: $(img charmedkubeflow/tensorflow-serving)
configmap__predictor__tensorflow__seldon: $(img seldonio/tfserving-proxy)
configmap__predictor__sklearn__seldon: $(img charmedkubeflow/sklearnserver)
configmap__predictor__sklearn__v2: $(img charmedkubeflow/mlserver-sklearn)
configmap__predictor__xgboost__seldon: $(img seldonio/xgboostserver)
configmap__predictor__xgboost__v2: $(img charmedkubeflow/mlserver-xgboost)
configmap__predictor__mlflow__seldon: $(img seldonio/mlflowserver)
configmap__predictor__mlflow__v2: $(img charmedkubeflow/mlserver-mlflow)
configmap__predictor__triton__v2: $(img nvcr.io/nvidia/tritonserver)
configmap__predictor__huggingface__v2: $(img charmedkubeflow/mlserver-huggingface)
configmap__predictor__tempo_server__v2: $(img seldonio/mlserver)
configmap_storageInitializer: $(img seldonio/rclone-storage-initializer)
configmap_explainer: $(img seldonio/alibiexplainer)
configmap_explainer_v2: $(img seldonio/mlserver)
"
juju deploy --trust --debug ./tensorboard-controller tensorboard-controller --resource tensorboard-controller-image=$(img kubeflownotebookswg/tensorboard-controller)
juju deploy --trust --debug ./tensorboards-web-app tensorboards-web-app --resource tensorboards-web-app-image=$(img kubeflownotebookswg/tensorboards-web-app)
juju deploy --trust --debug ./training-operator training-operator --resource training-operator-image=$(img kubeflow/training-operator)
# ----- Relations
juju relate argo-controller minio
juju relate dex-auth:oidc-client oidc-gatekeeper:oidc-client
juju relate istio-pilot:ingress dex-auth:ingress
juju relate istio-pilot:ingress envoy:ingress
juju relate istio-pilot:ingress jupyter-ui:ingress
juju relate istio-pilot:ingress katib-ui:ingress
juju relate istio-pilot:ingress kfp-ui:ingress
juju relate istio-pilot:ingress kubeflow-dashboard:ingress
juju relate istio-pilot:ingress kubeflow-volumes:ingress
juju relate istio-pilot:ingress oidc-gatekeeper:ingress
juju relate istio-pilot:ingress-auth oidc-gatekeeper:ingress-auth
juju relate istio-pilot:istio-pilot istio-ingressgateway:istio-pilot
juju relate istio-pilot:ingress tensorboards-web-app:ingress
juju relate istio-pilot:gateway-info tensorboard-controller:gateway-info
juju relate katib-db-manager:relational-db katib-db:database
juju relate kfp-api:relational-db kfp-db:database
juju relate kfp-api:kfp-api kfp-persistence:kfp-api
juju relate kfp-api:kfp-api kfp-ui:kfp-api
juju relate kfp-api:kfp-viz kfp-viz:kfp-viz
juju relate kfp-api:object-storage minio:object-storage
juju relate kfp-profile-controller:object-storage minio:object-storage
juju relate kfp-ui:object-storage minio:object-storage
juju relate kserve-controller:ingress-gateway istio-pilot:gateway-info
juju relate kserve-controller:local-gateway knative-serving:local-gateway
juju relate kubeflow-profiles kubeflow-dashboard
juju relate kubeflow-dashboard:links jupyter-ui:dashboard-links
juju relate kubeflow-dashboard:links katib-ui:dashboard-links
juju relate kubeflow-dashboard:links kfp-ui:dashboard-links
juju relate kubeflow-dashboard:links kubeflow-volumes:dashboard-links
juju relate kubeflow-dashboard:links tensorboards-web-app:dashboard-links
juju relate mlmd:grpc envoy:grpc
juju relate mlmd:grpc kfp-metadata-writer:grpc
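The image lookup and tag extraction used by the script can be exercised on their own before running any deployment. The following is only a sketch: the sample image list and its version tags are made up for illustration, `image_tag` is a hypothetical helper name, and `img` is a stricter variant of the helper above that returns an error when nothing matches instead of silently printing only "$OCI_REGISTRY/".

```shell
#!/usr/bin/env bash
# Sketch only: the image list below is made-up sample data, not the real file.
OCI_REGISTRY=10.10.11.39:32000
IMAGES=$(mktemp)
cat > "$IMAGES" <<'EOF'
docker.io/istio/proxyv2:1.17.3
docker.io/argoproj/workflow-controller:v3.3.10
EOF

# Same idea as img() in the script, but fails loudly when no line matches.
img() {
  local match
  match=$(grep -F -- "$1" "$IMAGES" | tail -n1)
  if [ -z "$match" ]; then
    echo "no image matching '$1' in $IMAGES" >&2
    return 1
  fi
  echo "$OCI_REGISTRY/$match"
}

# Hypothetical helper: everything after the last ':' is taken as the tag,
# so the ':' in the registry address (host:port) is left alone.
image_tag() {
  local ref
  ref=$(img "$1") || return 1
  echo "${ref##*:}"
}

img istio/proxyv2        # full reference in the local registry
image_tag istio/proxyv2  # tag only
```

This assumes tag-style references; for digest references (`name@sha256:...`) the part after the last ':' would be the hash rather than a tag, so those need separate handling.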
The output I currently get when trying to deploy 1.8 airgapped is in https://github.com/canonical/envoy-operator/issues/72
Context
Recently, further scripts and tests for deploying CKF in an airgapped environment were merged and I am grateful for these contributions. Based on them I tried deploying stable/1.8 in an airgapped manner but ran into some problems.
set_pod_spec call, which doesn't work in an airgapped environment
lightkube.core.exceptions.ApiError: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:kubeflow:argo-controller" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
In my case, I actually currently have access to the docker.io and gcr.io container registries, that's why they are still referred to in the .yaml file. However, this will be changed later on to a hosted registry.
I used the following bundle-airgap.yaml:
and followed that up with the following script:
What needs to get done
Definition of Done