bentoml / Yatai

Model Deployment at Scale on Kubernetes 🦄️
https://bentoml.com

Installation of yatai-deployment failed to [ERROR registerYataiComponent...] #492

Closed linyqh closed 11 months ago

linyqh commented 11 months ago

I encountered an error during the deployment of yatai-deployment following the latest deployment guide for yatai. Here are the details:

Installation steps: I ran the following scripts in order: quick-install-yatai.sh -> quick-install-yatai-image-builder.sh -> quick-install-yatai-deployment.sh

View the logs of yatai-deployment:

kubectl -n yatai-deployment logs -f deploy/yatai-deployment

output:

Version: 1.1.16
GitCommit: 9903eee
BuildDate: 2023-10-26T10:18:54Z
1.7009327030785189e+09  INFO    controller-runtime.builder      skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called       {"GVK": "serving.yatai.ai/v1alpha2, Kind=BentoDeployment"}
1.700932703078543e+09   INFO    controller-runtime.builder      skip registering a validating webhook, object does not implement admission.Validator or WithValidator wasn't called     {"GVK": "serving.yatai.ai/v1alpha2, Kind=BentoDeployment"}
1.7009327030786834e+09  INFO    controller-runtime.webhook      Registering webhook     {"path": "/convert"}
1.7009327030787663e+09  INFO    controller-runtime.builder      Conversion webhook enabled      {"GVK": "serving.yatai.ai/v1alpha2, Kind=BentoDeployment"}
1.700932703078783e+09   INFO    setup   starting manager
1.7009327030789227e+09  INFO    getting yatai client    {"func": "doRegisterYataiComponent"}
1.700932703078922e+09   INFO    start cleaning up abandoned runner services     {"func": "doCleanUpAbandonedRunnerServices"}
1.7009327030791228e+09  ERROR   registerYataiComponent  {"func": "registerYataiComponent", "error": "get yatai client: get yatai config: get secret: the cache is not started, can not read objects", "errorVerbose": "the cache is not started, can not read objects\nget secret\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).getYataiClient.func1\n\t/workspace/controllers/bentodeployment_controller.go:644\ngithub.com/bentoml/yatai-common/config.GetYataiConfig\n\t/go/pkg/mod/github.com/bentoml/yatai-common@v0.0.0-20231016054533-fb836e058cfb/config/config.go:218\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).getYataiClient\n\t/workspace/controllers/bentodeployment_controller.go:638\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).doRegisterYataiComponent\n\t/workspace/controllers/bentodeployment_controller.go:3031\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).registerYataiComponent\n\t/workspace/controllers/bentodeployment_controller.go:3077\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nget yatai config\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).getYataiClient\n\t/workspace/controllers/bentodeployment_controller.go:648\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).doRegisterYataiComponent\n\t/workspace/controllers/bentodeployment_controller.go:3031\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).registerYataiComponent\n\t/workspace/controllers/bentodeployment_controller.go:3077\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nget yatai 
client\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).doRegisterYataiComponent\n\t/workspace/controllers/bentodeployment_controller.go:3033\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).registerYataiComponent\n\t/workspace/controllers/bentodeployment_controller.go:3077\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
github.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).registerYataiComponent
        /workspace/controllers/bentodeployment_controller.go:3079
1.7009327030793006e+09  ERROR   cleanUpAbandonedRunnerServices  {"func": "cleanUpAbandonedRunnerServices", "error": "get bento deployment namespaces: get secret: the cache is not started, can not read objects", "errorVerbose": "the cache is not started, can not read objects\nget secret\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).doCleanUpAbandonedRunnerServices.func1\n\t/workspace/controllers/bentodeployment_controller.go:2963\ngithub.com/bentoml/yatai-common/config.GetBentoDeploymentNamespaces\n\t/go/pkg/mod/github.com/bentoml/yatai-common@v0.0.0-20231016054533-fb836e058cfb/config/config.go:106\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).doCleanUpAbandonedRunnerServices\n\t/workspace/controllers/bentodeployment_controller.go:2957\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).cleanUpAbandonedRunnerServices\n\t/workspace/controllers/bentodeployment_controller.go:3011\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nget bento deployment namespaces\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).doCleanUpAbandonedRunnerServices\n\t/workspace/controllers/bentodeployment_controller.go:2966\ngithub.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).cleanUpAbandonedRunnerServices\n\t/workspace/controllers/bentodeployment_controller.go:3011\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
github.com/bentoml/yatai-deployment/controllers.(*BentoDeploymentReconciler).cleanUpAbandonedRunnerServices
        /workspace/controllers/bentodeployment_controller.go:3013
1.7009327030795062e+09  INFO    controller-runtime.webhook.webhooks     Starting webhook server
1.7009327030795715e+09  INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
1.700932703079645e+09   INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.7009327030797544e+09  INFO    controller-runtime.certwatcher  Updated current TLS certificate
1.7009327030798388e+09  INFO    controller-runtime.webhook      Serving webhook server  {"host": "", "port": 9443}
1.700932703079913e+09   INFO    controller-runtime.certwatcher  Starting certificate watcher

yetone commented 11 months ago

This error can be ignored.

linyqh commented 11 months ago

> This error can be ignored.

Thank you for your reply, but the Yatai front end still reports that yatai-deployment is not available.
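If the logged errors really are harmless, checking the Deployment directly may help narrow down why the front end still shows it as unavailable. A minimal sketch, assuming the default `yatai-deployment` namespace used by the install script; the function name `yatai_deployment_status` and the `KUBECTL` override are hypothetical helpers for illustration, not part of Yatai:

```shell
#!/bin/bash
# Hypothetical helper, not part of Yatai.
# KUBECTL can be overridden (e.g. KUBECTL="echo kubectl") for a dry run.
KUBECTL=${KUBECTL:-kubectl}

yatai_deployment_status() {
  # Namespace defaults to the one used by the install script.
  local ns=${1:-yatai-deployment}
  # Does the Deployment report available replicas?
  $KUBECTL -n "$ns" get deploy/yatai-deployment
  # Are the pods running and ready, or restarting?
  $KUBECTL -n "$ns" get pods
  # Recent events often explain failed probes or image pulls.
  $KUBECTL -n "$ns" get events --sort-by=.lastTimestamp
}
```

Calling `yatai_deployment_status` with no arguments inspects the default namespace; a different namespace can be passed as the first argument.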

michaelwang1994-olo commented 11 months ago

I am also experiencing the same issue.

michaelwang1994-olo commented 11 months ago

@linyqh I temporarily resolved this by rolling back to version 1.1.14, but I haven't found the root cause. This is the install script I used, with VERSION pinned to 1.1.14:

#!/bin/bash

set -e

DEVEL=${DEVEL:-false}
DEVEL_HELM_REPO=${DEVEL_HELM_REPO:-false}

is_minikube=false
if kubectl config view --minify | grep 'minikube.sigs.k8s.io' > /dev/null; then
  is_minikube=true
  MINIKUBE_PROFILE_NAME=$(kubectl config current-context)
fi

# check if jq command exists
if ! command -v jq &> /dev/null; then
  arch=$(uname -m)
  # download jq from github by different arch
  if [[ $arch == "x86_64" && $OSTYPE == 'darwin'* ]]; then
    jq_archived_name="gojq_v0.12.9_darwin_amd64"
  elif [[ $arch == "arm64" && $OSTYPE == 'darwin'* ]]; then
    jq_archived_name="gojq_v0.12.9_darwin_arm64"
  elif [[ $arch == "x86_64" && $OSTYPE == 'linux'* ]]; then
    jq_archived_name="gojq_v0.12.9_linux_amd64"
  elif [[ $arch == "aarch64" && $OSTYPE == 'linux'* ]]; then
    jq_archived_name="gojq_v0.12.9_linux_arm64"
  else
    echo "jq command not found, please install it first"
    exit 1
  fi
  echo "📥 downloading jq from github"
  if [[ $OSTYPE == 'darwin'* ]]; then
    curl -sL -o /tmp/yatai-jq.zip "https://github.com/itchyny/gojq/releases/download/v0.12.9/${jq_archived_name}.zip"
    echo "✅ downloaded jq to /tmp/yatai-jq.zip"
    echo "📦 extracting yatai-jq.zip"
    unzip -q /tmp/yatai-jq.zip -d /tmp
  else
    curl -sL -o /tmp/yatai-jq.tar.gz "https://github.com/itchyny/gojq/releases/download/v0.12.9/${jq_archived_name}.tar.gz"
    echo "✅ downloaded jq to /tmp/yatai-jq.tar.gz"
    echo "📦 extracting yatai-jq.tar.gz"
    tar zxf /tmp/yatai-jq.tar.gz -C /tmp
  fi
  echo "✅ extracted jq to /tmp/${jq_archived_name}"
  jq="/tmp/${jq_archived_name}/gojq"
else
  jq=$(which jq)
fi

# check if kubectl command exists
if ! command -v kubectl >/dev/null 2>&1; then
  echo "😱 kubectl command is not found, please install it first!" >&2
  exit 1
fi

KUBE_VERSION=$(kubectl version --output=json | $jq '.serverVersion.minor')
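# jq prints the minor version JSON-quoted (e.g. "27"), so skip the leading quote and compare the next two characters
if [ ${KUBE_VERSION:1:2} -lt 20 ]; then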
  echo "😱 install requires at least Kubernetes 1.20" >&2
  exit 1
fi

# check if helm command exists
if ! command -v helm >/dev/null 2>&1; then
  echo "😱 helm command is not found, please install it first!" >&2
  exit 1
fi

IGNORE_INGRESS=${IGNORE_INGRESS:-false}

if [ "${IGNORE_INGRESS}" = "false" ]; then
  AUTOMATIC_DOMAIN_SUFFIX_GENERATION=${AUTOMATIC_DOMAIN_SUFFIX_GENERATION:-true}
  INGRESS_CLASS=$(kubectl get ingressclass -o jsonpath='{.items[0].metadata.name}' 2> /dev/null || true)
  # check if ingress class is empty
  if [ -z "$INGRESS_CLASS" ]; then
    if [ "$is_minikube" != "true" ]; then
      echo "😱 ingress controller is not found, please install it first!" >&2
      exit 1
    else
      echo "🤖 installing ingress for minikube"
      minikube addons enable ingress --profile="${MINIKUBE_PROFILE_NAME}"
      echo "✅ ingress installed"
    fi
  fi

  INGRESS_CLASS=$(kubectl get ingressclass -o jsonpath='{.items[0].metadata.name}' 2> /dev/null || true)
  # check if ingress class is empty
  if [ -z "$INGRESS_CLASS" ]; then
    echo "😱 ingress controller is not found, please install it first!" >&2
    exit 1
  fi
else
  echo "🤖 ignoring ingress check"
  AUTOMATIC_DOMAIN_SUFFIX_GENERATION=${AUTOMATIC_DOMAIN_SUFFIX_GENERATION:-false}
  INGRESS_CLASS=""
fi

CHECK_YATAI_IMAGE_BUILDER=${CHECK_YATAI_IMAGE_BUILDER:-true}

if [ "${CHECK_YATAI_IMAGE_BUILDER}" = "true" ]; then
  echo "🧪 verifying that the yatai-image-builder is running"
  if ! kubectl -n yatai-image-builder wait --for=condition=ready --timeout=10s pod -l app.kubernetes.io/name=yatai-image-builder; then
    echo "😱 yatai-image-builder is not ready, please wait for it to be ready!" >&2
    exit 1
  fi
  echo "✅ yatai-image-builder is ready"
fi

namespace=yatai-deployment
bento_deployment_namespace=yatai

# check if namespace exists
if ! kubectl get namespace ${namespace} >/dev/null 2>&1; then
  echo "🤖 creating namespace ${namespace}"
  kubectl create namespace ${namespace}
  echo "✅ namespace ${namespace} created"
fi

if ! kubectl get namespace ${bento_deployment_namespace} >/dev/null 2>&1; then
  echo "🤖 creating namespace ${bento_deployment_namespace}"
  kubectl create namespace ${bento_deployment_namespace}
  echo "✅ namespace ${bento_deployment_namespace} created"
fi

new_cert_manager=0

if [ $(kubectl get pod -A -l app=cert-manager 2> /dev/null | wc -l) = 0 ]; then
  new_cert_manager=1
  echo "🤖 installing cert-manager..."
  kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml
  sleep 1
else
  echo "😀 cert-manager is already installed"
fi

echo "⏳ waiting for cert-manager to be ready..."
kubectl wait --for=condition=ready --timeout=600s pod -l app.kubernetes.io/instance=cert-manager -A
echo "✅ cert-manager is ready"

if [ ${new_cert_manager} = 1 ]; then
  echo "😴 sleep 10s to make cert-manager really work 🤷"
  sleep 10
  echo "✨ wake up"
fi

cat <<EOF > /tmp/cert-manager-test-resources.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: test-selfsigned
  namespace: ${namespace}
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-cert
  namespace: ${namespace}
spec:
  dnsNames:
    - example.com
  secretName: selfsigned-cert-tls
  issuerRef:
    name: test-selfsigned
EOF

kubectl apply -f /tmp/cert-manager-test-resources.yaml
echo "🧪 verifying that the cert-manager is working properly"
sleep 5
if ! kubectl -n ${namespace} wait --for=condition=ready --timeout=30s certificate selfsigned-cert; then
  echo "😱 self-signed certificate is not issued, please check cert-manager installation!" >&2
  exit 1;
fi
kubectl delete -f /tmp/cert-manager-test-resources.yaml
echo "✅ cert-manager is working properly"

SKIP_METRICS_SERVER=${SKIP_METRICS_SERVER:-false}

if [ "${SKIP_METRICS_SERVER}" = "false" ]; then
  if [ $(kubectl get pod -A -l k8s-app=metrics-server 2> /dev/null | wc -l) = 0 ]; then
    echo "🤖 installing metrics-server..."
    if [ "${is_minikube}" = "true" ]; then
      minikube addons enable metrics-server --profile="${MINIKUBE_PROFILE_NAME}"
    else
      kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    fi
  else
    echo "😀 metrics-server is already installed"
  fi

  echo "⏳ waiting for metrics-server to be ready..."
  kubectl wait --for=condition=ready --timeout=600s pod -l k8s-app=metrics-server -A
  echo "✅ metrics-server is ready"
else
  echo "🤖 skipping metrics-server installation"
fi

YATAI_ENDPOINT=${YATAI_ENDPOINT:-http://yatai.yatai-system.svc.cluster.local}
if [ "${YATAI_ENDPOINT}" = "empty" ]; then
    YATAI_ENDPOINT=""
fi

YATAI_SERVICE_ACCOUNT=${YATAI_SERVICE_ACCOUNT:-yatai}

USE_LOCAL_HELM_CHART=${USE_LOCAL_HELM_CHART:-false}

INGRESS_TLS_MODE=${INGRESS_TLS_MODE:-none}
INGRESS_STATIC_TLS_SECRET_NAME=${INGRESS_STATIC_TLS_SECRET_NAME:-""}

if [[ "$INGRESS_TLS_MODE" == "static" ]]; then
    if [[ -z "$INGRESS_STATIC_TLS_SECRET_NAME" ]]; then
        echo "😱 INGRESS_STATIC_TLS_SECRET_NAME must not be empty when INGRESS_TLS_MODE is 'static'!" >&2
        exit 1
    fi
fi

if [ "${USE_LOCAL_HELM_CHART}" = "true" ]; then
  YATAI_DEPLOYMENT_IMG_REGISTRY=${YATAI_DEPLOYMENT_IMG_REGISTRY:-quay.io/bentoml}
  YATAI_DEPLOYMENT_IMG_REPO=${YATAI_DEPLOYMENT_IMG_REPO:-yatai-deployment}
  YATAI_DEPLOYMENT_IMG_TAG=${YATAI_DEPLOYMENT_IMG_TAG:-0.0.1}

  echo "🤖 installing yatai-deployment-crds from local helm chart..."
  helm upgrade --install yatai-deployment-crds ./helm/yatai-deployment-crds -n ${namespace}
  echo "⏳ waiting for yatai-deployment CRDs to be established..."
  kubectl wait --for condition=established --timeout=120s crd/bentodeployments.serving.yatai.ai
  echo "✅ yatai-deployment CRDs are established"

  echo "🤖 installing yatai-deployment from local helm chart..."
  helm upgrade --install yatai-deployment ./helm/yatai-deployment -n ${namespace} \
    --set registry=${YATAI_DEPLOYMENT_IMG_REGISTRY} \
    --set image.repository=${YATAI_DEPLOYMENT_IMG_REPO} \
    --set image.tag=${YATAI_DEPLOYMENT_IMG_TAG} \
    --set yatai.endpoint=${YATAI_ENDPOINT} \
    --set layers.network.ingressClass=${INGRESS_CLASS} \
    --set layers.network.ingressTlsMode=${INGRESS_TLS_MODE} \
    --set layers.network.ingressStaticTlsSecretName=${INGRESS_STATIC_TLS_SECRET_NAME} \
    --set layers.network.automaticDomainSuffixGeneration=${AUTOMATIC_DOMAIN_SUFFIX_GENERATION} \
    --set layers.network.domainSuffix=${DOMAIN_SUFFIX} \
    --set enableRestrictedSecurityContext=true
else
  helm_repo_name=bentoml
  helm_repo_url=https://bentoml.github.io/helm-charts

  # check if DEVEL_HELM_REPO is true
  if [ "${DEVEL_HELM_REPO}" = "true" ]; then
    helm_repo_name=bentoml-devel
    helm_repo_url=https://bentoml.github.io/helm-charts-devel
  fi

  helm_repo_name=${HELM_REPO_NAME:-${helm_repo_name}}
  helm_repo_url=${HELM_REPO_URL:-${helm_repo_url}}

  helm repo remove ${helm_repo_name} 2> /dev/null || true
  helm repo add ${helm_repo_name} ${helm_repo_url}
  helm repo update ${helm_repo_name}

  # if $VERSION is not set, use the latest version
  if [ -z "$VERSION" ]; then
    VERSION=$(helm search repo ${helm_repo_name} --devel="$DEVEL" -l | grep "${helm_repo_name}/yatai-deployment " | awk '{print $2}' | head -n 1)
  fi

  echo "🤖 installing yatai-deployment-crds from helm repo ${helm_repo_url}..."
  helm upgrade --install yatai-deployment-crds yatai-deployment-crds --repo ${helm_repo_url} -n ${namespace} --devel=${DEVEL}

  echo "⏳ waiting for yatai-deployment CRDs to be established..."
  kubectl wait --for condition=established --timeout=120s crd/bentodeployments.serving.yatai.ai
  echo "✅ yatai-deployment CRDs are established"
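  # Workaround from this thread: pin the chart to 1.1.14 (this overrides any VERSION set or detected above)
  VERSION=1.1.14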
  echo "🤖 installing yatai-deployment ${VERSION} from helm repo ${helm_repo_url}..."
  helm upgrade --install yatai-deployment yatai-deployment --repo ${helm_repo_url} -n ${namespace} \
    --set yatai.endpoint=${YATAI_ENDPOINT} \
    --set layers.network.ingressClass=${INGRESS_CLASS} \
    --set layers.network.ingressTlsMode=${INGRESS_TLS_MODE} \
    --set layers.network.ingressStaticTlsSecretName=${INGRESS_STATIC_TLS_SECRET_NAME} \
    --set layers.network.automaticDomainSuffixGeneration=${AUTOMATIC_DOMAIN_SUFFIX_GENERATION} \
    --set layers.network.domainSuffix=${DOMAIN_SUFFIX} \
    --set enableRestrictedSecurityContext=true \
    --set yataiSystem.serviceAccountName=$YATAI_SERVICE_ACCOUNT \
    --version=${VERSION} \
    --devel=${DEVEL}
fi

if [ "${AUTOMATIC_DOMAIN_SUFFIX_GENERATION}" = "true" ]; then
  echo "⏳ waiting for job yatai-deployment-default-domain to be complete..."
  kubectl -n ${namespace} wait --for=condition=complete --timeout=600s job/yatai-deployment-default-domain
  echo "✅ job yatai-deployment-default-domain is complete"
fi

kubectl -n ${namespace} rollout restart deploy/yatai-deployment

echo "⏳ waiting for yatai-deployment to be ready..."
kubectl -n ${namespace} wait --for=condition=available --timeout=600s deploy/yatai-deployment
echo "✅ yatai-deployment is ready"
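The hard-coded `VERSION=1.1.14` line in the script above works, but the script already skips its latest-version lookup when `VERSION` is set, so the pin could probably also be supplied from the environment. A minimal sketch of that pattern; `resolve_version` and its `lookup` argument are hypothetical names used only for illustration, standing in for the script's `helm search repo` pipeline:

```shell
# Hypothetical helper: return $VERSION if the caller exported one,
# otherwise fall back to the command named by the first argument
# (a stand-in for the script's `helm search repo ... | head -n 1` lookup).
resolve_version() {
  local lookup=$1
  if [ -z "${VERSION:-}" ]; then
    # No pin requested: use whatever the lookup prints.
    "$lookup"
  else
    # Pin requested: the lookup is never run.
    echo "$VERSION"
  fi
}
```

Under that assumption, and with the hard-coded line removed, `VERSION=1.1.14 bash quick-install-yatai-deployment.sh` would pin the chart without editing the script for each release.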

yetone commented 11 months ago

The latest version has resolved this issue; please update to yatai-deployment v1.1.20.
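For an existing installation, the upgrade might look like the sketch below. It reuses the release name, chart name, repo URL, and namespace from the install script earlier in this thread. It also assumes the chart version matches the release tag (1.1.20). The `upgrade_yatai_deployment` function and the `HELM` override are hypothetical helpers, and `--reuse-values` is an assumption here, meant to keep the `--set` values from the original install:

```shell
#!/bin/bash
# Hypothetical helper, not part of Yatai.
# HELM can be overridden (e.g. HELM="echo helm") to print the command instead of running it.
HELM=${HELM:-helm}

upgrade_yatai_deployment() {
  # Chart version to install; defaults to the version recommended in this thread.
  local version=${1:-1.1.20}
  $HELM upgrade --install yatai-deployment yatai-deployment \
    --repo https://bentoml.github.io/helm-charts \
    -n yatai-deployment \
    --reuse-values \
    --version "$version"
}
```

Running `HELM="echo helm"` before calling `upgrade_yatai_deployment` prints the exact command first, which is a cheap way to review it before touching the cluster.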