bentoml / Yatai

Model Deployment at Scale on Kubernetes 🦄️
https://bentoml.com

`push` failed: request failed with status code 400 on GKE #315

Closed visionhong closed 2 years ago

visionhong commented 2 years ago

Hi, I'm using Yatai on GKE version 1.21.

I am getting this error when trying to push from a pod to the Yatai server.

Error: [cli] `push` failed: request failed with status code 400: {"error":"pre sign s3 upload url: cannot get ingress yatai-minio: ingresses.networking.k8s.io \"yatai-minio\" not found"}

The services created in the yatai-components namespace are:

[Screenshot: 2022-08-14, 2:30:12 PM]

And one Pod is still in the creating state, as shown below.

[Screenshot: 2022-08-14, 2:34:23 PM]

This is the `describe` output for the pod that seems to have the problem:

k describe po yatai-yatai-deployment-operator-79658d656f-zrp8r -n yatai-components

Name:           yatai-yatai-deployment-operator-79658d656f-zrp8r
Namespace:      yatai-components
Priority:       0
Node:           gke-my-cluster-default-pool-c606ca12-lx18/10.128.0.19
Start Time:     Sun, 14 Aug 2022 12:57:33 +0900
Labels:         app.kubernetes.io/instance=yatai
                app.kubernetes.io/name=yatai-deployment-operator
                pod-template-hash=79658d656f
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/yatai-yatai-deployment-operator-79658d656f
Containers:
  manager:
    Container ID:  
    Image:         quay.io/bentoml/yatai-deployment-operator:v0.9.2
    Image ID:      
    Port:          9443/TCP
    Host Port:     0/TCP
    Command:
      /manager
    Args:
      --health-probe-bind-address=:8081
      --metrics-bind-address=127.0.0.1:8080
      --leader-elect
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      yatai-yatai-deployment-operator  Secret  Optional: false
    Environment:                       <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ssgz9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  yatai-yatai-deployment-operator-webhook-server-cert
    Optional:    false
  kube-api-access-ssgz9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                   From     Message
  ----     ------       ----                  ----     -------
  Warning  FailedMount  54m (x4 over 95m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[kube-api-access-ssgz9 cert]: timed out waiting for the condition
  Warning  FailedMount  9m20s (x34 over 97m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[cert kube-api-access-ssgz9]: timed out waiting for the condition
  Warning  FailedMount  4m4s (x55 over 99m)   kubelet  MountVolume.SetUp failed for volume "cert" : secret "yatai-yatai-deployment-operator-webhook-server-cert" not found

Yatai works fine in an on-premises environment. Is there something wrong with my GKE environment setup?
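
The last FailedMount event in the Events section above names the Secret that the kubelet could not find, which is why the pod stays in ContainerCreating. As a minimal sketch (the function name `find_missing_secrets` is hypothetical, and the regex assumes the message format shown in this issue), the missing Secret names can be pulled out of saved `describe` output like this:

```python
import re

# Matches kubelet FailedMount messages in the form shown in the Events table:
#   MountVolume.SetUp failed for volume "cert" : secret "<name>" not found
# Assumption: the message wording matches the output quoted in this issue.
MISSING_SECRET = re.compile(
    r'MountVolume\.SetUp failed for volume "(?P<volume>[^"]+)" : '
    r'secret "(?P<secret>[^"]+)" not found'
)


def find_missing_secrets(describe_output):
    """Map each volume name to the Secret the kubelet reported as not found."""
    return {
        m.group("volume"): m.group("secret")
        for m in MISSING_SECRET.finditer(describe_output)
    }


# The event line is copied from the describe output in this issue.
sample = (
    '  Warning  FailedMount  4m4s (x55 over 99m)   kubelet  '
    'MountVolume.SetUp failed for volume "cert" : secret '
    '"yatai-yatai-deployment-operator-webhook-server-cert" not found'
)

missing = find_missing_secrets(sample)
for volume, secret in missing.items():
    print(f"volume {volume!r} is waiting for secret {secret!r}")
```

Each reported name can then be checked in the pod's namespace, for example with `kubectl get secret <name> -n yatai-components`.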

yetone commented 2 years ago

@tjems6498 Thanks for your report! I think this documentation will help you: https://github.com/bentoml/Yatai/blob/main/docs/admin-guide.md#verify-installation

visionhong commented 2 years ago

I fixed this issue with the code from this comment: https://github.com/bentoml/Yatai/issues/276#issuecomment-1192823610

But I got a new issue when I use the `push` command.

Error: [cli] `push` failed: request failed with status code 400: {"error":"pre sign s3 upload url: get bucket yatai exist: Get \"https://yatai-minio-yatai-infra-cluster-10-80-7-70.apps.yatai.dev/yatai/?location=\": x509: certificate has expired or is not yet valid: current time 2022-08-22T05:03:48Z is after 2022-08-19T23:59:59Z"}

How can I fix it?
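
The x509 error above carries two timestamps: the time the request was made and the certificate's expiry (its "not after" date). A minimal sketch, assuming the message format quoted in this issue (the helper names `parse_utc` and `expired_for` are hypothetical), that works out how long the certificate had been expired:

```python
import re
from datetime import datetime, timezone

# Matches the expiry part of the x509 message quoted in this issue:
#   current time <RFC 3339 UTC> is after <RFC 3339 UTC>
# Assumption: both timestamps are in UTC with a trailing "Z", as shown above.
EXPIRY = re.compile(
    r"current time (?P<now>[\dT:-]+Z) is after (?P<not_after>[\dT:-]+Z)"
)


def parse_utc(stamp):
    """Parse a timestamp such as 2022-08-22T05:03:48Z as an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expired_for(message):
    """Return how long the certificate had been expired, or None if no match."""
    m = EXPIRY.search(message)
    if m is None:
        return None
    return parse_utc(m.group("now")) - parse_utc(m.group("not_after"))


# The timestamps are copied from the error message in this issue.
message = (
    "x509: certificate has expired or is not yet valid: "
    "current time 2022-08-22T05:03:48Z is after 2022-08-19T23:59:59Z"
)

overdue = expired_for(message)
print(f"certificate expired {overdue} before the request")
```

A positive result means the certificate on the server side had already expired when the request was made, so the client's clock is not the cause.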

yetone commented 2 years ago

@tjems6498 Thanks for your report! Upgrading Yatai to 0.4.6 will fix this issue.