bentoml / Yatai

Model Deployment at Scale on Kubernetes 🦄️
https://bentoml.com

Cannot deploy with Yatai 0.2.1 #210

Closed: amelki closed this issue 2 years ago

amelki commented 2 years ago

I was using Yatai 0.1.4 successfully so far. I upgraded to 0.2.1 in order to get the limits/requests fix (https://github.com/bentoml/Yatai/issues/187). Here is how I did it:

Since then, I can't deploy bentos anymore, and I constantly get this message in the "deployment events":

[2022-04-05 14:44:50] [BentoDeployment] [mytest] [GetBento] Fetching Bento mybento:4ctcgsfupga6svtp
[2022-04-05 14:44:50] [BentoDeployment] [mytest] [GetBento] Failed to fetch Bento mybento:4ctcgsfupga6svtp: DoJsonRequest Error: [GET]http://localhost:7777/api/v1/bento_repositories/mybento/bentos/4ctcgsfupga6svtp: Get "http://localhost:7777/api/v1/bento_repositories/mybento/bentos/4ctcgsfupga6svtp": dial tcp 127.0.0.1:7777: connect: connection refused

I also see the following log for the Yatai system pod, at the time I press "Submit" in the deployment panel:

2022/04/05 12:58:10 /home/runner/work/Yatai/Yatai/api-server/services/deployment.go:303 driver: bad connection
[2.078ms] [rows:0] UPDATE "deployment" SET "status"='deploying',"status_updated_at"='2022-04-05 12:58:10.71',"updated_at"='2022-04-05 12:58:10.71' WHERE id = 2 AND "deployment"."deleted_at" IS NULL

Does that ring any bell? Thanks!

amelki commented 2 years ago

@yetone I see this new yatai.endpoint property: https://github.com/bentoml/Yatai/blob/05ea8a691d8613d3e43e59e1896399fb3a47ce33/scripts/helm-charts/yatai-deployment-comp-operator/templates/secret.yaml#L10. Can this be related? Is it normal for it to be http://localhost:7777? How should I set this property, and to which value?

yetone commented 2 years ago

@amelki

Thank you for your report!

Yatai finds its own endpoint by first looking for a Helm release of the Yatai chart. If such a release exists, it builds the endpoint from the release name; if not, it falls back to http://localhost:7777. You can run helm list -A to see whether there is a release of the Yatai chart in this cluster.
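
For illustration, the lookup described above can be approximated with a small shell sketch (hypothetical: the real logic lives in the Go api-server, and the exact service name and namespace spliced from the release name are assumptions):

#!/bin/bash
# Hypothetical approximation of the endpoint discovery described above.
# Find a Helm release whose chart is the Yatai chart itself (chart field like "yatai-0.2.1").
release=$(helm list -A -o json | jq -r '.[] | select(.chart | test("^yatai-[0-9]")) | .name' | head -n 1)
if [ -n "$release" ]; then
  # Assumed service naming; the real splicing happens in the Go api-server.
  echo "http://${release}.yatai-system.svc.cluster.local"
else
  echo "http://localhost:7777"   # fallback when no Yatai Helm release is found
fi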

amelki commented 2 years ago

@yetone here is an extract of what helm list -A returns:

NAME                                NAMESPACE               REVISION    UPDATED                                 STATUS      CHART                                   APP VERSION
deployment                          yatai-operators         1           2022-04-06 15:08:59.171007321 +0000 UTC deployed    yatai-deployment-comp-operator-0.2.0    0.2.0
yatai                               yatai-components        1           2022-04-06 15:09:08.365526672 +0000 UTC deployed    yatai-deployment-operator-0.0.1         0.0.1      
yatai                               yatai-system            1           2022-04-06 16:39:46.851022 +0200 CEST   deployed    Yatai install-0.1.0                                
yatai-csi-driver-image-populator    yatai-components        1           2022-04-06 15:09:20.903540707 +0000 UTC deployed    csi-driver-image-populator-0.1.0        1.16.0     
yatai-docker-registry               yatai-components        1           2022-04-06 15:11:17.401767538 +0000 UTC deployed    docker-registry-1.14.0                  2.7.1      
yatai-ingress-controller            yatai-components        1           2022-04-06 15:09:21.654832783 +0000 UTC deployed    ingress-nginx-4.0.9                     1.0.5      
yatai-minio                         yatai-components        1           2022-04-06 15:09:48.412264113 +0000 UTC deployed    minio-operator-4.3.5                    v4.3.5     

So what should I do to make things work? It was working perfectly well with 0.1.4.

amelki commented 2 years ago

@yetone @timliubentoml I know your plate is already full, but I really need a solution for this issue in order to test the fix for https://github.com/bentoml/BentoML/issues/2371#issuecomment-1090365873. Downgrading to 0.1.4 is not a solution, since I really need some fixes included in 0.2.1 (especially the fix for requests/limits). I am still investigating on my side, but any tip or idea would be helpful. Thanks!

yubozhao commented 2 years ago

Hey @amelki, it seems like the yatai chart is missing from your helm list -A output. Here is what I have in mine:

NAME                                NAMESPACE           REVISION    UPDATED                                 STATUS      CHART                                   APP VERSION
deployment                          yatai-operators     1           2022-04-06 22:08:35.6797314 +0000 UTC   deployed    yatai-deployment-comp-operator-0.2.0    0.2.0
yatai                               yatai-system        1           2022-04-06 15:04:32.42912 -0700 PDT     deployed    yatai-0.2.1                             0.2.1
yatai                               yatai-components    1           2022-04-06 22:08:49.1168209 +0000 UTC   deployed    yatai-deployment-operator-0.0.1         0.0.1
yatai-csi-driver-image-populator    yatai-components    1           2022-04-06 22:09:09.8879211 +0000 UTC   deployed    csi-driver-image-populator-0.1.0        1.16.0
yatai-docker-registry               yatai-components    1           2022-04-06 22:11:59.7300198 +0000 UTC   deployed    docker-registry-1.14.0                  2.7.1
yatai-ingress-controller            yatai-components    1           2022-04-06 22:09:10.3395883 +0000 UTC   deployed    ingress-nginx-4.0.9                     1.0.5
yatai-minio                         yatai-components    1           2022-04-06 22:10:46.9887158 +0000 UTC   deployed    minio-operator-4.3.5                    v4.3.5

For my setup (on Minikube), the yatai release shows chart version yatai-0.2.1 and app version 0.2.1. Yours is missing these fields. Can you tell me how you installed the Helm chart?

Also, to find out the YATAI_ENDPOINT value for the deployment operator, you can try this kubectl command:

kubectl get secrets yatai-yatai-deployment-operator --namespace yatai-components -o jsonpath="{.data.YATAI_ENDPOINT}" | base64 --decode

I was able to create deployments both from the web UI and through kubectl apply -f deployment.yaml.

I think the next step for you could be to make sure the yatai chart is up to date and that it is installed correctly via Helm.

parano commented 2 years ago

To make sure you have the latest Yatai Helm repo, run helm repo update. Then, in the output of helm search repo, you should see the following:

NAME        CHART VERSION   APP VERSION DESCRIPTION
yatai/yatai 0.2.1           0.2.1       Yatai Helm chart

parano commented 2 years ago

@amelki we found out that this is due to an issue with the current delete script, where the yatai-system namespace is not deleted properly. That's why you are still seeing an older version of the Yatai api-server, which is incompatible with the new deployment operator in 0.2.1. We are working on a fix now.
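
To confirm whether a stale api-server pod is still running, one generic check (plain kubectl, nothing Yatai-specific) is to list the pods in the yatai-system namespace together with their image tags:

# List each pod in yatai-system alongside the image(s) it is running
kubectl get pods -n yatai-system \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'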

parano commented 2 years ago

@amelki the fix has been merged. Could you try reinstalling Yatai with the following?

Delete existing yatai installation:

bash -c "$(curl https://raw.githubusercontent.com/bentoml/yatai-chart/main/delete-yatai.sh)"

Reinstall:

helm repo update
helm install yatai yatai/yatai -n yatai-system --create-namespace
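
After the reinstall, it may be worth verifying that the release now reports chart version yatai-0.2.1 and that the operator secret no longer points at localhost (the secret name below is taken from the command earlier in this thread):

# Should list the yatai release with chart yatai-0.2.1 / app version 0.2.1
helm list -n yatai-system
# Should print a cluster-internal URL rather than http://localhost:7777
kubectl get secrets yatai-yatai-deployment-operator --namespace yatai-components \
  -o jsonpath="{.data.YATAI_ENDPOINT}" | base64 --decode
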
amelki commented 2 years ago

@parano in fact I created my own Helm chart, which pulls Yatai in as a dependency. I named it 'Yatai install' and set the version to 0.1.0, hence the confusion. As for the delete script, I customized it a while ago, and it goes a bit further. Maybe it goes too far, but at least I am certain that nothing Yatai-related is left in my cluster after the script runs:

#!/bin/bash

# Switch the current kubectl namespace to yatai-system (requires kubens)
kubens yatai-system

# Remove cluster-scoped admission webhook and RBAC bindings left behind by Yatai components
kubectl delete -A ValidatingWebhookConfiguration yatai-ingress-controller-ingress-nginx-admission
kubectl delete clusterrolebindings yatai-ingress-controller-ingress-nginx
kubectl delete clusterrolebindings yatai-deployment-comp-operator-manager-rolebinding
kubectl delete clusterrolebindings minio-operator-binding
kubectl delete clusterrolebindings console-sa-binding

kubectl delete clusterrole yatai-ingress-controller-ingress-nginx
kubectl delete clusterrole deployment-yatai-deployment-comp-operator
kubectl delete clusterrole minio-operator-role
kubectl delete clusterrole console-sa-role

# Use helm to remove yatai installation
helm uninstall yatai -n yatai-system

# Remove additional yatai related namespaces
kubectl delete namespace yatai-components
kubectl delete namespace yatai-operators
kubectl delete namespace yatai-builders
kubectl delete namespace yatai
kubectl delete namespace yatai-system

To avoid any confusion, I reinstalled my chart, but renamed the chart to "yatai-paretos" and the release to "myyatai":

NAME                                NAMESPACE               REVISION    UPDATED                                 STATUS      CHART                                   APP VERSION
deployment                          yatai-operators         1           2022-04-07 10:09:37.710896766 +0000 UTC deployed    yatai-deployment-comp-operator-0.2.0    0.2.0
myyatai                             yatai-system            1           2022-04-07 12:09:31.950773 +0200 CEST   deployed    yatai-paretos-0.1.0                     0.1.0      
yatai                               yatai-components        1           2022-04-07 10:09:41.45811708 +0000 UTC  deployed    yatai-deployment-operator-0.0.1         0.0.1      
yatai-csi-driver-image-populator    yatai-components        1           2022-04-07 10:09:54.03245874 +0000 UTC  deployed    csi-driver-image-populator-0.1.0        1.16.0     
yatai-docker-registry               yatai-components        1           2022-04-07 10:11:09.589800592 +0000 UTC deployed    docker-registry-1.14.0                  2.7.1      
yatai-ingress-controller            yatai-components        1           2022-04-07 10:09:54.699326189 +0000 UTC deployed    ingress-nginx-4.0.9                     1.0.5      
yatai-minio                         yatai-components        1           2022-04-07 10:10:21.939348367 +0000 UTC deployed    minio-operator-4.3.5                    v4.3.5     

When I log in to the console, I see the new version of the console (I had to go to the /setup URL and pass the initialization token, which was not the case with 0.1.4). So I do have the correct version of Yatai installed, but the value of YATAI_ENDPOINT is indeed http://localhost:7777, which is wrong. Isn't there a bug in the computation of the endpoint?

parano commented 2 years ago

@amelki got it, I think that's the reason why YATAI_ENDPOINT is not set correctly. As @yetone explained, the Yatai deployment CRD controller currently relies on the Yatai Helm chart to get the Yatai URL.

I think it is possible to decouple them in the future and use a different approach to find the Yatai URL. For now, however, the Yatai installation does make two assumptions: 1) only one Yatai instance can be installed in a K8s cluster, and 2) Yatai must be installed with the official Helm chart (or at least the chart name must be "yatai" and the release name must contain "yatai"). Here's the related code for fetching YATAI_ENDPOINT: https://github.com/bentoml/Yatai/blob/30a4b36e8a3ae41942645559ae94a69972c0d73f/api-server/services/yatai_component.go#L96-L181

I think it should work if you rename your own Helm chart to just yatai and keep the current release name (a sketch of what that looks like follows below). But it also depends on what other changes you have made in your own chart. May I ask why you need your own Helm chart, and what changes you need? Is it possible for the Yatai chart itself to provide those customization options?
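
For illustration, a wrapper chart that satisfies the naming assumption might look like this (a sketch only: the directory layout, repository URL, and version pin are assumptions, adjust them to your setup):

# Hypothetical wrapper Chart.yaml; the chart *name* must be "yatai"
# so the api-server can locate its own release.
cat > my-yatai-chart/Chart.yaml <<'EOF'
apiVersion: v2
name: yatai        # must be "yatai" for endpoint discovery
version: 0.1.0
dependencies:
  - name: yatai
    version: 0.2.1
    repository: https://bentoml.github.io/yatai-chart   # assumed repo URL
EOF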

amelki commented 2 years ago

@parano that was it! Indeed, I renamed my chart to yatai and it is working now! The reason I created my own chart is that I wanted to fix some of the properties in a values.yaml file (such as yatai.ingress.className = nginx), so that at install time only a minimal set of custom values has to be passed. It's also because I might have some other charts to install on my cluster, so one Helm chart to install them all is always better... Anyway, thanks very much for the help! Maybe a clearer log somewhere explaining that the endpoint fallback was chosen could help.
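
For reference, the same kind of customization can usually be passed straight to the official chart at install time (a sketch: whether the chart exposes the class name under exactly this key, i.e. without the yatai. prefix used inside the wrapper chart, is an assumption):

helm repo update
# Assumed values key, mirroring the yatai.ingress.className setting mentioned above
helm install yatai yatai/yatai -n yatai-system --create-namespace \
  --set ingress.className=nginx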

timliubentoml commented 2 years ago

Yeah, that totally makes sense! I was googling around a little bit and found a tool that looks interesting for installing multiple Helm charts at a time: https://github.com/roboll/helmfile

Maybe it could work for your case.
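
For example, a minimal helmfile setup for this thread's use case might look like the following (a sketch assuming helmfile's standard schema; the repository URL and values key are assumptions):

# Declare every chart in one helmfile.yaml, then apply them in one shot.
cat > helmfile.yaml <<'EOF'
repositories:
  - name: yatai
    url: https://bentoml.github.io/yatai-chart   # assumed repo URL
releases:
  - name: yatai
    namespace: yatai-system
    chart: yatai/yatai
    values:
      - ingress:
          className: nginx   # assumed key, mirroring the setting above
EOF
helmfile sync   # installs or updates every release declared above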