Closed — aquynh1682 closed this issue 1 year ago
I just checked again, and I realized that the issue is not with the Python file itself. It seems to be related to the subprocess module. It is unable to execute the Python file.
On further investigation, it does seem that the issue lies in the Python file itself after all. But how can I determine exactly what is causing the error?
Hi, @aquynh1682! Can you run the following?
$ kubectl get pods -n ml-workshop
I'm just asking, and I think it will not help.
Hi, @webmakaka
Thank you for asking. Here is the result when running the command: $ kubectl get pods -n ml-workshop
Did you change the original manifests? Or does everything work out of the box? (I had issues last time.)
=============================
Did you create your airflow-dags repo, empty, with branch main? And did you generate a TOKEN to push DAGs to GitHub?
========= Maybe my notes could be helpful: https://github.com/webmakaka/Machine-Learning-on-Kubernetes/blob/main/docs/07-model-deployment-and-automation.md
Hi @webmakaka,
Yes, I have modified the manifests so that it works properly now. And where did you run into the problem?
===========
Yes, I have completed all those steps and I can see that it's still working fine. Currently, I suspect that the error might be due to Kaniko being unable to push images to my registry.
Image: GitHub airflow-dags repo with branch main:
Image: Airflow syncing to GitHub:
==============
I have checked, and currently, it is not helpful to me either. :)))
OK! Can you share your manifests and information about your versions of Kubernetes and operator-lifecycle-manager?
I had issue with running airflow.
https://github.com/PacktPublishing/Machine-Learning-on-Kubernetes/issues/10#issuecomment-1397792510
I am using kubernetes version 1.26.6 and olm version v0.20.0.
Oh, I also encountered a similar error before. When I checked, it showed an error connecting to the database. Please try running this command to see what it reports: kubectl logs -f -n ml-workshop <pod name> -c <container name>
I tried running it locally before running it on Airflow, and it reported a missing config.json file at /kaniko/.docker/config.json. And when I inspected the image quay.io/ml-on-k8s/kaniko-container-builder:1.0.0, ironically, it didn't have the files needed to run (there wasn't even a Dockerfile in /workspace). However, the build_push_image.py Python file demands those files, lol :)))). I think the Airflow part of Chapter 7 should be temporarily skipped until a more reliable image version is available (or maybe never).
Image: config.json and Dockerfile not found:
Image: the Python file opening config.json and Dockerfile:
That's right.
Maybe you should check for this file in quay.io/ml-on-k8s/kaniko-container-builder:1.0.0?
Since my screenshots were incomplete, I pulled the image quay.io/ml-on-k8s/kaniko-container-builder:1.0.0 myself, and those files were completely missing.
I think the /kaniko/.docker/config.json file will be created on auth to your registry. My local config.json, for example.
You specify the credentials in environment variables:
MODEL_NAME=mlflowdemo
MODEL_VERSION=1
CONTAINER_REGISTRY=https://index.docker.io/v1/
CONTAINER_REGISTRY_USER=username
CONTAINER_REGISTRY_PASSWORD=mypassword
CONTAINER_DETAILS=webmakaka/mlflowdemo:latest
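For reference, the `/kaniko/.docker/config.json` that the chapter's script writes is derived from these variables: the user and password are joined with a colon and base64-encoded into a Docker-style auth entry. A minimal standalone sketch of that derivation, using the placeholder values above:

```python
import base64
import json

# Placeholder credentials matching the environment variables above
registry = "https://index.docker.io/v1/"
user = "username"
password = "mypassword"

# Kaniko reads a Docker-style config:
# {"auths": {<registry>: {"auth": base64("user:password")}}}
auth = base64.b64encode(f"{user}:{password}".encode("ascii")).decode("ascii")
docker_config = {"auths": {registry: {"auth": auth}}}

print(json.dumps(docker_config))
```

This mirrors what the script's `string.Template` + `base64.b64encode` lines do when the `CONTAINER_REGISTRY_*` variables are set correctly.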
Everything worked for this chapter a year ago. Maybe there were some mistakes in the book (I don't remember) and everything works after reading the next chapter.
Hmm, I will try to run it again and investigate why I encountered the error. Thank you very much.
I just discovered that when a Python file errors, the log is pushed to Minio. After getting that file, I was able to complete Chapter 9. Chapter 7 is similar, but it has more errors that I'm too lazy to fix. So if anyone has completed Chapter 7 fully, please guide me through it. :))))
Hi, @aquynh1682
Can you show information about opendatahub-operator from your setup?
I have an issue with it: it updates to versions with errors.
$ kubectl get pods -n operators
NAME READY STATUS RESTARTS AGE
opendatahub-operator-controller-manager-79f79b7b5f-9vv9d 1/2 CrashLoopBackOff 6 (46s ago) 9m33s
$ kubectl logs opendatahub-operator-controller-manager-79f79b7b5f-9vv9d -n operators
2023-06-30T10:17:59.483Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2023-06-30T10:17:59.483Z INFO controllers.KfDef Adding controller for kfdef.
2023-06-30T10:17:59.484Z INFO secret-generator Adding controller for Secret Generation.
2023-06-30T10:17:59.484Z INFO setup starting manager
I0630 10:17:59.484476 1 leaderelection.go:243] attempting to acquire leader lease operators/kfdef-controller...
2023-06-30T10:17:59.484Z INFO starting metrics server {"path": "/metrics"}
I0630 10:18:15.245800 1 leaderelection.go:253] successfully acquired lease operators/kfdef-controller
2023-06-30T10:18:15.246Z INFO controller.secret-generator-controller Starting EventSource {"reconciler group": "", "reconciler kind": "Secret", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.secret-generator-controller Starting EventSource {"reconciler group": "", "reconciler kind": "Secret", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.secret-generator-controller Starting Controller {"reconciler group": "", "reconciler kind": "Secret"}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.245Z DEBUG events Normal {"object": {"kind":"ConfigMap","namespace":"operators","name":"kfdef-controller","uid":"f0fac8fa-4795-4030-89da-b1a5b598b3ac","apiVersion":"v1","resourceVersion":"10172"}, "reason": "LeaderElection", "message": "opendatahub-operator-controller-manager-79f79b7b5f-9vv9d_015da768-d792-4f12-89a2-0a7a8ddf06ec became leader"}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z DEBUG events Normal {"object": {"kind":"Lease","namespace":"operators","name":"kfdef-controller","uid":"573eae8d-c509-4a62-ad40-f2492672abd9","apiVersion":"coordination.k8s.io/v1","resourceVersion":"10173"}, "reason": "LeaderElection", "message": "opendatahub-operator-controller-manager-79f79b7b5f-9vv9d_015da768-d792-4f12-89a2-0a7a8ddf06ec became leader"}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting EventSource {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "source": "kind source: /, Kind="}
2023-06-30T10:18:15.246Z INFO controller.kfdef-controller Starting Controller {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef"}
2023-06-30T10:18:16.106Z ERROR controller-runtime.source if kind is a CRD, it should be installed before calling Start {"kind": "BuildConfig.build.openshift.io", "error": "no matches for kind \"BuildConfig\" in version \"build.openshift.io/v1\""}
2023-06-30T10:18:16.106Z INFO controller.secret-generator-controller Starting workers {"reconciler group": "", "reconciler kind": "Secret", "worker count": 1}
2023-06-30T10:18:16.106Z INFO controllers.KfDef Watch a change for KfDef CR {"instance": "opendatahub-ml-workshop", "namespace": "ml-workshop"}
I0630 10:18:17.252216 1 request.go:668] Waited for 1.047053266s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/storage.k8s.io/v1beta1?timeout=32s
2023-06-30T10:18:18.455Z ERROR controller-runtime.source if kind is a CRD, it should be installed before calling Start {"kind": "DeploymentConfig.apps.openshift.io", "error": "no matches for kind \"DeploymentConfig\" in version \"apps.openshift.io/v1\""}
2023-06-30T10:18:18.455Z ERROR controller.kfdef-controller Could not wait for Cache to sync {"reconciler group": "kfdef.apps.kubeflow.org", "reconciler kind": "KfDef", "error": "failed to wait for kfdef-controller caches to sync: no matches for kind \"DeploymentConfig\" in version \"apps.openshift.io/v1\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:234
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1
/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/manager/internal.go:696
2023-06-30T10:18:18.455Z INFO controller.secret-generator-controller Shutdown signal received, waiting for all workers to finish {"reconciler group": "", "reconciler kind": "Secret"}
2023-06-30T10:18:18.455Z INFO controller.secret-generator-controller All workers finished {"reconciler group": "", "reconciler kind": "Secret"}
2023-06-30T10:18:18.455Z ERROR setup problem running manager {"error": "failed to wait for kfdef-controller caches to sync: no matches for kind \"DeploymentConfig\" in version \"apps.openshift.io/v1\""}
runtime.goexit
/usr/lib/golang/src/runtime/asm_amd64.s:1571
2023-06-30T10:18:18.455Z ERROR error received after stop sequence was engaged {"error": "leader election lost"}
runtime.goexit
/usr/lib/golang/src/runtime/asm_amd64.s:1571
Hi, @webmakaka
I used opendatahub-operator.v1.1.1.
My trick is to first set it up in automatic approval mode, and then quickly switch it back to manual mode so that it works normally =)))).
Hi, @aquynh1682 !
I completed this step.
To pass this step, I edited build_push_image.py and manually specified:
access_key="minio",
secret_key="minio123",
and I take the artifacts from the mlflow bucket in Minio:
data_file_model = minioClient.fget_object("mlflow", f"/2/726460da00bc4bedb7f70f20e08bc3b3/artifacts/model/model.pkl", "model.pkl")
data_file_requirements = minioClient.fget_object("mlflow", f"/2/726460da00bc4bedb7f70f20e08bc3b3/artifacts/model/requirements.txt", "requirements.txt")
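Instead of hardcoding those bucket paths, the experiment id and run id can be parsed from the registered model's source URI (the `s3://mlflow/...` value the script prints). A sketch, using the path format shown above (the helper name is made up for illustration):

```python
def parse_mlflow_source(source):
    """Split an MLflow artifact source like
    s3://mlflow/2/726460da00bc4bedb7f70f20e08bc3b3/artifacts/model
    into (experiment_id, run_id) instead of hardcoding them."""
    # Strip the scheme and bucket: "s3://mlflow/..." -> "2/.../artifacts/model"
    path = source.split("://", 1)[1].split("/", 1)[1]
    experiment_id, run_id = path.split("/")[0:2]
    return experiment_id, run_id

exp_id, run_id = parse_mlflow_source(
    "s3://mlflow/2/726460da00bc4bedb7f70f20e08bc3b3/artifacts/model")
print(exp_id, run_id)  # → 2 726460da00bc4bedb7f70f20e08bc3b3
```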
I think you can manually specify your parameters, at least MODEL_NAME on line 27.
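The manual override can also be done defensively, falling back to a default when the runtime did not pass a variable through. A sketch (the variable names mirror the script; the defaults are illustrative):

```python
import os

# Fall back to hard-coded values when Airflow/Elyra did not pass the
# environment variables through to the task.
model_name = os.environ.get("MODEL_NAME", "mlflowdemo")
model_version = os.environ.get("MODEL_VERSION", "1")

build_name = f"seldon-model-{model_name}-v{model_version}"
print(build_name)
```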
Hi, @webmakaka
I apologize for the delayed response. I am currently facing another issue related to Kaniko. I am quite certain that I am doing everything correctly, but could you please let me know which registry you are using, and its API endpoint, when you fill in the JupyterHub runtime?
Log of the error when pushing the image to the registry:
{"auths":{"https://hub.docker.com/v2":{"auth":"cXV5bmhuZ28xMTM6UXV5bmhscDEyMzQ1NmFA"}}}
retrieving model metadata from mlflow...
<RegisteredModel: creation_timestamp=1688097597263, description='', last_updated_timestamp=1688097632383, latest_versions=[<ModelVersion: creation_timestamp=1688097632383, current_stage='None', description='', last_updated_timestamp=1688097632383, name='mlflowdemo', run_id='42b2e5c665864ab48f7979ded26673f5', run_link='', source='s3://mlflow/1/42b2e5c665864ab48f7979ded26673f5/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>], name='mlflowdemo', tags={}>
initializing connection to s3 server...
download successful
/workspace/jupyter-work-dir
/kaniko/executor --context=/workspace/jupyter-work-dir --dockerfile=Dockerfile --verbosity=debug --cache=true --single-snapshot=true --destination=https://hub.docker.com/v2/quynhngo113/mlflowdemo:latest
===============
b'error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "https://hub.docker.com/v2/quynhngo113/mlflowdemo:latest": creating push check transport for https: failed: Get "https://https/v2/": dial tcp: lookup https on 169.254.25.10:53: no such host\n'
===============
b'\x1b[37mDEBU\x1b[0m[0000] Copying file /workspace/jupyter-work-dir/Dockerfile to /kaniko/Dockerfile \n'
My Runtime is:
Name: MyAirflow
Description: MyAirflow
Apache Airflow UI Endpoint: https://airflow.192.168.49.2.nip.io
Apache Airflow User Namespace: ml-workshop
Github API Endpoint: https://api.github.com
GitHub DAG Repository: wildmakaka/airflow-dags
GitHub DAG Repository Branch: main
Github Personal Access Token: [YOUR_GITHUB_TOKEN]
Cloud Object Storage Endpoint: http://minio-ml-workshop:9000
Cloud Object Storage Credential Secret: [empty]
Cloud Object Storage Username: minio
Cloud Object Storage Password: minio123
Cloud Object Storage Bucket Name: airflow
I think your final value https://hub.docker.com/v2/quynhngo113/mlflowdemo:latest is incorrect.
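That matches the error text: Kaniko's `--destination` must be a bare image reference (`registry/repository:tag`), not a URL, which is why the `https` prefix ends up being resolved as a hostname ("lookup https ... no such host"). A small sketch of stripping the scheme (the helper name is hypothetical):

```python
from urllib.parse import urlparse

def to_kaniko_destination(location):
    """Strip any http(s):// scheme so the value is a plain image reference,
    e.g. 'quay.io/user/repo:tag', which is what Kaniko expects."""
    if location.startswith(("http://", "https://")):
        parsed = urlparse(location)
        return parsed.netloc + parsed.path
    return location

print(to_kaniko_destination("https://quay.io/quynhngo113/mlflowdemo:1.1.0"))
# → quay.io/quynhngo113/mlflowdemo:1.1.0
```

For Docker Hub, the registry host is dropped entirely (just `user/repo:tag`), which is what the original script's `docker.io` special case does.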
My Runtime is completely normal.
I was just asking about the declaration of the variables passed into the build_push_image.py file.
Looks correct! Maybe there are problems with passing values into the script? Can you manually set the values in build_push_image.py?
I have problems with running Spark scripts. An old problem has returned.
I also encountered a similar error, and I just ignored it :)))
I manually set the values in build_push_image.py, but it still gives an error.
This is the build_push_image.py:
import string
import subprocess
import os
import base64
import mlflow
from minio import Minio
from mlflow.tracking import MlflowClient

"""
This script assumes that /kaniko/.docker/config.json has the correct repo and associated credentials mounted.
It also expects that these env variables have been set:
CONTAINER_REGISTRY is the registry server, like quay.io
CONTAINER_DETAILS is the container coordinates, like ml-on-k8s/containermodel:1.0.0
AWS_SECRET_ACCESS_KEY is the password for the S3 store
MODEL_NAME is the name of the model in mlflow
MODEL_VERSION is the version of the model in mlflow
"""

os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'http://minio-ml-workshop:9000'
os.environ['AWS_ACCESS_KEY_ID'] = 'minio'
os.environ['AWS_REGION'] = 'us-east-1'
os.environ['AWS_BUCKET_NAME'] = 'mlflow'

HOST = "http://mlflow:5500"
model_name = "mlflowdemo"
model_version = 1
build_name = f"seldon-model-{model_name}-v{model_version}"

auth_encoded = string.Template(":").substitute(os.environ)
os.environ["CONTAINER_REGISTRY_CREDS"] = base64.b64encode(auth_encoded.encode("ascii")).decode("ascii")
docker_auth = string.Template('{"auths":{"https://quay.io/api/v1":{"auth":"$CONTAINER_REGISTRY_CREDS"}}}').substitute(os.environ)
print(docker_auth)

f = open("/kaniko/.docker/config.json", "w")
f.write(docker_auth)
f.close()

def get_s3_server():
    minioClient = Minio('minio-ml-workshop:9000',
                        # access_key=os.environ['AWS_ACCESS_KEY_ID'],
                        # secret_key=os.environ["AWS_SECRET_ACCESS_KEY"],
                        access_key="minio",
                        secret_key="minio123",
                        secure=False)
    return minioClient

def init():
    mlflow.set_tracking_uri(HOST)

def download_artifacts():
    print("retrieving model metadata from mlflow...")
    # model = mlflow.pyfunc.load_model(
    #     model_uri=f"models:/{model_name}/{model_version}"
    # )
    client = MlflowClient()
    model = client.get_registered_model(model_name)
    print(model)
    run_id = model._latest_version[0].run_id
    source = model._latest_version[0].source
    experiment_id = "1"  # to be calculated from the source, which is source='s3://mlflow/1/bf721e5641394ed6866baf20131fca20/artifacts/model'
    print("initializing connection to s3 server...")
    minioClient = get_s3_server()
    # artifact_location = mlflow.get_experiment_by_name('rossdemo').artifact_location
    # print("downloading artifacts from s3 bucket " + artifact_location)
    data_file_model = minioClient.fget_object("mlflow", f"/{experiment_id}/{run_id}/artifacts/model/model.pkl", "model.pkl")
    data_file_requirements = minioClient.fget_object("mlflow", f"/{experiment_id}/{run_id}/artifacts/model/requirements.txt", "requirements.txt")
    # Using boto3, download the files from mlflow; the file path is in the model meta
    # Write the files to the file system
    print("download successful")
    return run_id

def build_push_image():
    container_location = string.Template("https://quay.io/api/v1/quynhngo113/mlflowdemo:1.1.0").substitute(os.environ)
    # For a docker repo, do not include the registry domain name in the container location
    # if os.environ["https://hub.docker.com/v2"].find("docker.io") != -1:
    #     container_location = os.environ["quynhngo113/mlflowdemo:1.1.0"]
    # print(os.getcwd())
    full_command = "/kaniko/executor --context=" + os.getcwd() + " --dockerfile=Dockerfile --verbosity=debug --cache=true --single-snapshot=true --destination=" + container_location
    print(full_command)
    process = subprocess.run(full_command, shell=True, check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print("===============")
    print(process.stdout)
    print("===============")
    print(process.stderr)
    # print(subprocess.check_output(['/kaniko/executor', '--context', '/workspace', '--dockerfile', 'Dockerfile', '--destination', container_location]))

init()
download_artifacts()
build_push_image()
This is the error:
{"auths":{"https://quay.io/api/v1":{"auth":"cXV5bmhuZ28xMTM6UXV5bmhscDEyMzQ1NmFA"}}}
retrieving model metadata from mlflow...
<RegisteredModel: creation_timestamp=1688097597263, description='', last_updated_timestamp=1688097632383, latest_versions=[<ModelVersion: creation_timestamp=1688097632383, current_stage='None', description='', last_updated_timestamp=1688097632383, name='mlflowdemo', run_id='42b2e5c665864ab48f7979ded26673f5', run_link='', source='s3://mlflow/1/42b2e5c665864ab48f7979ded26673f5/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>], name='mlflowdemo', tags={}>
initializing connection to s3 server...
download successful
/kaniko/executor --context=/workspace/jupyter-work-dir --dockerfile=Dockerfile --verbosity=debug --cache=true --single-snapshot=true --destination=https://quay.io/api/v1/quynhngo113/mlflowdemo:1.1.0
===============
b'error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "https://quay.io/api/v1/quynhngo113/mlflowdemo:1.1.0": creating push check transport for https: failed: Get "https://https/v2/": dial tcp: lookup https on 169.254.25.10:53: no such host\n'
===============
b'\x1b[37mDEBU\x1b[0m[0000] Copying file /workspace/jupyter-work-dir/Dockerfile to /kaniko/Dockerfile \n'
Hi, @aquynh1682! Did you finish reading this book?
I have an issue with loading data into Grafana in Chapter 10. Maybe you can recommend something?
Hi, @webmakaka!
I have finished reading the book, and there were a few things I had to skip :)).
As for Grafana not being able to fetch metrics: I read the file manifests/prometheus/base/prometheus.yaml. Find the spec.selector.matchLabels section in the ServiceMonitor part; you will see it has the label app.kubernetes.io/managed-by: seldon-core.
Earlier, I tried to find it, but my Seldon pods were not there; I think it's because I skipped that part before the Grafana part, so it couldn't start the images built there. Please try running that part before Grafana and check whether the started Seldon pods have labels related to seldon-core. (I haven't had a chance to test this; my Kubernetes cluster got deleted, lol :)))).
Wish you can conquer the last part of the book 🫡
Hey @aquynh1682 and @webmakaka, have you been able to figure out your issues? If yes, can I close this thread?
If you still need help with something, please share the details of the error you are facing, and I will try to reach out to the author for assistance. You can also create separate issue threads in case of multiple issues.
Hi, @rajat-packt! I have no issue with this chapter. Can you ask the author to share the sources for the images? I want to try building the images myself.
Hi, @rajat-packt,
I have an issue in Chapters 7 and 10. From what I can see, they both have the same error related to Kaniko. Here is the error log from running the Python file build_push_image.py. Please contact me if you need any additional information to help fix this issue. Thank you very much.
And here is the content of that Python file.
import string
import subprocess
import os
import base64
import mlflow
from minio import Minio
from mlflow.tracking import MlflowClient

"""
This script assumes that /kaniko/.docker/config.json has the correct repo and associated credentials mounted.
It also expects that these env variables have been set:
CONTAINER_REGISTRY is the registry server, like quay.io
CONTAINER_DETAILS is the container coordinates, like ml-on-k8s/containermodel:1.0.0
AWS_SECRET_ACCESS_KEY is the password for the S3 store
MODEL_NAME is the name of the model in mlflow
MODEL_VERSION is the version of the model in mlflow
"""

os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'http://minio-ml-workshop:9000'
os.environ['AWS_ACCESS_KEY_ID'] = 'minio'
os.environ['AWS_REGION'] = 'us-east-1'
os.environ['AWS_BUCKET_NAME'] = 'mlflow'

HOST = "http://mlflow:5500"
model_name = os.environ["MODEL_NAME"]
model_version = os.environ["MODEL_VERSION"]
build_name = f"seldon-model-{model_name}-v{model_version}"

auth_encoded = string.Template("$CONTAINER_REGISTRY_USER:$CONTAINER_REGISTRY_PASSWORD").substitute(os.environ)
os.environ["CONTAINER_REGISTRY_CREDS"] = base64.b64encode(auth_encoded.encode("ascii")).decode("ascii")
docker_auth = string.Template('{"auths":{"$CONTAINER_REGISTRY":{"auth":"$CONTAINER_REGISTRY_CREDS"}}}').substitute(os.environ)
print(docker_auth)

f = open("/kaniko/.docker/config.json", "w")
f.write(docker_auth)
f.close()

def get_s3_server():
    minioClient = Minio('minio-ml-workshop:9000',
                        access_key=os.environ['AWS_ACCESS_KEY_ID'],
                        secret_key=os.environ["AWS_SECRET_ACCESS_KEY"],
                        secure=False)
    return minioClient

def init():
    mlflow.set_tracking_uri(HOST)

def download_artifacts():
    print("retrieving model metadata from mlflow...")
    # model = mlflow.pyfunc.load_model(
    #     model_uri=f"models:/{model_name}/{model_version}"
    # )
    client = MlflowClient()
    model = client.get_registered_model(model_name)
    print(model)
    run_id = model._latest_version[0].run_id
    source = model._latest_version[0].source
    experiment_id = "1"  # to be calculated from the source, which is source='s3://mlflow/1/bf721e5641394ed6866baf20131fca20/artifacts/model'
    print("initializing connection to s3 server...")
    minioClient = get_s3_server()
    # artifact_location = mlflow.get_experiment_by_name('rossdemo').artifact_location
    # print("downloading artifacts from s3 bucket " + artifact_location)
    data_file_model = minioClient.fget_object("mlflow", f"/{experiment_id}/{run_id}/artifacts/model/model.pkl", "model.pkl")
    data_file_requirements = minioClient.fget_object("mlflow", f"/{experiment_id}/{run_id}/artifacts/model/requirements.txt", "requirements.txt")
    # Using boto3, download the files from mlflow; the file path is in the model meta
    # Write the files to the file system
    print("download successful")
    return run_id

def build_push_image():
    container_location = string.Template("$CONTAINER_REGISTRY/$CONTAINER_DETAILS").substitute(os.environ)
    # For a docker repo, do not include the registry domain name in the container location
    if os.environ["CONTAINER_REGISTRY"].find("docker.io") != -1:
        container_location = os.environ["CONTAINER_DETAILS"]
    full_command = "/kaniko/executor --context=" + os.getcwd() + " --dockerfile=Dockerfile --verbosity=debug --cache=true --single-snapshot=true --destination=" + container_location
    print(full_command)
    process = subprocess.run(full_command, shell=True, check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(process.stdout)
    print(process.stderr)
    # print(subprocess.check_output(['/kaniko/executor', '--context', '/workspace', '--dockerfile', 'Dockerfile', '--destination', container_location]))

init()
download_artifacts()
build_push_image()
Hey @aquynh1682, sorry we couldn't provide any assistance, as we weren't able to get any response from the author at the moment. I hope you were able to find a solution for this issue.
Hi @rajat-packt, thanks for trying to help me. I have resolved all the issues that I encountered.
Platform:
Kubespray version 1.26.6
Wishing you all a good day. I am following along with your documentation, up to page 210, and I'm encountering an error while building on Airflow, specifically in the build_push_image step. From what I can see, it had just finished executing the step that installs requirements.txt, and then I encountered this error: subprocess.CalledProcessError: Command '['python3', 'build_push_image.py']' returned non-zero exit status 1.
(I don't understand why Airflow treats this as info, but when I check the pod logs, it shows up as an error, lol :))). Please help me investigate this error. Below are the full log and the structure of the build_push_image.py file. Thank you very much.
Log in pod:
Log in Airflow:
Python file build_push_image.py:
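On the subprocess.CalledProcessError above: `subprocess.run(..., check=True)` raises as soon as the child exits non-zero, and the captured stderr is easily lost, which is why Airflow only shows "returned non-zero exit status 1". A generic sketch (not specific to the book's DAG) of surfacing the child's stderr so the real failure appears in the log:

```python
import subprocess
import sys

# Stand-in for the failing child script: it writes to stderr and exits 1.
cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom\\n'); sys.exit(1)"]

# check=False lets us inspect the result instead of raising immediately.
proc = subprocess.run(cmd, capture_output=True, text=True)
if proc.returncode != 0:
    # Include the child's stderr in the message that reaches the task log.
    print(f"command failed ({proc.returncode}): {proc.stderr.strip()}")
```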