canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

tls: failed to verify certificate: x509: certificate signed by unknown authority #1095

Open ShrishtiKarkera opened 1 month ago

ShrishtiKarkera commented 1 month ago

Bug Description

I'm unable to create a KServe InferenceService from a JupyterLab notebook. When I create the InferenceService through the client, it fails with this error: "inferenceservice.kserve-webhook-server.defaulter\": failed to call webhook: Post \"https://kserve-webhook-server-service.kubeflow.svc:443/mutate-serving-kserve-io-v1beta1-inferenceservice?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority"

The InferenceService client code looks like this (my model is stored in MinIO):

from datetime import datetime
from kserve import KServeClient, constants
from kserve.models import (
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec
)
from kubernetes import client
import utils

# Get the default target namespace
namespace = "admin"

now = datetime.now()
v = now.strftime("%Y-%m-%d--%H-%M-%S")

name = 'iris-classifier'
kserve_version = 'v1beta1'
api_version = constants.KSERVE_GROUP + '/' + kserve_version

# Create the InferenceService
isvc = V1beta1InferenceService(
    api_version=api_version,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(
        name=name, 
        namespace=namespace, 
        annotations={'sidecar.istio.io/inject': 'false'}
    ),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            service_account_name="sa-minio-kserve",
            sklearn=V1beta1SKLearnSpec(
                storage_uri="s3://mlpipeline/models/iris_model.pkl"
            )
        )
    )
)

# Create the InferenceService in KServe
KServe = KServeClient()
KServe.create(isvc)

I checked the certificates and everything appeared to be in place. I also tried restarting the MutatingWebhookConfiguration, but that didn't help.
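One way to double-check the certificate side is to look at the `caBundle` that the API server uses when it calls the webhook. The following is only a minimal sketch: it assumes the output of `kubectl get mutatingwebhookconfigurations -o json` has been parsed into a dict, and it takes the Service name and namespace from the URL in the error message. The function name `ca_bundles_for_service` and the sample data are hypothetical. An empty bundle, or a fingerprint that differs from the CA that signed the webhook server's certificate, would be consistent with the x509 error, though not proof of the cause.

```python
import base64
import hashlib


def ca_bundles_for_service(config_list, service_name, namespace):
    """Summarise the caBundle of every webhook that targets the given Service.

    `config_list` is the parsed JSON from
    `kubectl get mutatingwebhookconfigurations -o json`.
    """
    results = []
    for cfg in config_list.get("items", []):
        for hook in cfg.get("webhooks") or []:
            client_cfg = hook.get("clientConfig") or {}
            svc = client_cfg.get("service") or {}
            if svc.get("name") != service_name or svc.get("namespace") != namespace:
                continue
            # caBundle is base64-encoded PEM; it may be missing or empty.
            raw = base64.b64decode(client_cfg.get("caBundle") or "")
            results.append({
                "configuration": cfg["metadata"]["name"],
                "webhook": hook["name"],
                "ca_bundle_empty": not raw,
                "ca_bundle_sha256": hashlib.sha256(raw).hexdigest() if raw else None,
            })
    return results


# Illustrative input only: the configuration name and bundle bytes are made up.
sample = {
    "items": [{
        "metadata": {"name": "example-webhook-config"},
        "webhooks": [{
            "name": "inferenceservice.kserve-webhook-server.defaulter",
            "clientConfig": {
                "service": {
                    "name": "kserve-webhook-server-service",
                    "namespace": "kubeflow",
                },
                "caBundle": base64.b64encode(b"dummy PEM bytes").decode(),
            },
        }],
    }]
}

for entry in ca_bundles_for_service(sample, "kserve-webhook-server-service", "kubeflow"):
    print(entry)
```

Against a real cluster, the `sample` dict would be replaced by the result of `json.load()` on the saved `kubectl` output.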

To Reproduce

  1. Deploy Charmed Kubeflow - https://charmed-kubeflow.io/docs/get-started-with-charmed-kubeflow
  2. Allow MinIO access - https://charmed-kubeflow.io/docs/allow-access-minio
  3. Allow KServe to access MinIO
  4. Launch a new notebook (scipy image)
  5. Execute the following code. Note: the model is stored in the MinIO bucket mlpipeline (upload the model.pkl file).
%pip install minio boto3 mlflow

import joblib
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the iris dataset into a DataFrame
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target
df = df.dropna()

# Split features and target
target_column = 'species'
X = df.drop(columns=[target_column])
y = df[target_column]  # 1-D Series, which is what fit() expects
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=47
)

# Train the model and save it locally
iris_model = LogisticRegression(max_iter=200)
iris_model.fit(X_train, y_train)
joblib.dump(iris_model, 'iris_model.pkl')

from datetime import datetime
from kserve import KServeClient, constants
from kserve.models import (
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec
)
from kubernetes import client
import utils

# Get the default target namespace
namespace = "admin"

now = datetime.now()
v = now.strftime("%Y-%m-%d--%H-%M-%S")

name = 'iris-classifier'
kserve_version = 'v1beta1'
api_version = constants.KSERVE_GROUP + '/' + kserve_version

# Create the InferenceService
isvc = V1beta1InferenceService(
    api_version=api_version,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(
        name=name, 
        namespace=namespace, 
        annotations={'sidecar.istio.io/inject': 'false'}
    ),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            service_account_name="sa-minio-kserve",
            sklearn=V1beta1SKLearnSpec(
                storage_uri="s3://mlpipeline/models/iris_model.pkl"
            )
        )
    )
)

# Create the InferenceService in KServe
KServe = KServeClient()
KServe.create(isvc)
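
Step 5 mentions uploading the model file to the mlpipeline bucket, but the code above only writes iris_model.pkl to the notebook's local disk. A minimal sketch of that upload step could look like the following. It assumes a `minio.Minio` client (the `bucket_exists`, `make_bucket` and `fput_object` calls are from the MinIO Python SDK); the helper names `split_s3_uri` and `upload_model` are hypothetical, and the endpoint and credentials in the comment are placeholders rather than values from this issue.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a usable s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_model(minio_client, local_path, storage_uri):
    """Upload local_path to the bucket/key named by storage_uri.

    minio_client is expected to be a minio.Minio instance (or any object
    exposing the same bucket_exists / make_bucket / fput_object methods).
    """
    bucket, key = split_s3_uri(storage_uri)
    # Create the bucket only if it is not there yet.
    if not minio_client.bucket_exists(bucket):
        minio_client.make_bucket(bucket)
    minio_client.fput_object(bucket, key, local_path)
    return bucket, key


# Example wiring (placeholders, not values from this issue):
# from minio import Minio
# client = Minio("<minio-endpoint>:9000", access_key="<access-key>",
#                secret_key="<secret-key>", secure=False)
# upload_model(client, "iris_model.pkl", "s3://mlpipeline/models/iris_model.pkl")
```

Keeping the URI parsing separate means the same `storage_uri` string can be passed both to the upload helper and to `V1beta1SKLearnSpec`, so the two cannot drift apart.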

Environment

AWS t3.2xlarge instance with 10 GB of storage. Installed Charmed Kubeflow, MinIO and MLflow. Allowed MinIO access and MLflow access.

Relevant Log Output

Post \"https://kserve-webhook-server-service.kubeflow.svc:443/mutate-serving-kserve-io-v1beta1-inferenceservice?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":"2024-09-30T20:50:29Z","msg":"Reconciler error","controller":"inferenceservice","controllerGroup":"serving.kserve.io","controllerKind":"InferenceService","InferenceService":{"name":"iris-classifier","namespace":"admin"},"namespace":"admin","name":"iris-classifier","reconcileID":"57908d65-5f49-4305-ade5-3247160b89ec","error":"Internal error occurred: failed calling webhook \"inferenceservice.kserve-webhook-server.defaulter\": failed to call webhook: Post \"https://kserve-webhook-server-service.kubeflow.svc:443/mutate-serving-kserve-io-v1beta1-inferenceservice?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
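
Because the controller emits one JSON object per log line, the relevant fields can be pulled out programmatically instead of being read from the raw line. The sketch below is illustrative only: the function name `summarise_reconcile_error` is hypothetical, and the sample line is a shortened copy of the entry above with the stack trace and some fields omitted.

```python
import json
import re

# Matches the URL in: Post "https://...": tls: ...
WEBHOOK_URL = re.compile(r'Post "(https://[^"]+)"')


def summarise_reconcile_error(line):
    """Extract the useful fields from one JSON-formatted controller log line."""
    record = json.loads(line)
    error = record.get("error", "")
    match = WEBHOOK_URL.search(error)
    return {
        "timestamp": record.get("ts"),
        "object": f'{record.get("namespace")}/{record.get("name")}',
        "webhook_url": match.group(1) if match else None,
        "x509_unknown_authority": "certificate signed by unknown authority" in error,
    }


# Shortened copy of the log entry above (stack trace and some fields omitted).
sample_line = (
    r'{"level":"error","ts":"2024-09-30T20:50:29Z","msg":"Reconciler error",'
    r'"namespace":"admin","name":"iris-classifier",'
    r'"error":"failed to call webhook: Post '
    r'\"https://kserve-webhook-server-service.kubeflow.svc:443/'
    r'mutate-serving-kserve-io-v1beta1-inferenceservice?timeout=10s\": '
    r'tls: failed to verify certificate: x509: certificate signed by unknown authority"}'
)

print(summarise_reconcile_error(sample_line))
```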

Additional Context

No response

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6340.

This message was autogenerated

NohaIhab commented 3 weeks ago

Hi @ShrishtiKarkera, from the logs it looks like there's an issue verifying the certificate of the KServe webhook in the MutatingWebhookConfiguration object. To debug this further, we first need to check the health of the admission webhook charm and workload. Can you share:

  1. The logs of the admission webhook charm, by running:
    juju debug-log --replay --include unit-admission-webhook-0
  2. The logs of the admission webhook workload, by running:
    kubectl logs -n kubeflow admission-webhook-0 -c admission-webhook