Closed shudhanshh12 closed 2 years ago
Does your cluster have any particular RBAC? It looks like the network call is being blocked.
Also, have you checked that the manager pod in seldon-system is running OK? It looks so from the log above, though.
Yes, the pod is running fine. This is a fresh setup: I just created the cluster, and no additional network policy or RBAC has been applied.
I created the GKE cluster and then deployed Seldon using Helm:
helm install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --set istio.enabled=true --set usageMetrics.enabled=true --namespace seldon-system --set crd.create=true --set certManager.enabled=true
What type of cluster are you running on?
Can you check that a ValidatingWebhookConfiguration has been created and that the certificates have been created by cert-manager?
Can you maybe try an install without cert-manager to see if that works?
It's a Google-managed zonal cluster.
kubectl get ValidatingWebhookConfguration --all-namespaces
error: the server doesn't have a resource type "ValidatingWebhookConfguration"
kubectl get certificates --all-namespaces
NAMESPACE           NAME              READY   SECRET                AGE
cert-manager-test   selfsigned-cert   True    selfsigned-cert-tls   9d
That's strange; you should see something like:
kubectl get validatingwebhookconfiguration
NAME                                                    WEBHOOKS   AGE
istiod-istio-system                                     1          18h
seldon-validating-webhook-configuration-seldon-system   3          18h
Maybe try to uninstall, ensure there are no mutatingwebhookconfiguration or validatingwebhookconfiguration resources left, and reinstall?
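A minimal sketch of the "nothing left behind" check, assuming the leftover resources have "seldon" in their names (true for the validating webhook shown above, an assumption for any others). The helper name `leftover_seldon_webhooks` is hypothetical; it only filters a list of resource names and deletes nothing.

```shell
# Keep only Seldon-related entries from a list of resource names on stdin.
# Intended input is the output of `kubectl get ... -o name`.
leftover_seldon_webhooks() {
  # `|| true` keeps the exit status at 0 when nothing matches,
  # which is the desired result after a clean uninstall.
  grep -i 'seldon' || true
}

# Against a live cluster (not executed here):
#   kubectl get validatingwebhookconfiguration,mutatingwebhookconfiguration -o name \
#     | leftover_seldon_webhooks
```

An empty result after the uninstall suggests it is safe to reinstall; any name that is printed can be inspected and removed with `kubectl delete` before reinstalling.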
done
kubectl get validatingwebhookconfiguration
NAME                                                    WEBHOOKS   AGE
cert-manager-webhook                                    1          1h
istiod-istio-system                                     1          9h
nodelimit.config.common-webhooks.networking.gke.io      1          1h
seldon-validating-webhook-configuration-seldon-system   3          1h
validation-webhook.snapshot.storage.k8s.io              1          7d23h
getting the same error:
Error from server (InternalError): error when creating "deployment.yaml": Internal error occurred: failed calling webhook "v1alpha2.vseldondeployment.kb.io": Post "https://seldon-webhook-service.seldon-system.svc:443/validate-machinelearning-seldon-io-v1alpha2-seldondeployment?timeout=30s": dial tcp 10.80.1.4:4443: i/o timeout
@cliveseldon can you please help me to debug this?
kubectl -n seldon-system logs seldon-controller-manager-78fb87cd68-grc9h -p
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/seldon-controller-manager-78fb87cd68-grc9h. Please use `kubectl.kubernetes.io/default-container` instead
{"level":"error","ts":1621374568.822609,"logger":"controller-runtime.manager","msg":"Failed to get API Group-Resources","error":"Get \"https://10.124.16.1:443/api?timeout=32s\": dial tcp 10.124.16.1:443: connect: connection refused","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/manager.New\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.4/pkg/manager/manager.go:279\nmain.main\n\t/workspace/main.go:156\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"error","ts":1621374568.822711,"logger":"setup","msg":"unable to start manager","error":"Get \"https://10.124.16.1:443/api?timeout=32s\": dial tcp 10.124.16.1:443: connect: connection refused","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nmain.main\n\t/workspace/main.go:165\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
Can you try the core_istio ansible playbook on GKE found here: https://github.com/SeldonIO/tempo/tree/master/ansible
I tested with this and all was functioning well on GKE 1.19 cluster today.
No progress; I'm still getting the same issue. By any chance, is it related to the cluster being private?
Error from server (InternalError): error when creating "deployment.yaml": Internal error occurred: failed calling webhook "v1alpha2.vseldondeployment.kb.io": Post "https://seldon-webhook-service.seldon-system.svc:443/validate-machinelearning-seldon-io-v1alpha2-seldondeployment?timeout=30s": context deadline exceeded
also when deployed using ansible I'm not able to see
kubectl get pods -n seldon-system
NAME                                        READY   STATUS    RESTARTS   AGE
seldon-controller-manager-cd97b9c85-whdx6   1/1     Running   0          9m25s
OK. What type of cluster are you running? The above was tested on a standard GKE 1.19 cluster.
I am facing the same issue while creating a deployment in Seldon. The manager node is running fine, but the deployment creation fails with the error below:
Internal error occurred: failed calling webhook "v1.vseldondeployment.kb.io": Post https://seldon-webhook-service.fusion.svc:443/validate-machinelearning-seldon-io-v1-seldondeployment?timeout=30s: context deadline exceeded
I am running it on a GKE 1.18 cluster. Does it work on 1.19 only?
Could be related to this: https://github.com/knative/serving/issues/4868
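If the cluster is private, one commonly reported cause of this timeout is the control-plane firewall: by default, GKE private clusters allow the control plane to reach nodes only on TCP 443 and 10250, while the manager log in this report shows the webhook server listening on 4443. The sketch below only prints a `gcloud` command that would open that port; it does not run it. The function name, the rule name, and the example arguments are placeholders, not values taken from this thread.

```shell
# Print (do not execute) a gcloud command that would allow the GKE control
# plane to reach the webhook port on the cluster nodes.
print_webhook_firewall_cmd() {
  network="$1"       # VPC network the cluster runs in (placeholder)
  master_cidr="$2"   # control-plane CIDR of the private cluster (placeholder)
  node_tag="$3"      # network tag applied to the cluster's nodes (placeholder)
  port="${4:-4443}"  # webhook port; defaults to the one in the manager log
  # "allow-master-to-seldon-webhook" is a hypothetical rule name.
  printf 'gcloud compute firewall-rules create allow-master-to-seldon-webhook --network %s --direction INGRESS --source-ranges %s --target-tags %s --allow tcp:%s\n' \
    "$network" "$master_cidr" "$node_tag" "$port"
}

# Example with made-up values:
#   print_webhook_firewall_cmd my-vpc 172.16.0.0/28 gke-my-cluster-node
```

The control-plane CIDR can be read with `gcloud container clusters describe <cluster> --zone <zone> --format='value(privateClusterConfig.masterIpv4CidrBlock)'`; the node tag can be taken from the existing `gke-*` firewall rules of the cluster. Printing the command first makes it easy to review the values before applying them.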
Please reopen if this is still an issue.
Error from server (InternalError): error when creating "deployment.yaml": Internal error occurred: failed calling webhook "v1alpha2.vseldondeployment.kb.io": Post "https://seldon-webhook-service.seldon-system.svc:443/validate-machinelearning-seldon-io-v1alpha2-seldondeployment?timeout=30s": context deadline exceeded
Describe the bug
I'm getting issues while deploying the new SeldonDeployment.
To reproduce
istioctl install --set profile=default -y
kubectl get pods -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istio-ingressgateway-5cb85cb9fc-nwb9d   1/1     Running   0          12h
istiod-68f469d854-jm7m2                 1/1     Running   0          13h
kubectl get gw -n istio-system
NAME             AGE
seldon-gateway   156m
I deployed the Seldon operator with Helm:
helm install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --set istio.enabled=true --set usageMetrics.enabled=true --namespace seldon-system --set crd.create=true --set certManager.enabled=true
kubectl get pods -n seldon-system
NAME                                         READY   STATUS    RESTARTS   AGE
seldon-controller-manager-6dbb9fbd87-4rtct   1/1     Running   0          47m
kubectl create -f deployment.yaml -n seldon-system
Error from server (InternalError): error when creating "deployment.yaml": Internal error occurred: failed calling webhook "v1alpha2.vseldondeployment.kb.io": Post "https://seldon-webhook-service.seldon-system.svc:443/validate-machinelearning-seldon-io-v1alpha2-seldondeployment?timeout=30s": dial tcp 10.80.0.14:4443: i/o timeout
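One way to read this error: the URL names the service on port 443, but the failing connection is to `10.80.0.14:4443`, which matches the port the manager log reports for the webhook server. In other words, the API server is dialing a pod endpoint directly and timing out. A small sketch for pulling that address out of an error message follows; the helper name `dial_target` is hypothetical.

```shell
# Read an error message on stdin and print the IP:PORT that follows
# "dial tcp" on the first line containing it. Prints nothing if the
# message has no such fragment (for example "context deadline exceeded").
dial_target() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\).*/\1/p' | head -n 1
}

# Example (not executed here):
#   kubectl create -f deployment.yaml -n seldon-system 2>&1 | dial_target
```

The printed IP can be compared with the output of `kubectl get pods -n seldon-system -o wide` to check that it belongs to the seldon-controller-manager pod.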
Expected behaviour
This should create the deployment.
Environment
GKE with istio manually installed
GKE
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T21:16:14Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.9-gke.1400", GitCommit:"ec68c7064ea987ad0c7fb63930df96bdefeb93c4", GitTreeState:"clean", BuildDate:"2021-04-07T09:20:04Z", GoVersion:"go1.15.8b5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.21) and server (1.19) exceeds the supported minor version skew of +/-1
kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio
Model Details
kubectl get deploy -n seldon-system seldon-controller-manager -o yaml
kubectl logs -n seldon-system seldon-controller-manager-6dbb9fbd87-4rtct -f
I0517 10:48:45.397034 1 request.go:621] Throttling request took 1.034903882s, request: GET:https://10.124.16.1:443/apis/batch/v1?timeout=32s
{"level":"info","ts":1621248525.9033751,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1621248525.9049253,"logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"machinelearning.seldon.io/v1alpha2, Kind=SeldonDeployment","path":"/mutate-machinelearning-seldon-io-v1alpha2-seldondeployment"}
{"level":"info","ts":1621248525.9050074,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-machinelearning-seldon-io-v1alpha2-seldondeployment"}
{"level":"info","ts":1621248525.9050608,"logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"machinelearning.seldon.io/v1alpha2, Kind=SeldonDeployment","path":"/validate-machinelearning-seldon-io-v1alpha2-seldondeployment"}
{"level":"info","ts":1621248525.9050915,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-machinelearning-seldon-io-v1alpha2-seldondeployment"}
{"level":"info","ts":1621248525.905159,"logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"machinelearning.seldon.io/v1alpha3, Kind=SeldonDeployment","path":"/mutate-machinelearning-seldon-io-v1alpha3-seldondeployment"}
{"level":"info","ts":1621248525.9052534,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-machinelearning-seldon-io-v1alpha3-seldondeployment"}
{"level":"info","ts":1621248525.9052784,"logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"machinelearning.seldon.io/v1alpha3, Kind=SeldonDeployment","path":"/validate-machinelearning-seldon-io-v1alpha3-seldondeployment"}
{"level":"info","ts":1621248525.9053478,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-machinelearning-seldon-io-v1alpha3-seldondeployment"}
{"level":"info","ts":1621248525.9053905,"logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"machinelearning.seldon.io/v1, Kind=SeldonDeployment","path":"/mutate-machinelearning-seldon-io-v1-seldondeployment"}
{"level":"info","ts":1621248525.9054172,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-machinelearning-seldon-io-v1-seldondeployment"}
{"level":"info","ts":1621248525.905451,"logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"machinelearning.seldon.io/v1, Kind=SeldonDeployment","path":"/validate-machinelearning-seldon-io-v1-seldondeployment"}
{"level":"info","ts":1621248525.9054766,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-machinelearning-seldon-io-v1-seldondeployment"}
{"level":"info","ts":1621248525.9055269,"logger":"setup","msg":"starting manager"}
I0517 10:48:45.905932 1 leaderelection.go:242] attempting to acquire leader lease seldon-system/a33bd623.machinelearning.seldon.io...
{"level":"info","ts":1621248526.006426,"logger":"controller-runtime.webhook.webhooks","msg":"starting webhook server"}
{"level":"info","ts":1621248526.006426,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1621248526.0068915,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1621248526.0070863,"logger":"controller-runtime.webhook","msg":"serving webhook server","host":"","port":4443}
{"level":"info","ts":1621248526.0071626,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
I0517 10:49:03.498490 1 leaderelection.go:252] successfully acquired lease seldon-system/a33bd623.machinelearning.seldon.io
{"level":"info","ts":1621248543.4987247,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"machinelearning.seldon.io","reconcilerKind":"SeldonDeployment","controller":"seldon-controller-manager","source":"kind source: /, Kind="}
{"level":"info","ts":1621248544.299195,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"machinelearning.seldon.io","reconcilerKind":"SeldonDeployment","controller":"seldon-controller-manager","source":"kind source: /, Kind="}
{"level":"info","ts":1621248544.299298,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"machinelearning.seldon.io","reconcilerKind":"SeldonDeployment","controller":"seldon-controller-manager","source":"kind source: /, Kind="}
{"level":"info","ts":1621248544.2995644,"logger":"controller","msg":"Starting EventSource","reconcilerGroup":"machinelearning.seldon.io","reconcilerKind":"SeldonDeployment","controller":"seldon-controller-manager","source":"kind source: /, Kind="}
{"level":"info","ts":1621248544.2995968,"logger":"controller","msg":"Starting Controller","reconcilerGroup":"machinelearning.seldon.io","reconcilerKind":"SeldonDeployment","controller":"seldon-controller-manager"}
{"level":"info","ts":1621248544.2996142,"logger":"controller","msg":"Starting workers","reconcilerGroup":"machinelearning.seldon.io","reconcilerKind":"SeldonDeployment","controller":"seldon-controller-manager","worker count":1}