Closed. NohaIhab closed this issue 1 month ago.
Bug Description

While working on https://github.com/canonical/bundle-kubeflow/issues/1077, I came across this issue with the kserve agent ROCK. It is a specific case where the InferenceService creates the agent container and tries to pass arguments to it. This is the same issue we were facing in https://github.com/canonical/katib-rocks/issues/49. The InferenceService Pod description is:
kubectl describe po llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk Name: llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk Namespace: admin Priority: 0 Service Account: default Node: ip-172-31-7-11/172.31.7.11 Start Time: Wed, 16 Oct 2024 07:47:26 +0000 Labels: app=llama3-8b-instruct-1xgpu-predictor-00001 component=predictor pod-template-hash=c75859f7f service.istio.io/canonical-name=llama3-8b-instruct-1xgpu-predictor service.istio.io/canonical-revision=llama3-8b-instruct-1xgpu-predictor-00001 serving.knative.dev/configuration=llama3-8b-instruct-1xgpu-predictor serving.knative.dev/configurationGeneration=1 serving.knative.dev/configurationUID=1b060239-b43b-4cdf-aeb4-0209df82f26e serving.knative.dev/revision=llama3-8b-instruct-1xgpu-predictor-00001 serving.knative.dev/revisionUID=fb07d59d-9ded-48b5-b691-ac774b1b0cfe serving.knative.dev/service=llama3-8b-instruct-1xgpu-predictor serving.knative.dev/serviceUID=e426f02f-53d4-4cef-ae21-6ec406751c98 serving.kserve.io/inferenceservice=llama3-8b-instruct-1xgpu Annotations: autoscaling.knative.dev/class: kpa.autoscaling.knative.dev autoscaling.knative.dev/min-scale: 1 autoscaling.knative.dev/target: 10 cni.projectcalico.org/containerID: 0250b076c9c02e5c09d0b6c52bc8418cacd1565db3a2fa1d0472c94468dcbe81 cni.projectcalico.org/podIP: 10.1.32.188/32 cni.projectcalico.org/podIPs: 10.1.32.188/32 internal.serving.kserve.io/agent: true internal.serving.kserve.io/configMountPath: /mnt/configs internal.serving.kserve.io/configVolumeName: modelconfig-llama3-8b-instruct-1xgpu-0 internal.serving.kserve.io/modelDir: /mnt/models prometheus.io/path: /metrics prometheus.io/port: 9088 prometheus.kserve.io/path: /metrics prometheus.kserve.io/port: 8000 serving.knative.dev/creator: system:serviceaccount:kubeflow:kserve-controller serving.kserve.io/enable-metric-aggregation: true serving.kserve.io/enable-prometheus-scraping: true sidecar.istio.io/inject: false Status: Running IP: 10.1.32.188 IPs: IP: 10.1.32.188 Controlled By: ReplicaSet/llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859f7f Containers: kserve-container: Container ID: containerd://27b50f3895866f88f327ffec54335426355c800abc0064e6d3fe7e031fc5f71e Image: nvcr.io/nim/meta/llama3-8b-instruct:1.0.0 Image ID: nvcr.io/nim/meta/llama3-8b-instruct@sha256:7fe6071923b547edd9fba87c891a362ea0b4a88794b8a422d63127e54caa6ef7 Port: 8000/TCP Host Port: 0/TCP State: Running Started: Wed, 16 Oct 2024 07:47:26 +0000 Ready: True Restart Count: 0 Limits: cpu: 1 memory: 16Gi nvidia.com/gpu: 1 Requests: cpu: 1 memory: 16Gi nvidia.com/gpu: 1 Environment: NIM_CACHE_PATH: /tmp NGC_API_KEY: <set to the key 'NGC_API_KEY' in secret 'ngc-nim-secret'> Optional: false PORT: 8000 K_REVISION: llama3-8b-instruct-1xgpu-predictor-00001 K_CONFIGURATION: llama3-8b-instruct-1xgpu-predictor K_SERVICE: llama3-8b-instruct-1xgpu-predictor Mounts: /dev/shm from dshm (rw) /mnt/models from model-dir (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f9995 (ro) queue-proxy: Container ID: containerd://7d020e01976ddd4eaf77b385c08cb7f2a87b2ad8b838e7796c216dfd66f36eba Image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:89e6f90141f1b63405883fbb4de0d3b6d80f8b77e530904c4d29bdcd1dc5a167 Image ID: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:89e6f90141f1b63405883fbb4de0d3b6d80f8b77e530904c4d29bdcd1dc5a167 Ports: 8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP, 8112/TCP, 9088/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP SeccompProfile: RuntimeDefault State: Running 
Started: Wed, 16 Oct 2024 07:47:27 +0000 Ready: False Restart Count: 0 Requests: cpu: 25m Readiness: http-get http://:8012/ delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: SERVING_NAMESPACE: admin SERVING_SERVICE: llama3-8b-instruct-1xgpu-predictor SERVING_CONFIGURATION: llama3-8b-instruct-1xgpu-predictor SERVING_REVISION: llama3-8b-instruct-1xgpu-predictor-00001 QUEUE_SERVING_PORT: 8012 QUEUE_SERVING_TLS_PORT: 8112 CONTAINER_CONCURRENCY: 0 REVISION_TIMEOUT_SECONDS: 300 REVISION_RESPONSE_START_TIMEOUT_SECONDS: 0 REVISION_IDLE_TIMEOUT_SECONDS: 0 SERVING_POD: llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk (v1:metadata.name) SERVING_POD_IP: (v1:status.podIP) SERVING_LOGGING_CONFIG: SERVING_LOGGING_LEVEL: SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"} SERVING_ENABLE_REQUEST_LOG: false SERVING_REQUEST_METRICS_BACKEND: prometheus SERVING_REQUEST_METRICS_REPORTING_PERIOD_SECONDS: 5 TRACING_CONFIG_BACKEND: none TRACING_CONFIG_ZIPKIN_ENDPOINT: TRACING_CONFIG_DEBUG: false TRACING_CONFIG_SAMPLE_RATE: 0.1 USER_PORT: 9081 SYSTEM_NAMESPACE: knative-serving METRICS_DOMAIN: knative.dev/internal/serving SERVING_READINESS_PROBE: {"tcpSocket":{"port":8000,"host":"127.0.0.1"},"successThreshold":1} ENABLE_PROFILING: false SERVING_ENABLE_PROBE_REQUEST_LOG: false METRICS_COLLECTOR_ADDRESS: HOST_IP: (v1:status.hostIP) ENABLE_HTTP2_AUTO_DETECTION: false ROOT_CA: KSERVE_CONTAINER_PROMETHEUS_METRICS_PORT: 8000 KSERVE_CONTAINER_PROMETHEUS_METRICS_PATH: /metrics AGGREGATE_PROMETHEUS_METRICS_PORT: 9088 KSERVE_CONTAINER_PROMETHEUS_METRICS_PORT: 8000 KSERVE_CONTAINER_PROMETHEUS_METRICS_PATH: /metrics AGGREGATE_PROMETHEUS_METRICS_PORT: 9088 Mounts: /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f9995 (ro) agent: Container ID: containerd://c58427baedded1130cec91d74e203c88ef919f62819d665ee1b61171158eb947 Image: charmedkubeflow/kserve-agent:0.13.0-17792da Image ID: docker.io/charmedkubeflow/kserve-agent@sha256:00825a7816ffffcbb1b262d4f47004182f788357ea0c5af14d1ae1d4a26620d1 Port: 9081/TCP Host Port: 0/TCP Args: --enable-puller --config-dir /mnt/configs --model-dir /mnt/models --component-port 8000 State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Exit Code: 1 Started: Wed, 16 Oct 2024 07:47:49 +0000 Finished: Wed, 16 Oct 2024 07:47:49 +0000 Ready: False Restart Count: 2 Limits: cpu: 1 memory: 1Gi Requests: cpu: 100m memory: 100Mi Readiness: http-get http://:9081/ delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: SERVING_NAMESPACE: admin SERVING_SERVICE: llama3-8b-instruct-1xgpu-predictor SERVING_CONFIGURATION: llama3-8b-instruct-1xgpu-predictor SERVING_REVISION: llama3-8b-instruct-1xgpu-predictor-00001 QUEUE_SERVING_PORT: 8012 QUEUE_SERVING_TLS_PORT: 8112 CONTAINER_CONCURRENCY: 0 REVISION_TIMEOUT_SECONDS: 300 REVISION_RESPONSE_START_TIMEOUT_SECONDS: 0 REVISION_IDLE_TIMEOUT_SECONDS: 0 SERVING_POD: llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk (v1:metadata.name) SERVING_POD_IP: (v1:status.podIP) SERVING_LOGGING_CONFIG: SERVING_LOGGING_LEVEL: 
SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"} SERVING_ENABLE_REQUEST_LOG: false SERVING_REQUEST_METRICS_BACKEND: prometheus SERVING_REQUEST_METRICS_REPORTING_PERIOD_SECONDS: 5 TRACING_CONFIG_BACKEND: none TRACING_CONFIG_ZIPKIN_ENDPOINT: TRACING_CONFIG_DEBUG: false TRACING_CONFIG_SAMPLE_RATE: 0.1 USER_PORT: 8000 SYSTEM_NAMESPACE: knative-serving METRICS_DOMAIN: knative.dev/internal/serving SERVING_READINESS_PROBE: {"tcpSocket":{"port":8000,"host":"127.0.0.1"},"successThreshold":1} ENABLE_PROFILING: false SERVING_ENABLE_PROBE_REQUEST_LOG: false METRICS_COLLECTOR_ADDRESS: HOST_IP: (v1:status.hostIP) ENABLE_HTTP2_AUTO_DETECTION: false ROOT_CA: Mounts: /mnt/configs from model-config (rw) /mnt/models from model-dir (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f9995 (ro) Conditions: Type Status PodReadyToStartContainers True Initialized True Ready False ContainersReady False PodScheduled True Volumes: dshm: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: 16Gi kube-api-access-f9995: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true model-dir: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> model-config: Type: ConfigMap (a volume populated by a ConfigMap) Name: modelconfig-llama3-8b-instruct-1xgpu-0 Optional: false QoS Class: Burstable Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 40s default-scheduler Successfully assigned admin/llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk to ip-172-31-7-11 Normal Pulled 40s kubelet Container image "nvcr.io/nim/meta/llama3-8b-instruct:1.0.0" already present on machine Normal Created 40s kubelet Created container kserve-container Normal Started 40s kubelet Started container kserve-container Normal Pulled 40s kubelet Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:89e6f90141f1b63405883fbb4de0d3b6d80f8b77e530904c4d29bdcd1dc5a167" already present on machine Normal Created 40s kubelet Created container queue-proxy Normal Started 39s kubelet Started container queue-proxy Warning BackOff 30s (x3 over 38s) kubelet Back-off restarting failed container agent in pod llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk_admin(2d8c6673-baef-4eae-baac-3874d54357d2) Warning Unhealthy 29s kubelet Readiness probe failed: HTTP probe failed with statuscode: 503 Normal Pulled 18s (x3 over 39s) kubelet Container image "charmedkubeflow/kserve-agent:0.13.0-17792da" already present on machine Normal Created 17s (x3 over 39s) kubelet Created container agent Normal Started 17s (x3 over 39s) kubelet Started container agent Warning Unhealthy 16s (x6 over 38s) kubelet Readiness probe failed: Get "http://10.1.32.188:8012/": 
context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Notice in the agent container section that the args are:
Args: --enable-puller --config-dir /mnt/configs --model-dir /mnt/models --component-port 8000
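These args can also be read straight off the live Pod spec, which is a quick way to confirm exactly what the KServe controller injected (the pod name and namespace are the ones from the description above), for example:

kubectl get pod -n admin llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk \
  -o jsonpath='{.spec.containers[?(@.name=="agent")].args}'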
The agent container is in CrashLoopBackOff status with the error:
error: unknown flag `enable-puller'
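For reference, the message above comes from the crashed agent container's logs, and the same rejection should be reproducible outside the cluster by handing the identical args to the ROCK directly. This is a hedged sketch, assuming Docker is available and that the image's entrypoint behaves the same locally as it does in the Pod:

# Pull the error from the last crashed run of the agent container:
kubectl logs -n admin llama3-8b-instruct-1xgpu-predictor-00001-deployment-c75859sbgjk -c agent --previous

# Hand the same args to the ROCK locally; this is expected to fail with the same
# "unknown flag" error, which would point at the image's entrypoint (not the agent
# binary) doing the flag parsing, the same pattern as canonical/katib-rocks#49:
docker run --rm charmedkubeflow/kserve-agent:0.13.0-17792da \
  --enable-puller --config-dir /mnt/configs --model-dir /mnt/models --component-port 8000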
Environment

Microk8s 1.29/stable, Juju 3.4/stable
Additional Context

No response
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6449.
This message was autogenerated