Thank you for reporting your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5753.
This message was autogenerated
Looking around the upstream manifests a bit, it looks like we need to configure the queue-proxy image at this spot in the manifests: https://github.com/kubeflow/manifests/blob/v1.8.0/common/knative/knative-serving/base/upstream/serving-core.yaml#L4667
We'll also need to ensure we can configure this via the KnativeServing CR.
Lastly, in those manifests I also see an Image custom resource that defines an image, which we will most probably need to patch as well.
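For reference, the Knative Operator supports overriding serving ConfigMap entries through the KnativeServing CR's `spec.config.<configmap-name>` passthrough. A minimal sketch of how the queue-proxy image could be set that way (the registry and tag values are placeholders, not values from this issue):

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    # Keys under `deployment` are merged into the config-deployment ConfigMap
    deployment:
      queue-sidecar-image: <local-registry>/knative-releases/knative.dev/serving/cmd/queue:<tag>
```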
Thanks @kimwnasptd for the pointers.
Inspecting my airgapped cluster, I indeed see the queue-sidecar-image set to the upstream image in the config-deployment ConfigMap in the knative-serving namespace:
```bash
kubectl get ConfigMap -n knative-serving config-deployment -oyaml
```
```yaml
apiVersion: v1
data:
  _example: |-
    ################################
    #                              #
    #    EXAMPLE CONFIGURATION     #
    #                              #
    ################################

    # This block is not actually functional configuration,
    # but serves to illustrate the available configuration
    # options and document them in a way that is accessible
    # to users that `kubectl edit` this config map.
    #
    # These sample configuration options may be copied out of
    # this example block and unindented to be in the data block
    # to actually change the configuration.

    # List of repositories for which tag to digest resolving should be skipped
    registries-skipping-tag-resolving: "kind.local,ko.local,dev.local"

    # Maximum time allowed for an image's digests to be resolved.
    digest-resolution-timeout: "10s"

    # Duration we wait for the deployment to be ready before considering it failed.
    progress-deadline: "600s"

    # Sets the queue proxy's CPU request.
    # If omitted, a default value (currently "25m"), is used.
    queue-sidecar-cpu-request: "25m"

    # Sets the queue proxy's CPU limit.
    # If omitted, no value is specified and the system default is used.
    queue-sidecar-cpu-limit: "1000m"

    # Sets the queue proxy's memory request.
    # If omitted, no value is specified and the system default is used.
    queue-sidecar-memory-request: "400Mi"

    # Sets the queue proxy's memory limit.
    # If omitted, no value is specified and the system default is used.
    queue-sidecar-memory-limit: "800Mi"

    # Sets the queue proxy's ephemeral storage request.
    # If omitted, no value is specified and the system default is used.
    queue-sidecar-ephemeral-storage-request: "512Mi"

    # Sets the queue proxy's ephemeral storage limit.
    # If omitted, no value is specified and the system default is used.
    queue-sidecar-ephemeral-storage-limit: "1024Mi"

    # Sets tokens associated with specific audiences for queue proxy - used by QPOptions
    #
    # For example, to add the `service-x` audience:
    # queue-sidecar-token-audiences: "service-x"
    # Also supports a list of audiences, for example:
    # queue-sidecar-token-audiences: "service-x,service-y"
    # If omitted, or empty, no tokens are created
    queue-sidecar-token-audiences: ""

    # Sets rootCA for the queue proxy - used by QPOptions
    # If omitted, or empty, no rootCA is added to the golang rootCAs
    queue-sidecar-rootca: ""
  queue-sidecar-image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:dabaecec38860ca4c972e6821d5dc825549faf50c6feb8feb4c04802f2338b8a
kind: ConfigMap
metadata:
  annotations:
    knative.dev/example-checksum: 410041a0
    manifestival: new
  creationTimestamp: "2024-06-05T11:36:05Z"
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: knative-serving
    app.kubernetes.io/version: 1.10.2
  name: config-deployment
  namespace: knative-serving
  ownerReferences:
  - apiVersion: operator.knative.dev/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: KnativeServing
    name: knative-serving
    uid: 8994c6ac-bdb5-476e-a791-cc493d0481e0
  resourceVersion: "17546"
  uid: 05551649-ea37-4961-b776-ed49069a7f1e
```
I was not able to reproduce this issue in airgapped today. I am seeing this error when trying to apply the knative service:
```
Error from server (InternalError): error when creating "ksvc.yaml": Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/?timeout=10s": dial tcp 10.152.183.91:443: connect: connection refused
```
I looked into it and filed #185. After setting the images specified in #185, I no longer see the error above, so I can now start playing with editing the ConfigMap and the Image resource.
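For anyone hitting the same connection-refused error, a quick generic check is whether the webhook pod behind that Service is up at all; a sketch, assuming the upstream default `app: webhook` label on the serving webhook pods:

```bash
# List the Knative serving webhook pods (label is the upstream default)
kubectl get pods -n knative-serving -l app=webhook
# Confirm the webhook Service actually has ready endpoints
kubectl get endpoints -n knative-serving webhook
```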
I modified the KnativeServing.yaml.j2 file locally by adding the following under spec.config.deployment:
```yaml
queue-sidecar-image: 172.17.0.2:5000/knative-releases/knative.dev/serving/cmd/queue:dabaecec38860ca4c972e6821d5dc825549faf50c6feb8feb4c04802f2338b8a
```
This was to test whether it is what we need for the KnativeService workload. Now I see the queue-proxy image set correctly in the pod, so this fixes the ImagePullBackOff error we were seeing earlier.
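A way to verify the image on the running pod directly (the namespace, service name, and container name are taken from elsewhere in this thread; the jsonpath filter is a generic kubectl feature):

```bash
kubectl get pod -n admin -l serving.knative.dev/service=helloworld \
  -o jsonpath='{.items[0].spec.containers[?(@.name=="queue-proxy")].image}'
```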
I also saw the Image resource being set correctly, so we don't need to patch it:
```bash
kubectl get Images -nknative-serving
```
```
NAME          IMAGE
queue-proxy   172.17.0.2:5000/knative-releases/knative.dev/serving/cmd/queue:dabaecec38860ca4c972e6821d5dc825549faf50c6feb8feb4c04802f2338b8a
```
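For context, this Image kind is Knative's internal image-caching resource. A rough sketch of what the instance above looks like (fields beyond apiVersion/kind/spec.image are illustrative):

```yaml
apiVersion: caching.internal.knative.dev/v1alpha1
kind: Image
metadata:
  name: queue-proxy
  namespace: knative-serving
spec:
  # Picked up from the queue-sidecar-image configured above
  image: 172.17.0.2:5000/knative-releases/knative.dev/serving/cmd/queue:dabaecec38860ca4c972e6821d5dc825549faf50c6feb8feb4c04802f2338b8a
```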
I will be sending a PR to add the queue-sidecar-image to the KnativeServing.yaml.j2 manifest template.
After configuring the queue-sidecar-image correctly, the KnativeService is still not Ready as desired. The workload pod is stuck with 2/3 Ready containers. The queue-proxy container is not Ready, but not due to ImagePullBackOff. In the pod description it says:
```
Warning  Unhealthy  2m55s (x7 over 4m8s)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500
Warning  Unhealthy  2m54s (x5 over 4m9s)  kubelet  Readiness probe failed: Get "http://10.1.205.185:15020/app-health/queue-proxy/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```
The logs of the queue-proxy container are:
```bash
kubectl logs -nadmin helloworld-00001-deployment-dbc8767d4-dx72q -c queue-proxy
```
```
{"severity":"INFO","timestamp":"2024-06-06T09:34:52.152001373Z","logger":"queueproxy","caller":"sharedmain/main.go:259","message":"Starting queue-proxy","commit":"500756c","knative.dev/key":"admin/helloworld-00001","knative.dev/pod":"helloworld-00001-deployment-dbc8767d4-dx72q"}
{"severity":"INFO","timestamp":"2024-06-06T09:34:52.152352674Z","logger":"queueproxy","caller":"sharedmain/main.go:265","message":"Starting http server metrics:9090","commit":"500756c","knative.dev/key":"admin/helloworld-00001","knative.dev/pod":"helloworld-00001-deployment-dbc8767d4-dx72q"}
{"severity":"INFO","timestamp":"2024-06-06T09:34:52.152348217Z","logger":"queueproxy","caller":"sharedmain/main.go:265","message":"Starting http server main:8012","commit":"500756c","knative.dev/key":"admin/helloworld-00001","knative.dev/pod":"helloworld-00001-deployment-dbc8767d4-dx72q"}
{"severity":"INFO","timestamp":"2024-06-06T09:34:52.15236817Z","logger":"queueproxy","caller":"sharedmain/main.go:265","message":"Starting http server admin:8022","commit":"500756c","knative.dev/key":"admin/helloworld-00001","knative.dev/pod":"helloworld-00001-deployment-dbc8767d4-dx72q"}
aggressive probe error (failed 202 times): dial tcp 127.0.0.1:8080: connect: connection refused
timed out waiting for the condition
aggressive probe error (failed 202 times): dial tcp 127.0.0.1:8080: connect: connection refused
timed out waiting for the condition
aggressive probe error (failed 202 times): dial tcp 127.0.0.1:8080: connect: connection refused
timed out waiting for the condition
aggressive probe error (failed 202 times): dial tcp 127.0.0.1:8080: connect: connection refused
timed out waiting for the condition
aggressive probe error (failed 202 times): dial tcp 127.0.0.1:8080: connect: connection refused
timed out waiting for the condition
```
I'm not sure yet why this is happening, so I'm looking into it.
I tried running the dummy ksvc from the description in a non-airgapped environment and saw the same error. Therefore, this is unrelated to this issue, and I will explore the ksvc example in https://github.com/canonical/bundle-kubeflow/issues/917
PR #186 is now open to add the config for the queue image.
Closed by #186 and forward-ported to main in #189.
Bug Description
Hit this issue while testing CKF 1.8 in airgapped (related to https://github.com/canonical/bundle-kubeflow/issues/889 and https://github.com/canonical/bundle-kubeflow/issues/898): when configuring the queue-proxy image in the custom images and then creating a KnativeService, the queue-proxy image in the KnativeService gets the default value, not the one configured in the charm. This is a blocker for using Knative in an airgapped environment. See https://github.com/canonical/knative-operators/issues/140 for context on configuring the custom images.
Looking at the knative-serving charm's config, we can see that the custom image is set there. Looking at the KnativeServing CR, we can see that the queue-proxy field is set correctly in the registry section. However, it is not getting picked up by the KnativeService CR's pod: it still sets the default image in the container.
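On the Knative Operator side, the registry section referred to above is `spec.registry.override`, keyed by container name. A sketch of the override that was set (the image value is from this issue; surrounding fields are elided), which, per this report, does not propagate to the queue-proxy sidecar:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
spec:
  registry:
    override:
      # container name -> image override
      queue-proxy: 172.17.0.2:5000/knative-releases/knative.dev/serving/cmd/queue:dabaecec38860ca4c972e6821d5dc825549faf50c6feb8feb4c04802f2338b8a
```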
To Reproduce
1. Set the custom queue-proxy image to the image in the local registry; in my case I set it to 172.17.0.2:5000/knative-releases/knative.dev/serving/cmd/queue:dabaecec38860ca4c972e6821d5dc825549faf50c6feb8feb4c04802f2338b8a
2. Create a KnativeService.
3. Observe the queue-proxy image in the KnativeService pod (see pod description).
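The original ksvc.yaml from the description is not reproduced here; a hypothetical minimal KnativeService of the same shape (the name and namespace match the pods seen above, the image is a placeholder):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld
  namespace: admin
spec:
  template:
    spec:
      containers:
      - image: <local-registry>/<helloworld-image>:<tag>  # placeholder, not from this issue
```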
Environment
- airgapped environment
- microk8s 1.25-strict/stable
- juju 3.1/stable
Relevant Log Output
Additional Context
No response