Closed by DnPlas 1 day ago
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5907.
This message was autogenerated
In a model with envoy 2.0/stable this issue is not present:
Model     Controller  Cloud/Region        Version  SLA          Timestamp
kubeflow  uk8s-343    microk8s/localhost  3.4.3    unsupported  21:51:30Z

App                   Version                Status  Scale  Charm          Channel      Rev   Address         Exposed  Message
envoy                 res:oci-image@cc06b3e  active  1      envoy          2.0/stable   194   10.152.183.154  no
istio-ingressgateway                         active  1      istio-gateway  1.17/stable  1000  10.152.183.112  no
istio-pilot                                  active  1      istio-pilot    1.17/stable  1011  10.152.183.166  no
mlmd                  res:oci-image@44abc5d  active  1      mlmd           1.14/stable  127   10.152.183.167  no

Unit                     Workload  Agent  Address      Ports          Message
envoy/1*                 active    idle   10.1.60.145  9090,9901/TCP
istio-ingressgateway/0*  active    idle   10.1.60.158
istio-pilot/0*           active    idle   10.1.60.156
mlmd/1*                  active    idle   10.1.60.157  8080/TCP

Integration provider     Requirer                          Interface          Type     Message
istio-pilot:ingress      envoy:ingress                     ingress            regular
istio-pilot:istio-pilot  istio-ingressgateway:istio-pilot  k8s-service        regular
istio-pilot:peers        istio-pilot:peers                 istio_pilot_peers  peer
mlmd:grpc                envoy:grpc                        grpc               regular
I noticed that in this version of the charm we block the unit if the relation with istio-pilot is missing, so I had to deploy the istio-operators (roughly as sketched below) in order to make the envoy unit go active; after that, the reported issue is not present.
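For reference, this is roughly what satisfying that relation looks like, based on the integrations shown in the status above (the kind=ingress config for istio-gateway is an assumption on my side, not something taken from this model):

# deploy the istio-operators and wire them to envoy
juju deploy istio-pilot --channel 1.17/stable --trust
juju deploy istio-gateway istio-ingressgateway --channel 1.17/stable --trust --config kind=ingress
juju relate istio-pilot:ingress envoy:ingress
juju relate istio-pilot:istio-pilot istio-ingressgateway:istio-pilot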
That's odd, because when the envoy.yaml was updated it was tested by both me and the PR's reviewer: https://github.com/canonical/envoy-operator/pull/102#pullrequestreview-2107671499.
Ok, so something is wrong with the charm's image. I tried the following, and that made the charm go active:

juju refresh envoy --resource oci-image=gcr.io/ml-pipeline/metadata-envoy:2.2.0

which is the charm's default image.
I confirmed this by deploying the envoy charm with that image explicitly, and it also went to active:

juju deploy envoy --channel latest/edge --trust --resource oci-image=gcr.io/ml-pipeline/metadata-envoy:2.2.0
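To double-check which oci-image the application actually ended up with after the refresh/deploy (my own sanity check, not part of the original report):

# list the resources currently attached to the running application
juju resources envoy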
So it looks like the charm's publishing has been messed up:

track/2.0: you can see that the charm was published using the oci-image 104: https://github.com/canonical/envoy-operator/actions/runs/9662384372/job/26652904155#step:5:180
main: you can see that the charm was published again using the oci-image 104: https://github.com/canonical/envoy-operator/actions/runs/9701404321/job/26782877651#step:5:184

latest/edge was updated using a new image. That created a new resource (oci-image:102) and the charm was published using that new resource. The publish jobs from track/2.0 use oci-image:104 as the resource, and when latest/edge was published again (with no change in the image), its publish job also picked up the latest available resource, meaning oci-image:104. This results in both charms being published using the same image, although their metadata.yaml files define a different one.
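One way to see the mix-up from the client side is to compare the resource revision each channel resolves to (assuming the usual juju client behaviour here):

# resource revisions published to each channel
juju charm-resources envoy --channel 2.0/stable
juju charm-resources envoy --channel latest/edge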
The charm has been published with the following resources:
I'm also not sure what 103 is, since the charm image in main didn't change after 10th June.
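To map those numbers back to actual images, the uploaded revisions of the resource can be listed from Charmhub (this needs publisher access to the charm):

# list every revision of the oci-image resource uploaded for this charm
charmcraft resource-revisions envoy oci-image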
After transferring this charm to kubeflow-charmers, we re-released envoy with the resource it had been released with when we updated the manifests, by executing:

charmcraft release envoy --revision=231 --channel latest/edge --resource=oci-image:102

We'll be looking into the root cause of this as part of https://github.com/canonical/bundle-kubeflow/issues/962.
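The result of that release can be verified with (again with publisher access):

# show the channel map, including which resource revision each channel was released with
charmcraft status envoy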
Bug Description
It looks like the configuration in /envoy/envoy.yaml is preventing the service from starting correctly, leaving the unit in WaitingStatus without a clear resolution path. From the logs I can see:
Unable to parse JSON as proto (INVALID_ARGUMENT:(static_resources.listeners[0].filter_chains[0].filters[0].typed_config): invalid value Invalid type URL, unknown type: envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager for type Any
which suggests that this value is not recognized.

To Reproduce
juju deploy envoy --channel latest/edge --trust
juju deploy mlmd --channel latest/edge --trust
juju relate envoy mlmd
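Once the units are related and the error shows up, one way to confirm it really is the rendered config that Envoy rejects (the pod name, namespace and the presence of the envoy binary in the workload image are assumptions on my side; the config path comes from the error above):

# dump the config the charm rendered into the workload container
kubectl exec -n kubeflow envoy-0 -- cat /envoy/envoy.yaml

# ask Envoy to validate that config without actually serving traffic
kubectl exec -n kubeflow envoy-0 -- envoy --mode validate -c /envoy/envoy.yaml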
Environment
Relevant Log Output
Additional Context
Strangely enough, this is not being captured by envoy's CI - I have run two attempts on HEAD and they both succeed. This behaviour was caught by the kfp-operators CI here. I was also able to reproduce it locally.