natalytvinova closed this issue 4 months ago.
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5337.
This message was autogenerated
Hi @natalytvinova, thanks for filing this issue.
I have identified the following:
istio-ingressgateway ClusterIP 10.152.183.184 <none> 65535/TCP 10d
istio-ingressgateway-endpoints ClusterIP None <none> <none> 10d
istio-ingressgateway-workload LoadBalancer 10.152.183.89 <IP> 80:30481/TCP,443:30282/TCP 10d
istio-pilot ClusterIP 10.152.183.57 <none> 65535/TCP 10d
This can happen when the k8s node does not have a load balancer by default. Could you please share the networking details of your node? Also, if you are not using a load balancer, I recommend changing the service type with the gateway_service_type config option.
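A quick way to check whether the gateway Service actually received an address is to look for LoadBalancer Services with no external IP. The sketch below is a hypothetical helper and is not part of any charm. It assumes the default whitespace-separated `kubectl get svc` columns (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE) and treats `<pending>` or `<none>` in the EXTERNAL-IP column as "no address assigned". The `<pending>` value in the sample is illustrative and was not taken from this deployment.

```python
# Hypothetical diagnostic helper (not part of any charm): flags LoadBalancer
# Services that have not been given an external IP yet.
# Assumes the default `kubectl get svc` layout:
#   NAME  TYPE  CLUSTER-IP  EXTERNAL-IP  PORT(S)  AGE
from typing import List


def services_missing_external_ip(kubectl_output: str) -> List[str]:
    """Return names of LoadBalancer Services whose EXTERNAL-IP is unset or pending."""
    missing = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        name, svc_type, _cluster_ip, external_ip = fields[:4]
        # The header row has "TYPE" in this column, so it is skipped naturally.
        if svc_type == "LoadBalancer" and external_ip in ("<none>", "<pending>"):
            missing.append(name)
    return missing


sample = """\
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway ClusterIP 10.152.183.184 <none> 65535/TCP 10d
istio-ingressgateway-workload LoadBalancer 10.152.183.89 <pending> 80:30481/TCP,443:30282/TCP 10d
istio-pilot ClusterIP 10.152.183.57 <none> 65535/TCP 10d
"""
print(services_missing_external_ip(sample))  # ['istio-ingressgateway-workload']
```

If the list is empty but the address is still wrong for your environment, the Service type probably does not match what the node or cluster edge provides. NodePort is the usual alternative.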
acme is complaining internally when trying to get certificates, with the message ...
acme: Obtaining bundled SAN certificate given a CSR
Could not obtain certificates:
...NewOrder request included invalid non-DNS type identifier: type "ip", value "<IP>"
This is happening because, whenever istio-pilot generates the CSR, we share the ingress gateway Service IP. Because of (1), we are sharing an incorrect value, <IP>. We have to make sure that the ingress gateway Service is correctly configured and has an IP address.
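To see why a CSR built from the Service IP fails, it helps to look at how each SAN value maps to an ACME identifier. ACME distinguishes "dns" identifiers (RFC 8555) from "ip" identifiers (RFC 8738). The unsupportedIdentifier error in these logs shows that the "ip" entry is the one being refused. The sketch below is a minimal, hypothetical classifier that uses only the standard library. It does not reproduce lego's own code.

```python
import ipaddress
from typing import List


def acme_identifier_type(value: str) -> str:
    """Classify a SAN value as an ACME "ip" or "dns" identifier."""
    try:
        ipaddress.ip_address(value)  # accepts both IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        return "dns"


def ip_identifiers(sans: List[str]) -> List[str]:
    """Return the SAN values that would be sent as "ip" identifiers."""
    return [value for value in sans if acme_identifier_type(value) == "ip"]


# SAN values taken from the istio-pilot log line in this thread.
sans = ["10.64.140.43", "istio-pilot-0.istio-pilot-endpoints.test-tls.svc.cluster.local"]
print(ip_identifiers(sans))  # ['10.64.140.43']
```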
acme will still complain:
unit-httprequest-lego-k8s-0: 11:13:16 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:5: Exited with code 1. Stderr:
unit-httprequest-lego-k8s-0: 11:13:16 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:5: 2024/02/14 10:13:15 [INFO] [10.64.140.43, istio-pilot-0.istio-pilot-endpoints.test-tls.svc.cluster.local] acme: Obtaining bundled SAN certificate given a CSR
unit-httprequest-lego-k8s-0: 11:13:16 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:5: 2024/02/14 10:13:16 Could not obtain certificates:
unit-httprequest-lego-k8s-0: 11:13:16 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:5: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:unsupportedIdentifier :: NewOrder request included invalid non-DNS type identifier: type "ip", value "10.64.140.43"
To debug this further, I need to reach out to the maintainers of this charm to understand the issue, because it happens not only with istio but also with traefik.
unit-httprequest-lego-k8s-0: 11:12:29 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:4: Exited with code 1. Stderr:
unit-httprequest-lego-k8s-0: 11:12:29 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:4: 2024/02/14 10:12:29 [INFO] [10.64.140.44] acme: Obtaining bundled SAN certificate given a CSR
unit-httprequest-lego-k8s-0: 11:12:29 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:4: 2024/02/14 10:12:29 Could not obtain certificates:
unit-httprequest-lego-k8s-0: 11:12:29 ERROR unit.httprequest-lego-k8s/0.juju-log certificates:4: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:unsupportedIdentifier :: NewOrder request included invalid non-DNS type identifier: type "ip", value "10.64.140.44"
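Both the istio and the traefik logs end with the same ACME error. Only the identifier value differs. When comparing several of these logs, a small hypothetical parser can extract the error name and the rejected identifier. The regex is written against the log text shown above and is an assumption about its format, not a documented lego output contract.

```python
import re
from typing import Optional, Tuple

# Matches the ACME error text that lego printed in the juju debug-log above.
ACME_IDENTIFIER_ERROR = re.compile(
    r"urn:ietf:params:acme:error:(?P<error>\w+) :: .*?"
    r'type "(?P<type>[^"]+)", value "(?P<value>[^"]+)"'
)


def parse_acme_identifier_error(log_line: str) -> Optional[Tuple[str, str, str]]:
    """Return (error, identifier type, identifier value), or None if absent."""
    match = ACME_IDENTIFIER_ERROR.search(log_line)
    if match is None:
        return None
    return match.group("error"), match.group("type"), match.group("value")


line = (
    "acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order"
    " :: urn:ietf:params:acme:error:unsupportedIdentifier :: NewOrder request included"
    ' invalid non-DNS type identifier: type "ip", value "10.64.140.44"'
)
print(parse_acme_identifier_error(line))  # ('unsupportedIdentifier', 'ip', '10.64.140.44')
```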
@DnPlas Hi! Thank you for investigating this. We are using a load balancer. This is my config:
gateway_service_type:
  default: LoadBalancer
  description: |
    Type of service for the ingress gateway out of: 'ClusterIP', 'LoadBalancer', or 'NodePort'.
  source: default
  type: string
  value: LoadBalancer
Please correct me if I'm wrong: are you saying that the istio-ingressgateway Service should get an IP, not the istio-ingressgateway-workload Service? Because the second one does have an IP; I just redacted it.
Yes, istio-ingressgateway-workload is the svc that should have the IP, so:
istio-ingressgateway-workload LoadBalancer 10.152.183.89 <IP> 80:30481/TCP,443:30282/TCP 10d
should have an actual IP instead of just <IP>, and the Service type should match what you have on your host.
Right, that is the configuration of the charm. On your node, are you sure you have configured a load balancer for your ingress (the one that sits at the edge of your k8s node/cluster)? I understand your deployment is on Charmed Kubernetes. Historically, people have just used NodePort instead of LoadBalancer and configured the istio-ingressgateway charm accordingly.
@DnPlas oh, it does have the IP, sorry for the confusion. We have:
istio-ingressgateway-workload LoadBalancer 10.152.183.89 <IP redacted> 80:30481/TCP,443:30282/TCP 10d
That load balancer does exist in OpenStack, if this is what you're asking.
@DnPlas also, as I understand how Let's Encrypt works, it can't issue a certificate for an IP address. So the CSR needs to contain the URL for it to work properly: https://community.letsencrypt.org/t/neworder-request-included-invalid-non-dns-type-identifier-type-ip/170623. The istio-pilot charm generates the CSR, so we need the CSR to be generated with the URL instead of the IP.
Correct, but what URL should that be? Is it the name of the service, or is it something else?
@DnPlas my understanding was that it should be the name of the service, but istio-pilot doesn't see any service name. That makes sense, because istio-ingressgateway does not expose it, but maybe it should? It would be a good idea to confirm this with both the Telco and istio-ingressgateway teams.
After talking to the maintainers of the tls-interface library and of the certificate provider charms, we can confirm that all lego charms will only work with CSRs for domain names, not for IPs.
This confirms what we stated in a previous comment and here.
The problem with istio-pilot at the moment is that it only shares the ingress gateway Service IP to generate the CSR. This works for most certificate providers but not for lego charms, which simply reject the request.
To fix this issue and actually be able to connect istio-pilot with lego charms, we need to start sharing the domain name instead of the IP. Before committing any changes, let's consider the following:
lego charms?
@natalytvinova I will allocate a bit more time to work on this and see how we can extend support for this integration.
Update on what we have to do to better support the lego integration.
The following diagram presents the proposed architecture for the integration with different certificate providers. We expect istio-pilot to keep using the tls-certificates-interface. However, instead of just using the IP address of the istio-ingressgateway-workload Service to generate the CSR, it will now generate a cert_subject, which will be "calculated" as follows:
The istio-pilot charm will provide enough information to users in case of an invalid or missing domain-name.
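One way to give users that information is to validate the configured value before building the CSR and to report a specific reason when it fails. The following is a minimal, hypothetical sketch. The function name and messages are illustrative and are not istio-pilot's actual code. It performs only a syntactic hostname check (an RFC 1123 style label check plus a 253-character limit). It does not verify that the name is publicly resolvable.

```python
import ipaddress
import re
from typing import Optional

# One DNS label: 1-63 letters, digits or hyphens, not starting/ending with "-".
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def domain_name_problem(domain_name: Optional[str]) -> Optional[str]:
    """Return a message describing what is wrong with domain_name, or None."""
    if not domain_name:
        return "domain name is not set; configure the public hostname of the gateway"
    name = domain_name.rstrip(".")  # tolerate a fully qualified trailing dot
    try:
        ipaddress.ip_address(name)
        return f"{domain_name!r} is an IP address; lego charms only accept DNS names"
    except ValueError:
        pass  # not an IP literal, keep checking
    if len(name) > 253:
        return f"{domain_name!r} is longer than 253 characters"
    bad_labels = [label for label in name.split(".") if not _LABEL.match(label)]
    if bad_labels:
        return f"{domain_name!r} has invalid label(s): {bad_labels}"
    return None


for value in ["kubeflow.example.com", "", "10.64.140.43", "https://kubeflow.example.com"]:
    print(repr(value), "->", domain_name_problem(value))
```

Rejecting a full URL here is intentional. A CSR carries only the hostname, so a value like https://kubeflow.example.com has to be reduced to kubeflow.example.com first.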
The following diagram presents a high-level overview of the proposed model:
istio-pilot charm with the new configuration option, and to "calculate" the cert_subject based on the conditions above.
EDIT: we are going to work on this improvement for 24.04, but the priority now will be https://github.com/canonical/istio-operators/issues/380. After discussing with @natalytvinova, we agreed that for now her deployment doesn't require an integration with a TLS certificates provider. They already have an SSL key and cert, which can be passed to istio-pilot to configure the Gateway accordingly.
main -> Wednesday 21st
If we are able to land these changes in main before Wednesday, that'll move the release date for 1.17 one or two days earlier.
EDIT: after discussing with the telco team the best approaches for integrating with the TLS certificate providers, we have concluded that it would be better to enable istio-pilot to get a domain_name (see image above). This way it can send a correct CSR to the tls-cert-provider charm, which in turn will handle all the logic to get a signed cert from a CA. I will explain this in more detail in a later comment.
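The decision described above can be illustrated with a small, hypothetical selection function. It prefers a configured domain_name and otherwise falls back to today's behaviour of using the gateway Service IP, which the thread notes works for most providers but not for lego. This is only an illustration. It does not reproduce the exact cert_subject conditions from the diagram, which are not available here, and the class and function names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CertSubject:
    """Hypothetical container for what istio-pilot would put in its CSR."""
    common_name: str
    sans: List[str] = field(default_factory=list)
    lego_compatible: bool = True


def choose_cert_subject(domain_name: Optional[str], gateway_ip: Optional[str]) -> CertSubject:
    """Prefer a configured domain name; otherwise fall back to the gateway IP."""
    if domain_name:
        return CertSubject(common_name=domain_name, sans=[domain_name])
    if gateway_ip:
        # Current behaviour: other providers may accept this, lego charms will not.
        return CertSubject(common_name=gateway_ip, sans=[gateway_ip], lego_compatible=False)
    raise ValueError("no domain name configured and the gateway Service has no IP yet")


print(choose_cert_subject("kubeflow.example.com", "10.64.140.43"))
print(choose_cert_subject(None, "10.64.140.43"))
```

Keeping the IP fallback avoids breaking existing integrations with providers that accept IP identifiers. The lego_compatible flag is one possible way to surface the "missing domain-name" case to users.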
Bug Description
I deployed Kubeflow 1.8 and tried to integrate the istio-pilot charm with the httpreq acme operator by following its how-to guide. Unfortunately, after adding the relation, the certificate is not created, because Let's Encrypt doesn't allow provisioning certificates for IP addresses, and the istio-pilot charm supplies the IP address in the CSR to the certificate charm. This can be seen in the juju debug-log below.
It makes sense to me that the istio-pilot charm uses the IP, because it is not aware of the URL that needs to be used in our case. This URL is supplied only as a parameter to the oidc-gatekeeper and dex-auth charms in my bundle. Likewise, in Kubernetes, this Service is not aware of this URL.
istio-ingressgateway ClusterIP 10.152.183.184 <none> 65535/TCP 10d
istio-ingressgateway-endpoints ClusterIP None <none> <none> 10d
istio-ingressgateway-workload LoadBalancer 10.152.183.89 <IP> 80:30481/TCP,443:30282/TCP 10d
istio-pilot ClusterIP 10.152.183.57 <none> 65535/TCP 10d
I'm not sure whether this is a bug or whether there is a way to configure istio-pilot to use the URL we need.
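The CSR needs only the hostname part of the public URL, such as the one passed to oidc-gatekeeper and dex-auth. The scheme, port and path must be dropped. A minimal sketch using the standard library's urllib.parse follows. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def hostname_from_public_url(public_url: str) -> str:
    """Extract the bare hostname that a CSR should carry from a full URL."""
    # urlsplit only recognises the host when a scheme or a leading "//" is present.
    if "://" not in public_url:
        public_url = "//" + public_url
    host = urlsplit(public_url).hostname  # lower-cased, without port or path
    if not host:
        raise ValueError(f"no hostname found in {public_url!r}")
    return host


print(hostname_from_public_url("https://kubeflow.example.com/dex"))  # kubeflow.example.com
```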
To Reproduce
Environment
Kubeflow 1.8/stable https://github.com/canonical/bundle-kubeflow/tree/main/releases/1.8/stable/kubeflow
Charmed Kubernetes 1.28 on Charmed OpenStack Yoga
juju 3.1
Relevant Log Output
Additional Context
No response