canonical / traefik-k8s-operator

https://charmhub.io/traefik-k8s
Apache License 2.0

Unable to use Load Balancer's IP address for the ingress gateway #361

Open Gmerold opened 4 months ago

Gmerold commented 4 months ago

Bug Description

A new version of pydantic-core breaks the fallback to the Load Balancer's IP for the ingress gateway when external_hostname is not configured:

pydantic_core._pydantic_core.ValidationError: 1 validation error for IngressProviderAppData
ingress.url
  Input should be a valid URL, invalid IPv4 address [type=url_parsing, input_value='http://sdcore-nms.10.0.0.2/', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/url_parsing

A potential solution could be to use nip.io so that the LB IP becomes a valid hostname (e.g. 10.0.0.2.nip.io).
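A minimal sketch of the nip.io idea, assuming the charm has the LB IP available as a string. nip.io resolves both "<ip>.nip.io" and "<prefix>.<ip>.nip.io" back to <ip>, so the hostname still points at the LB while no longer ending in a numeric label. The helper name nip_io_hostname is hypothetical, not part of the Traefik charm.

```python
import ipaddress


def nip_io_hostname(lb_ip: str, prefix: str = "") -> str:
    """Turn a load balancer IPv4 address into a nip.io hostname (hypothetical helper)."""
    # Raises ValueError for anything that is not a dotted-quad IPv4 address.
    ip = ipaddress.IPv4Address(lb_ip)
    host = f"{ip}.nip.io"
    # Optional subdomain in front, e.g. the proxied application's name.
    return f"{prefix}.{host}" if prefix else host


print(nip_io_hostname("10.0.0.2"))                # 10.0.0.2.nip.io
print(nip_io_hostname("10.0.0.2", "sdcore-nms"))  # sdcore-nms.10.0.0.2.nip.io
```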

To Reproduce

https://canonical-charmed-aether-sd-core.readthedocs-hosted.com/en/stable/tutorials/getting_started/

Environment

Juju 3.4, MicroK8s 1.27-strict/stable, Traefik latest/stable

Relevant log output

pydantic_core._pydantic_core.ValidationError: 1 validation error for IngressProviderAppData
ingress.url
  Input should be a valid URL, invalid IPv4 address [type=url_parsing, input_value='http://sdcore-nms.10.0.0.2/', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/url_parsing


PietroPasotti commented 4 months ago

We think the issue is that the URL being submitted to Traefik is wrong: http://sdcore-nms.10.0.0.2/ is in fact not a valid IPv4 address, but pydantic treats the host as IPv4 because it ends in digits.

Is it an option to turn the address around and let it be http://10.0.0.2.sdcore-nms/ instead, which would be a valid DNS record?
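Why the parser says "IPv4": assuming pydantic-core delegates URL parsing to the Rust url crate, which follows the WHATWG URL Standard, a host whose last dot-separated label is a number (all ASCII digits, or a 0x hex literal) is handed to the IPv4 parser instead of being treated as a domain name. sdcore-nms.10.0.0.2 takes that branch and then fails as an invalid IPv4 address, while the reversed 10.0.0.2.sdcore-nms and the nip.io form do not. The sketch below re-implements only that "ends in a number" check in plain Python; ends_in_a_number is a hypothetical name mirroring the spec's wording, not a pydantic function.

```python
DIGITS = set("0123456789")
HEX_DIGITS = set("0123456789abcdefABCDEF")


def ends_in_a_number(host: str) -> bool:
    """Sketch of the WHATWG "ends in a number" check for a URL host."""
    labels = host.split(".")
    # A single trailing dot ("example.10.") is ignored.
    if labels[-1] == "":
        if len(labels) == 1:
            return False
        labels.pop()
    last = labels[-1]
    # Decimal label, e.g. the final "2" in "sdcore-nms.10.0.0.2".
    if last and set(last) <= DIGITS:
        return True
    # Hex label, e.g. "0x1f" (the spec also counts a bare "0x").
    if last[:2] in ("0x", "0X") and set(last[2:]) <= HEX_DIGITS:
        return True
    return False


for host in ("sdcore-nms.10.0.0.2", "10.0.0.2.sdcore-nms", "sdcore-nms.10.0.0.2.nip.io"):
    print(host, ends_in_a_number(host))
# sdcore-nms.10.0.0.2 True
# 10.0.0.2.sdcore-nms False
# sdcore-nms.10.0.0.2.nip.io False
```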

Gmerold commented 4 months ago

I agree with your thinking ;) That's why I proposed using nip.io. It turns the IP into a valid hostname, eliminates the need to add entries to /etc/hosts, and makes the URL feel natural (unlike http://10.0.0.2.sdcore-nms/, which kinda reverses the natural order, don't you think?).

mmkay commented 1 month ago

@Gmerold: I see that the documentation is using nip.io at the moment. Is there anything that you think we should do on the traefik side as well? Or maybe this is something we should improve in traefik's documentation?

Gmerold commented 1 month ago

Hello @mmkay, which documentation do you mean? SD-Core's? We are indeed using nip.io there (as an alternative to setting up a DNS server), but Traefik is still broken. I don't think it's a matter of documentation, but rather of handling the case where external_hostname is not set and the charm falls back to the LB's IP.

lucabello commented 1 month ago

Currently, the ingress library uses AnyHttpUrl to validate the field, and that validation fails for URLs like this one.

We could solve this by either contributing a change upstream to pydantic (so that AnyHttpUrl accepts this type of url), or by writing a custom validator to accept it.

ca-scribner commented 1 month ago

I think what @PietroPasotti is getting at is that the linked doc uses https://sdcore-nms.10.0.0.4.nip.io, but this bug report used https://sdcore-nms.10.0.0.4 (which should not be valid, because a top-level domain cannot be purely numeric).

Doing some pure pydantic testing (not with traefik's lib, just pydantic itself), we can see:

from pydantic import BaseModel, AnyHttpUrl, ValidationError

class MyModel(BaseModel):
    url: AnyHttpUrl

# Will pass validation
MyModel(url="http://valid.com")  # a control
MyModel(url="http://valid.com1")  # Valid even though it ends with a number
MyModel(url="http://10.0.0.4.nip.io")
MyModel(url="http://sdcore-nms.10.0.0.4.nip.io")

# Will fail validation
try:
    MyModel(url="http://invalid url")  # a control
except ValidationError:
    pass
else:
    raise Exception("I should have failed")

try:
    # fails because last segment is entirely numeric
    MyModel(url="http://sdcore-nms.10.0.0.4")
except ValidationError:
    pass
else:
    raise Exception("I should have failed")

This feels consistent with other places too. For example, type https://sdcore-nms.10.0.0.4 into Chrome's URL bar and it will recognize that it is not a URL and search for it instead.

So, having said all that (and not having actually looked at the Traefik charm): is the .nip.io missing from the URL because it was missing in the input, or did Traefik strip it somewhere?

Gmerold commented 3 weeks ago

Hi @sed-i, actually it's neither :) First of all, the Chrome behavior you are describing is new; Chrome used to accept https://sdcore-nms.10.0.0.4. But that's not the main problem.

The external_hostname config option of the Traefik charm is optional. If you don't set it, the LB IP is used to build the URLs of the proxied applications. In our case, we don't have an external, publicly available URL for Traefik, and we're using nip.io to keep things as simple as possible. The problem is that the default "URL" produced by Traefik (client application name + Traefik's LB IP) no longer passes validation, and that fails the deployment of the bundle.

On the other hand, we can't use nip.io to set external_hostname before Traefik is deployed, because we don't know the LB IP yet (it's assigned from the pool). That's why I'm proposing using nip.io at the charm level: if the optional external_hostname is not set by the user, we still end up with a valid URL instead of a charm in an error state.
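The proposed fallback can be sketched as follows, assuming the charm knows the LB IP once it has been assigned and builds subdomain-style URLs like the one in the traceback (http://<prefix>.<host>/). ingress_host, ingress_url and their parameters are hypothetical names, not the Traefik charm's actual code: a user-set external_hostname is used verbatim, otherwise the LB IP is wrapped in nip.io so the resulting URL still validates.

```python
from typing import Optional


def ingress_host(external_hostname: Optional[str], lb_ip: str) -> str:
    """Pick the host used to build ingress URLs (hypothetical helper)."""
    if external_hostname:
        # A user-provided value wins, as it does today.
        return external_hostname
    # Unset config (None or ""): "10.0.0.2" -> "10.0.0.2.nip.io",
    # which resolves back to the LB IP but ends in a non-numeric label.
    return f"{lb_ip}.nip.io"


def ingress_url(prefix: str, external_hostname: Optional[str], lb_ip: str) -> str:
    """Subdomain-style URL, e.g. http://<prefix>.<host>/ (assumed format)."""
    return f"http://{prefix}.{ingress_host(external_hostname, lb_ip)}/"


print(ingress_url("sdcore-nms", None, "10.0.0.2"))           # http://sdcore-nms.10.0.0.2.nip.io/
print(ingress_url("sdcore-nms", "example.com", "10.0.0.2"))  # http://sdcore-nms.example.com/
```

One trade-off of this design: the default URLs then depend on a public wildcard-DNS service, so deployments without access to it would still need external_hostname or a local DNS entry.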

gruyaume commented 1 week ago

Can this issue be prioritised? Every one of our charmed 5G deployments is affected by it. In addition, our tutorials and documentation look bad, as we have to reference this issue and tell users that Traefik is expected to be in an error state.

Reference:

Model      Controller                  Cloud/Region                Version  SLA          Timestamp
private5g  microk8s-classic-localhost  microk8s-classic/localhost  3.4.5    unsupported  08:08:50Z

App                       Version  Status   Scale  Charm                     Channel        Rev  Address         Exposed  Message
amf                       1.4.4    active       1  sdcore-amf-k8s            1.5/edge       707  10.152.183.176  no       
ausf                      1.4.2    active       1  sdcore-ausf-k8s           1.5/edge       520  10.152.183.65   no       
grafana-agent             0.32.1   waiting      1  grafana-agent-k8s         latest/stable   45  10.152.183.221  no       installing agent
mongodb                            active       1  mongodb-k8s               6/beta          38  10.152.183.92   no       Primary
nms                       1.0.0    active       1  sdcore-nms-k8s            1.5/edge       580  10.152.183.141  no       
nrf                       1.4.1    active       1  sdcore-nrf-k8s            1.5/edge       580  10.152.183.130  no       
nssf                      1.4.1    active       1  sdcore-nssf-k8s           1.5/edge       462  10.152.183.62   no       
pcf                       1.4.3    active       1  sdcore-pcf-k8s            1.5/edge       512  10.152.183.144  no       
router                             active       1  sdcore-router-k8s         1.5/edge       341  10.152.183.218  no       
self-signed-certificates           active       1  self-signed-certificates  latest/stable  155  10.152.183.33   no       
smf                       1.5.2    active       1  sdcore-smf-k8s            1.5/edge       590  10.152.183.64   no       
traefik                   v2.11.0  waiting      1  traefik-k8s               latest/stable  194  10.152.183.198  no       installing agent
udm                       1.4.3    active       1  sdcore-udm-k8s            1.5/edge       489  10.152.183.31   no       
udr                       1.4.1    active       1  sdcore-udr-k8s            1.5/edge       486  10.152.183.82   no       
upf                       1.4.0    active       1  sdcore-upf-k8s            1.5/edge       591  10.152.183.164  no       

Unit                         Workload  Agent  Address      Ports  Message
amf/0*                       active    idle   10.1.10.181         
ausf/0*                      active    idle   10.1.10.186         
grafana-agent/0*             blocked   idle   10.1.10.133         grafana-cloud-config: off, logging-consumer: off
mongodb/0*                   active    idle   10.1.10.155         Primary
nms/0*                       active    idle   10.1.10.174         
nrf/0*                       active    idle   10.1.10.151         
nssf/0*                      active    idle   10.1.10.136         
pcf/0*                       active    idle   10.1.10.146         
router/0*                    active    idle   10.1.10.145         
self-signed-certificates/0*  active    idle   10.1.10.141         
smf/0*                       active    idle   10.1.10.154         
traefik/0*                   error     idle   10.1.10.160         hook failed: "ingress-relation-changed"
udm/0*                       active    idle   10.1.10.187         
udr/0*                       active    idle   10.1.10.176         
upf/0*                       active    idle   10.1.10.169