Closed Gmerold closed 1 week ago
We think the issue is that the url being submitted to traefik is wrong because it is in fact not a valid ipv4 address: http://sdcore-nms.10.0.0.2/
pydantic deduces it's ipv4 because it ends in digits.
Is it an option to turn the address around and let it be http://10.0.0.2.sdcore-nms/ instead, which would be a valid DNS record?
I agree with your thinking ;)
That's why I proposed using nip.io. It turns the IP into a valid URL, eliminates the need to add entries to /etc/hosts, and makes the URL feel natural (unlike http://10.0.0.2.sdcore-nms/, which kinda reverses the natural order, don't you think?).
@Gmerold: I see that the documentation is using nip.io at the moment. Is there anything that you think we should do on the traefik side as well? Or maybe this is something we should improve in traefik's documentation?
Hello @mmkay, which documentation do you mean? SD-Core?
We are using nip.io indeed (as an alternative to setting up the DNS server), but Traefik is still broken. I don't think it's a matter of documentation, but rather of handling the case when the external-hostname is not set and the charm falls back to the LB's IP.
Currently, the ingress library is using AnyHttpUrl to validate the field; however, that fails.
We could solve this by either contributing a change upstream to pydantic (so that AnyHttpUrl accepts this type of URL), or by writing a custom validator to accept it.
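For illustration, here is a minimal sketch of the kind of check such a custom validator could perform. It uses only the standard library rather than pydantic, the function name `is_ip_with_subdomain` is hypothetical, and it is not the ingress library's actual code. It assumes the only extra shape we want to accept is one or more labels followed by an IPv4 address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_ip_with_subdomain(url: str) -> bool:
    """Return True if the URL's host looks like `<labels>.<IPv4>`.

    Example of an accepted host: sdcore-nms.10.0.0.2
    (hypothetical helper, for illustration only).
    """
    host = urlsplit(url).hostname
    if not host:
        return False
    labels = host.split(".")
    # Need at least one label in front of the four IPv4 octets.
    if len(labels) < 5:
        return False
    try:
        # The last four labels must form a valid IPv4 address.
        ipaddress.IPv4Address(".".join(labels[-4:]))
    except ipaddress.AddressValueError:
        return False
    # The leading labels may only contain letters, digits and hyphens.
    return all(
        label and label.replace("-", "").isalnum() for label in labels[:-4]
    )
```

A real validator would first try AnyHttpUrl and only fall back to a check like this when that fails.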
I think what @PietroPasotti is getting at is that the linked doc uses https://sdcore-nms.10.0.0.4.nip.io, but this bug report used https://sdcore-nms.10.0.0.4 (which should not be valid because a top-level domain cannot be purely numerical).
Doing some pure pydantic testing (not with traefik's lib, just pydantic itself), we can see:
from pydantic import BaseModel, AnyHttpUrl, ValidationError


class MyModel(BaseModel):
    url: AnyHttpUrl


# Will pass validation
MyModel(url="http://valid.com")  # a control
MyModel(url="http://valid.com1")  # Valid even though it ends with a number
MyModel(url="http://10.0.0.4.nip.io")
MyModel(url="http://sdcore-nms.10.0.0.4.nip.io")

# Will fail validation
try:
    MyModel(url="http://invalid url")  # a control
except ValidationError:
    pass
else:
    raise Exception("I should have failed")

try:
    # fails because last segment is entirely numeric
    MyModel(url="http://sdcore-nms.10.0.0.4")
except ValidationError:
    pass
else:
    raise Exception("I should have failed")
This feels consistent with other places too. For example, type https://sdcore-nms.10.0.0.4 in your Chrome URL bar and it'll automatically notice it is not a URL and search on it instead.
So having said all that (and having not actually looked at the traefik charm), is the missing .nip.io in the URL because it was missing in the input, or did traefik strip it somewhere?
Hi @sed-i,
Actually it's neither :)
First of all, the behavior of Chrome you are describing is new. Chrome used to accept https://sdcore-nms.10.0.0.4. But that's not the main problem.
The external_hostname config of the Traefik charm is optional. If you don't specify it, the LB IP will be used for building URLs of the proxied applications. In our case, we don't have an external, publicly available URL for Traefik. We're using nip.io to keep things as simple as possible. The problem is that the default "URL" produced by Traefik (client application name + Traefik's LB IP) doesn't pass the validation anymore and that fails the deployment of the bundle. On the other hand, we can't use nip.io to set the external_hostname config before Traefik is deployed, because we don't know the LB IP (it's assigned from the pool).
That's why I'm proposing using nip.io at the charm level: to make sure that if the optional external_hostname is not set by the user, we still end up getting a valid URL instead of a charm in error state.
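To make the proposal concrete, here is a rough sketch of the fallback, assuming the host is built as "application name + domain". The function name `ingress_host` and its parameters are made up for illustration and do not reflect the charm's actual code.

```python
import ipaddress
from typing import Optional


def ingress_host(
    app_name: str, lb_ip: str, external_hostname: Optional[str] = None
) -> str:
    """Build the host for a proxied app (hypothetical helper).

    If external_hostname is set, use it as the domain. Otherwise fall back
    to the LB IP wrapped in nip.io, so the result is a resolvable DNS name
    rather than a bare IP with a subdomain in front of it.
    """
    if external_hostname:
        return f"{app_name}.{external_hostname}"
    # Raises ipaddress.AddressValueError if lb_ip is not a valid IPv4 address.
    ipaddress.IPv4Address(lb_ip)
    return f"{app_name}.{lb_ip}.nip.io"


print(ingress_host("sdcore-nms", "10.0.0.2"))
print(ingress_host("sdcore-nms", "10.0.0.2", external_hostname="example.com"))
```

The first call produces a host of the same shape as the one used in our docs (sdcore-nms.10.0.0.2.nip.io), without the user having to know the LB IP up front.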
Can this issue be prioritised? Every one of our charmed 5G deployments is affected by it. In addition, our tutorials and documentation look bad, as we have to reference this issue and let users know that traefik is expected to be in an error state.
Reference:
Model      Controller                  Cloud/Region                Version  SLA          Timestamp
private5g  microk8s-classic-localhost  microk8s-classic/localhost  3.4.5    unsupported  08:08:50Z

App                       Version  Status   Scale  Charm                     Channel        Rev  Address         Exposed  Message
amf                       1.4.4    active   1      sdcore-amf-k8s            1.5/edge       707  10.152.183.176  no
ausf                      1.4.2    active   1      sdcore-ausf-k8s           1.5/edge       520  10.152.183.65   no
grafana-agent             0.32.1   waiting  1      grafana-agent-k8s         latest/stable  45   10.152.183.221  no       installing agent
mongodb                            active   1      mongodb-k8s               6/beta         38   10.152.183.92   no       Primary
nms                       1.0.0    active   1      sdcore-nms-k8s            1.5/edge       580  10.152.183.141  no
nrf                       1.4.1    active   1      sdcore-nrf-k8s            1.5/edge       580  10.152.183.130  no
nssf                      1.4.1    active   1      sdcore-nssf-k8s           1.5/edge       462  10.152.183.62   no
pcf                       1.4.3    active   1      sdcore-pcf-k8s            1.5/edge       512  10.152.183.144  no
router                             active   1      sdcore-router-k8s         1.5/edge       341  10.152.183.218  no
self-signed-certificates           active   1      self-signed-certificates  latest/stable  155  10.152.183.33   no
smf                       1.5.2    active   1      sdcore-smf-k8s            1.5/edge       590  10.152.183.64   no
traefik                   v2.11.0  waiting  1      traefik-k8s               latest/stable  194  10.152.183.198  no       installing agent
udm                       1.4.3    active   1      sdcore-udm-k8s            1.5/edge       489  10.152.183.31   no
udr                       1.4.1    active   1      sdcore-udr-k8s            1.5/edge       486  10.152.183.82   no
upf                       1.4.0    active   1      sdcore-upf-k8s            1.5/edge       591  10.152.183.164  no

Unit                         Workload  Agent  Address      Ports  Message
amf/0*                       active    idle   10.1.10.181
ausf/0*                      active    idle   10.1.10.186
grafana-agent/0*             blocked   idle   10.1.10.133         grafana-cloud-config: off, logging-consumer: off
mongodb/0*                   active    idle   10.1.10.155         Primary
nms/0*                       active    idle   10.1.10.174
nrf/0*                       active    idle   10.1.10.151
nssf/0*                      active    idle   10.1.10.136
pcf/0*                       active    idle   10.1.10.146
router/0*                    active    idle   10.1.10.145
self-signed-certificates/0*  active    idle   10.1.10.141
smf/0*                       active    idle   10.1.10.154
traefik/0*                   error     idle   10.1.10.160         hook failed: "ingress-relation-changed"
udm/0*                       active    idle   10.1.10.187
udr/0*                       active    idle   10.1.10.176
upf/0*                       active    idle   10.1.10.169
@dstathis can you please make sure this is included in the pulse that starts on Monday? Thanks.
Yup no problem
I think the issue here is just misconfiguration. Traefik has two routing_mode values:
path: (default) provides routes as paths, e.g. http://1.2.3.4/mymodel-myapp
subdomain: provides routes as subdomains, e.g. http://mymodel.myapp.1.2.3.4 (or maybe mymodel-myapp, can't remember)
If you're using the loadbalancer IP as the domain, then subdomain really isn't valid (since mymodel.myapp.1.2.3.4 isn't a valid domain, based on the above conversation). Feels like path is the only valid config here.
Is there a reason why path wouldn't work here? That seems like the easy fix that can be implemented user-side, without the risk of side effects that adding .nip.io would carry.
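To show the difference between the two modes, here is a small sketch of how the resulting URLs could be assembled. This is an assumption-laden illustration, not Traefik's or the charm's code: the function name `route_url` is hypothetical, and the `model-app` naming is a guess for the subdomain case, since the exact format is uncertain.

```python
def route_url(routing_mode: str, host: str, model: str, app: str) -> str:
    """Assemble the URL for a proxied app (hypothetical helper).

    `host` is either the configured external hostname or the LB IP.
    """
    name = f"{model}-{app}"  # assumed naming; the real format may differ
    if routing_mode == "path":
        # The host stays untouched, so a bare IP remains a valid URL host.
        return f"http://{host}/{name}"
    if routing_mode == "subdomain":
        # The name is prepended to the host; with a bare IP this yields a
        # hostname whose last label is purely numeric.
        return f"http://{name}.{host}"
    raise ValueError(f"unknown routing_mode: {routing_mode!r}")


print(route_url("path", "1.2.3.4", "mymodel", "myapp"))
print(route_url("subdomain", "1.2.3.4", "mymodel", "myapp"))
```

With an IP as the host, only the path form keeps the IP intact as the host part of the URL.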
This kinda reminds me of a story about my buddy. He used to have a car with a broken gearbox; only second and fourth gear would work. One day I had to drive this car and obviously I wanted to start in first gear. After I struggled for a short while, my buddy told me to use second gear instead. After starting in second gear, I had to push the RPMs really high to be able to change to fourth gear directly, because third wouldn't work either. When I asked him about fixing the gearbox, he was like "nah, two of them still work".
Traefik has two routing modes and it should be the user's decision which one to use. If a correct charm configuration produces incorrect output, it is a problem in the charm. If you're afraid of the side effects of using .nip.io, the alternative approach could be making the charm require external_hostname when subdomain is used.
Yes agreed, the root issue here is that if subdomain is used, then we need to require an external_hostname to be configured. I'm working to implement that constraint now. In future, expect that this charm will (more gracefully) block someone from using IP+subdomain (routing_mode=subdomain and an unset external_hostname). That's added to the config descriptions, and there are some warning messages that'll appear if this comes up.
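As a rough illustration of that constraint (not the actual charm code; the function name and message text below are made up), the check boils down to something like:

```python
from typing import Optional


def check_routing_config(
    routing_mode: str, external_hostname: Optional[str]
) -> Optional[str]:
    """Return a blocked-status message if the config is invalid, else None.

    Hypothetical helper: subdomain routing needs a real hostname, because
    prepending a subdomain to a bare LB IP does not give a valid URL host.
    """
    if routing_mode == "subdomain" and not external_hostname:
        return "routing_mode=subdomain requires external_hostname to be set"
    return None


# path routing works with or without an external hostname
assert check_routing_config("path", None) is None
# subdomain routing with no hostname is rejected with a message
print(check_routing_config("subdomain", None))
```

Returning a message instead of raising lets the caller surface it as a blocked status rather than ending up with a failed hook.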
Bug Description
New version of pydantic-core breaks falling back to the Load Balancer's IP for the ingress gateway when the external-hostname is not configured:
Potential solution here could be using nip.io to pretend LB IP is a legit URL (e.g. 10.0.0.2.nip.io)
To Reproduce
https://canonical-charmed-aether-sd-core.readthedocs-hosted.com/en/stable/tutorials/getting_started/
Environment
Juju 3.4
Microk8s 1.27-strict/stable
Traefik latest/stable
Relevant log output
Additional context
No response