canonical / traefik-route-k8s-operator

https://charmhub.io/traefik-route-k8s
Apache License 2.0
0 stars 3 forks source link

[design] Avoid possible races #2

Closed PietroPasotti closed 2 years ago

PietroPasotti commented 2 years ago

In directly-related Charm <--> Traefik scenarios, what happens is:

  1. Charm initiates by sharing some ingress data (name, model, port, host...)
  2. Traefik uses that data to render a config template, with which it configures Traefik to start routing for the remote units. After that it sends back to Charm the root url at which ingress has been made available.

However let's add Traefik Route to the picture (Route henceforth) and that complicates things: The event timeline is as follows:

  1. Admin sets up Traefik, Route, and Charm and relates them.
  2. Admin configures Route.
  3. Charm shares its ingress data (name, model, port, host...)
  4. In Route: a. Given that Route is already configured, it knows what url should be sent back to Charm. b. It also knows what traefik config to send to Traefik.

PROBLEM: if at this point Route does both, there is a chance that the charm will believe it has ingress BEFORE Traefik has had the time to update its config and the service (or whatever it needs to do to actually start routing for that url).

Alternative design could be: 1-3. Stay the same

  1. Route sends config to Traefik.
  2. Traefik sends back an OK message to inform Route that that unit is being routed to.
  3. Route receives the OK and sends to Charm the url it already knew after 3.
simskij commented 2 years ago

I agree that we need to add some kind of handshake mechanism here to make sure we don't start using the URL until it's actually ready to be used. Alternative design sounds reasonable to me.

PietroPasotti commented 2 years ago

Trying to figure out what the best way to lock Route until Traefik is 'ready' is, since there is in principle no data we're sending back from Traefik to Route that Route could be checking to say 'this is up to date'.

I'm leaning towards a solution where Traefik publishes a hash of Route's app databag in its own app databag every time it updates the ingress. Then Route can, on each execution, check the hash matches (in which case it knows Traefik has already processed the data at its most recent version). If it does, publish ingress. If it does not, wipe.

Abuelodelanada commented 2 years ago

@PietroPasotti when you say:

Charm shares its ingress data (name, model, port, host...)

Do you mean that when you relate traefik (or router?) with a consumer charm, the consumer charms should sent back its data?

PietroPasotti commented 2 years ago

@Abuelodelanada That step refers to what currently is the 'request' step in ingress-per-unit + Traefik: the Charm in need of ingress sends some data that Traefik needs to render the routing rules to configure traefik with. If you relate Charm and Traefik directly (via ingress-per-unit), at that point Traefik will be able to send back the url to the Charm and Charm can rest assured that traefik is already up and running and that ingress is active.

If you relate via TraefikRoute, however, that guarantee goes out the window...

mmanciop commented 2 years ago

In directly-related Charm <--> Traefik scenarios, what happens is:

  1. Charm initiates by sharing some ingress data (name, model, port, host...)
  2. Traefik uses that data to render a config template, with which it configures Traefik to start routing for the remote units. After that it sends back to Charm the root url at which ingress has been made available.

However let's add Traefik Route to the picture (Route henceforth) and that complicates things: The event timeline is as follows:

  1. Admin sets up Traefik, Route, and Charm and relates them.
  2. Admin configures Route.
  3. Charm shares its ingress data (name, model, port, host...)
  4. In Route: a. Given that Route is already configured, it knows what url should be sent back to Charm. b. It also knows what traefik config to send to Traefik.

PROBLEM: if at this point Route does both, there is a chance that the charm will believe it has ingress BEFORE Traefik has had the time to update its config and the service (or whatever it needs to do to actually start routing for that url).

Alternative design could be: 1-3. Stay the same

  1. Route sends config to Traefik.
  2. Traefik sends back an OK message to inform Route that that unit is being routed to.
  3. Route receives the OK and sends to Charm the url it already knew after 3.

I do not really see the issue with Route issuing a second URL downstream, as well as Traefik updating its routes, in a relatively short amount of time.

PietroPasotti commented 2 years ago

Closing this after a chat with @mmanciop, seems premature to worry about a race here until we know there's a problem. The mantra is 'eventually consistent': if the charm reads the url before traefik is routing for it, there might be some errors downstream, but as soon as Traefik processes the incoming event, all will be fine again and the errors will vanish.