canonical / redis-k8s-operator

Operator Charm for Redis
Apache License 2.0

Relation uses bind_address, which can change #8

Open tomwardill opened 3 years ago

tomwardill commented 3 years ago

https://github.com/canonical/redis-operator/blob/master/src/charm.py#L165

This address can change if the pod is killed and respawned. I think a better solution would be to use either the Service IP or the app name. The app name is a valid DNS name in k8s clusters that have DNS running, and the lookup moves with the pod.

There may be a more preferred juju method of doing this that I'm not aware of.

https://bugs.launchpad.net/juju/+bug/1911135 looks to be related.

mthaddon commented 3 years ago

We should avoid using the k8s DNS because that won't be available outside of k8s in the case where we're using cross-model relations, but we should definitely investigate/fix this.

edumucelli commented 3 years ago

Hey Tom and Tom :smiley:. Interesting, when writing this I was actually not sure what to do. That is why the code is a bit convoluted in there, falling back to self.app.name, because I knew it was valid if bind_address was not available.

@mthaddon do you think just using app.name would be enough here, or is it more complicated than that?

mthaddon commented 1 year ago

I think the charm should be looking at whether its IP has changed as part of hooks that might fire whenever this has happened. If it has changed, it should publish that change on the relation so relating charms can take appropriate action. We may need some help from the juju team to identify which hook(s) would be relevant here, but we've observed the following hooks running on a pod after a restart event that caused an IP change:

$ kubectl logs -n prod-events-k8s redis-broker-0 -c charm | grep ' ran ' | head -20
2023-06-02T04:09:46.486Z [container-agent] 2023-06-02 04:09:46 INFO juju.worker.uniter.operation runhook.go:159 ran "start" hook (via hook dispatching script: dispatch)
2023-06-02T04:09:59.610Z [container-agent] 2023-06-02 04:09:59 INFO juju.worker.uniter.operation runhook.go:159 ran "start" hook (via hook dispatching script: dispatch)
2023-06-02T04:10:15.313Z [container-agent] 2023-06-02 04:10:15 INFO juju.worker.uniter.operation runhook.go:159 ran "sentinel-pebble-ready" hook (via hook dispatching script: dispatch)
2023-06-02T04:10:50.723Z [container-agent] 2023-06-02 04:10:50 INFO juju.worker.uniter.operation runhook.go:159 ran "redis-pebble-ready" hook (via hook dispatching script: dispatch)
2023-06-02T04:11:22.162Z [container-agent] 2023-06-02 04:11:22 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:11:42.011Z [container-agent] 2023-06-02 04:11:41 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:12:37.407Z [container-agent] 2023-06-02 04:12:37 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:13:37.572Z [container-agent] 2023-06-02 04:13:37 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:14:31.409Z [container-agent] 2023-06-02 04:14:31 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:15:39.592Z [container-agent] 2023-06-02 04:15:39 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:15:55.376Z [container-agent] 2023-06-02 04:15:55 INFO juju.worker.uniter.operation runhook.go:159 ran "sentinel-pebble-ready" hook (via hook dispatching script: dispatch)
2023-06-02T04:16:53.755Z [container-agent] 2023-06-02 04:16:53 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:17:57.675Z [container-agent] 2023-06-02 04:17:57 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:18:58.588Z [container-agent] 2023-06-02 04:18:58 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:19:48.832Z [container-agent] 2023-06-02 04:19:48 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:20:50.449Z [container-agent] 2023-06-02 04:20:50 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:21:23.950Z [container-agent] 2023-06-02 04:21:23 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:22:23.503Z [container-agent] 2023-06-02 04:22:23 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:23:15.746Z [container-agent] 2023-06-02 04:23:15 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2023-06-02T04:24:18.254Z [container-agent] 2023-06-02 04:24:18 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)

Sounds like we'd either need to update the IP on the relation as part of the start hook or the redis-pebble-ready hook.
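
As a rough illustration of the idea above, here is a minimal sketch of the "compare and republish" step that a start or redis-pebble-ready handler could call. The function name publish_address_if_changed, the "hostname" key, and the example addresses are assumptions made for illustration, not taken from the charm. In a real charm the mapping would presumably be the unit's relation data bag and the address would come from the relation binding's bind_address; plain dicts stand in for both here.

```python
from typing import MutableMapping, Optional


def publish_address_if_changed(
    relation_data: MutableMapping[str, str],
    current_address: Optional[str],
    fallback: str,
    key: str = "hostname",  # hypothetical key name, adjust to the real interface
) -> bool:
    """Write the current address into relation data only if it differs.

    Returns True when the relation data was updated, False otherwise.
    """
    # Fall back (for example to the app name) when no bind address is known yet.
    address = current_address or fallback
    if relation_data.get(key) == address:
        # Nothing changed, so avoid a write that would notify related units.
        return False
    relation_data[key] = address
    return True


# Simulated relation data from before a pod restart (addresses are made up).
data = {"hostname": "10.1.0.5"}
changed = publish_address_if_changed(data, "10.1.0.9", fallback="redis-k8s")
print(changed, data)
```

Skipping the write when the value is unchanged keeps the helper safe to call from every candidate hook, including update-status, so the choice of hook could be settled later.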