canonical / discourse-k8s-operator

Redis relation does not read all sentinel addresses #268

Closed reneradoi closed 3 weeks ago

reneradoi commented 4 months ago

Bug Description

Hi Team!

In redis-k8s-operator we run an integration test that deploys discourse-k8s and relates it to Redis. On some CI runs the discourse charm fails because it cannot write to the Redis database.

The reason is that the Redis client tries to connect to the first advertised host it finds in the relation data (here). Since a bug was fixed on the redis-k8s-operator side (with this PR), the relation data now includes all hosts, not just the host of the Sentinel master. The discourse charm therefore has to consider all advertised hosts instead of picking an arbitrary one.

One more note: this behaviour only occurs if the Sentinel master is not unit /0. If redis-k8s/0 happens to be the Sentinel master, or in setups with just one Redis unit, the integration works fine.
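
To illustrate what "consider all advertised hosts" could mean, here is a minimal Python sketch. It is not the charm's actual code or the eventual fix: the function name `find_writable_host`, the `get_role` probe, and the example hostnames are all hypothetical. It only assumes that each Redis unit advertises a hostname and port, and that a unit can be asked whether its replication role is "master" or "slave" (the values Redis reports under `INFO replication`).

```python
from typing import Callable, Iterable, Optional, Tuple

Host = Tuple[str, int]


def find_writable_host(
    hosts: Iterable[Host],
    get_role: Callable[[str, int], str],
) -> Optional[Host]:
    """Return the first advertised host whose role is "master", or None.

    `get_role` is a hypothetical probe. It is expected to return the
    replication role of the given host and to raise the built-in
    ConnectionError if the host cannot be reached.
    """
    for hostname, port in hosts:
        try:
            role = get_role(hostname, port)
        except ConnectionError:
            # Unreachable unit: skip it and try the next advertised host.
            continue
        if role == "master":
            return (hostname, port)
    return None


# Hypothetical relation data: one (hostname, port) entry per Redis unit.
advertised = [
    ("redis-k8s-0.example", 6379),
    ("redis-k8s-1.example", 6379),
    ("redis-k8s-2.example", 6379),
]

# Stand-in for a real probe, so the sketch runs without a Redis cluster.
# Here unit /1 is the master, which is the failing case described above.
fake_roles = {
    "redis-k8s-0.example": "slave",
    "redis-k8s-1.example": "master",
    "redis-k8s-2.example": "slave",
}


def fake_get_role(hostname: str, port: int) -> str:
    return fake_roles[hostname]


selected = find_writable_host(advertised, fake_get_role)
print(selected)
```

Taking the first entry of `advertised` would pick a read-only replica in this example, which matches the READONLY error in the log below. A real probe would have to query each unit (or the Sentinels) and translate the client library's errors into `ConnectionError`; how the charm actually resolves the master may differ.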

To Reproduce

  1. juju deploy redis-k8s --channel edge -n 3
  2. juju deploy postgresql-k8s --channel 14/stable
  3. juju deploy discourse-k8s --channel latest/stable
  4. juju integrate postgresql-k8s:database discourse-k8s
  5. juju integrate redis-k8s discourse-k8s

Environment

see above.

Example for failed CI run: https://github.com/canonical/redis-k8s-operator/actions/runs/9760370147/job/26939124902

Relevant log output

from `juju debug-log`:

unit-discourse-k8s-0: 11:35:26 ERROR unit.discourse-k8s/0.juju-log redis:11: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 805, in <module>
    main(DiscourseCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/lib/charms/redis_k8s/v0/redis.py", line 87, in _on_relation_changed
    self.charm.on.redis_relation_updated.emit()
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 168, in _redis_relation_changed
    self._setup_and_activate()
  File "./src/charm.py", line 221, in _setup_and_activate
    self._configure_pod()
  File "./src/charm.py", line 691, in _configure_pod
    self._config_force_https()
  File "./src/charm.py", line 752, in _config_force_https
    process.wait_output()
  File "/var/lib/juju/agents/unit-discourse-k8s-0/charm/venv/ops/pebble.py", line 1559, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/srv/discourse/app/bin/rails', 'runner', 'SiteSetting.force_https=false'], stdout='', stderr="/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:246:in `call_pipelined': READONLY You can't write against a read only replica. (Redis::CommandError)\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:200:in `block in call_pipeline'\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:344:in `with_reconnect'\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:198:in `call_pipeline'\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis.rb:177:in `block in pipelined'\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis.rb:265:in `block in synchronize'\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis.rb:265:in `synchronize'\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis.rb:265:in `synchronize'\n\tfrom /srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis.rb:17" [truncated]
unit-discourse-k8s-0: 11:35:26 ERROR juju.worker.uniter.operation hook "redis-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1


Additional context

No response

merkata commented 4 months ago

Thanks for reporting. We will tackle it in this cycle, though we cannot commit to an exact date yet.

cbartz commented 3 weeks ago

@alithethird Can you confirm this is fixed and close the issue, please?

alithethird commented 3 weeks ago

I can confirm this is fixed and closing.