canonical / redis-k8s-operator

Operator Charm for Redis
Apache License 2.0

redis-server does not start on Canonical Kubernetes #90

Closed: faebd7 closed this issue 2 days ago

faebd7 commented 1 week ago

Hi,

I noticed that redis-k8s does not work on Canonical Kubernetes (channel 1.30/beta, revision 64).

Model           Controller                      Cloud/Region            Version  SLA          Timestamp         
stg-netbox-k8s  juju-controller-34-staging-ps6  stg-netbox-k8s/default  3.4.2    unsupported  21:13:30Z         

App        Version  Status   Scale  Charm      Channel        Rev  Address         Exposed  Message             
redis-k8s           waiting      1  redis-k8s  latest/edge     32  10.152.183.100  no       installing agent    

Unit          Workload  Agent  Address     Ports  Message                                                               
redis-k8s/0*  error     idle   10.1.1.63          hook failed: "storage-attached"                                       

Digging into the deployment, I found this:

$ kubectl exec -t -n stg-netbox-k8s redis-k8s-0  -c redis -- head /var/log/redis/redis-server.log
13:C 01 Jul 2024 05:15:06.233 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
13:C 01 Jul 2024 05:15:06.233 * Redis version=7.2.5, bits=64, commit=00000000, modified=0, pid=13, just started
13:C 01 Jul 2024 05:15:06.233 * Configuration loaded
13:M 01 Jul 2024 05:15:06.233 * Increased maximum number of open files to 10032 (it was originally set to 1024).
13:M 01 Jul 2024 05:15:06.233 * monotonic clock: POSIX clock_gettime
13:M 01 Jul 2024 05:15:06.234 * Running mode=standalone, port=6379.
13:M 01 Jul 2024 05:15:06.234 * Server initialized
13:M 01 Jul 2024 05:15:06.240 # Can't open or create append-only dir appendonlydir: Permission denied
18:C 01 Jul 2024 05:15:06.839 * Redis version=7.2.5, bits=64, commit=00000000, modified=0, pid=18, just started
$ _

When the Juju storage is provisioned on a Canonical Kubernetes cluster, the permissions of /var/lib/redis are as follows:

root@redis-k8s-0:/# ls -ld /var/lib/redis/
drwxr-xr-x 4 root root 4096 Jul  2 01:58 /var/lib/redis/
root@redis-k8s-0:/# _

This does not work because the Pebble plan runs redis-server as the redis user, which has no write permission on a root-owned directory with mode drwxr-xr-x.

When deploying on microk8s, the permissions of the provisioned storage are as follows:

root@redis-k8s-0:/# ls -ld /var/lib/redis/
drwxrwxrwx 3 root root 4096 Jul  2 01:36 /var/lib/redis/
root@redis-k8s-0:/# _

and so redis-server is able to create the files and directories it needs:

root@redis-k8s-0:/# ls -l /var/lib/redis/
total 16
drwxr-xr-x 2 redis redis 4096 Jul  2 01:36 appendonlydir
-rw------- 1 redis redis 1895 Jul  2 01:36 ca.crt
-rw------- 1 redis redis 1407 Jul  2 01:36 redis.crt
-rw------- 1 redis redis 1679 Jul  2 01:36 redis.key
root@redis-k8s-0:/# _
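The difference between the two listings comes down to the standard Unix rule that creating an entry in a directory requires both write (w) and search (x) permission for whichever class (owner, group, other) applies to the user. A minimal sketch of that check, applied to the two `ls -ld` lines above; it assumes the redis user is only a member of a group named `redis`, and the function name `can_create_entries` is hypothetical:

```python
"""Decide whether a user can create entries in a directory, from `ls -ld` output.

Creating a file or subdirectory requires write (w) and search (x)
permission on the parent directory for the permission class
(owner, group or other) that applies to the user.
"""


def can_create_entries(ls_line: str, user: str, user_groups: set[str]) -> bool:
    # `ls -ld` fields: mode, link count, owner, group, size, date..., name
    fields = ls_line.split()
    mode, owner, group = fields[0], fields[2], fields[3]
    if user == owner:
        bits = mode[1:4]   # owner class
    elif group in user_groups:
        bits = mode[4:7]   # group class
    else:
        bits = mode[7:10]  # other class
    # "s"/"t" mean setuid/setgid/sticky *with* execute; "S"/"T" mean without.
    return bits[1] == "w" and bits[2] in ("x", "s", "t")


canonical_k8s = "drwxr-xr-x 4 root root 4096 Jul  2 01:58 /var/lib/redis/"
microk8s = "drwxrwxrwx 3 root root 4096 Jul  2 01:36 /var/lib/redis/"
# Assumption: the redis user belongs only to the "redis" group.
print(can_create_entries(canonical_k8s, "redis", {"redis"}))  # False
print(can_create_entries(microk8s, "redis", {"redis"}))       # True
```

For the Canonical Kubernetes volume, redis falls into the "other" class, whose bits are r-x, so it cannot create appendonlydir. The microk8s volume grants rwx to everyone, which is why it happens to work there.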

This difference most likely arises because microk8s uses microk8s.io/hostpath-provisioner, whereas Canonical Kubernetes uses rawfile.csi.openebs.io.

The latter's defaults are more sensible, and probably closer to what e.g. the OpenStack Cinder provisioner does.

I think it would make sense for the charm to ensure that /var/lib/redis is owned by the correct user and group before redis-server is started.
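A hypothetical sketch of what that suggestion could look like in the charm; it is not the charm's actual code. It assumes `container` is an `ops.model.Container` for the workload container, that Pebble runs commands there as root (as the root prompt in the listings above suggests), and that the group is named `redis`. The name `ensure_data_dir_ownership` is made up for illustration:

```python
"""Hypothetical sketch: fix data-directory ownership before starting redis-server."""

DATA_DIR = "/var/lib/redis"   # mount point of the Juju storage in this issue
REDIS_USER = "redis"          # user the Pebble plan runs redis-server as
REDIS_GROUP = "redis"         # assumed group name


def ensure_data_dir_ownership(container) -> None:
    """Make the storage mount point owned by the redis user and group.

    `container` is assumed to be an ops.model.Container. The command runs
    inside the workload container via Pebble, so it changes the mounted
    volume, not the image's original (now hidden) directory.
    """
    # Only the top-level directory needs fixing; redis-server creates
    # appendonlydir itself once it can write here.
    process = container.exec(["chown", f"{REDIS_USER}:{REDIS_GROUP}", DATA_DIR])
    process.wait()  # raises ops.pebble.ExecError on a non-zero exit status
```

In the charm, a call like this would need to run before `container.restart("redis", "redis_exporter")` in `_update_layer`, the call that fails in the traceback further down this thread.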

github-actions[bot] commented 1 week ago

https://warthogs.atlassian.net/browse/DPE-4808

reneradoi commented 5 days ago

Hi @faebd7, thank you for reporting the issue.

I have adjusted the user and directory setup in the rock that is used by this charm (see https://github.com/canonical/charmed-redis-rock/pull/7), so hopefully this should work now. Revision 33 of the redis-k8s-operator includes the fix.

Please let us know if other issues arise.

faebd7 commented 5 days ago

The problem is still present in revision 33.

The error below from redis-k8s/0 appears to be due to a separate problem that I am currently investigating. The new unit redis-k8s/1 was created after I ran juju refresh redis-k8s.

stg-netbox@is-bastion-ps6:~$ juju status -m admin/stg-netbox-k8s redis-k8s
Model           Controller                      Cloud/Region            Version  SLA          Timestamp
stg-netbox-k8s  juju-controller-34-staging-ps6  stg-netbox-k8s/default  3.4.2    unsupported  22:01:26Z

App        Version  Status   Scale  Charm      Channel      Rev  Address         Exposed  Message
redis-k8s           waiting      2  redis-k8s  latest/edge   33  10.152.183.100  no       waiting for units to settle down

Unit          Workload  Agent  Address     Ports  Message
redis-k8s/0*  error     idle   10.1.1.164         hook failed: "config-changed"
redis-k8s/1   error     idle   10.1.0.224         hook failed: "config-changed"
stg-netbox@is-bastion-ps6:~$ _
stg-netbox@is-bastion-ps6:~$ juju debug-log -m admin/stg-netbox-k8s  --include redis-k8s/1
unit-redis-k8s-1: 21:57:35 ERROR juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-redis-k8s-1: 21:57:35 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-redis-k8s-1: 22:02:25 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-redis-k8s-1: 22:02:35 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-redis-k8s-1: 22:02:35 WARNING unit.redis-k8s/1.juju-log 2 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
unit-redis-k8s-1: 22:02:35 WARNING unit.redis-k8s/1.juju-log DEPRECATION WARNING - password off, this will be removed on later versions
unit-redis-k8s-1: 22:02:35 INFO unit.redis-k8s/1.juju-log Added updated layer 'redis' to Pebble plan
unit-redis-k8s-1: 22:02:35 ERROR unit.redis-k8s/1.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/./src/charm.py", line 725, in <module>
    main(RedisK8sCharm)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/framework.py", line 354, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/framework.py", line 830, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/framework.py", line 919, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/./src/charm.py", line 211, in _config_changed
    self._update_layer()
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/./src/charm.py", line 354, in _update_layer
    container.restart("redis", "redis_exporter")
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/model.py", line 1893, in restart
    self._pebble.restart_services(service_names)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/pebble.py", line 1638, in restart_services
    return self._services_action('restart', services, timeout, delay)
  File "/var/lib/juju/agents/unit-redis-k8s-1/charm/venv/ops/pebble.py", line 1659, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "redis" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2024-07-04T22:02:35Z INFO Service "redis" has never been started.
----- Logs from task 1 -----
2024-07-04T22:02:35Z INFO Service "redis_exporter" has never been started.
----- Logs from task 2 -----
2024-07-04T22:02:35Z INFO Most recent service output:

2024-07-04T22:02:35Z ERROR cannot start service: exited quickly with code 1
-----
unit-redis-k8s-1: 22:02:35 ERROR juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-redis-k8s-1: 22:02:35 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
^C
stg-netbox@is-bastion-ps6:~$ _
root@redis-k8s-1:/# head /var/log/redis/redis-server.log 
12:C 04 Jul 2024 21:52:11.069 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
12:C 04 Jul 2024 21:52:11.069 * Redis version=7.2.5, bits=64, commit=00000000, modified=0, pid=12, just started
12:C 04 Jul 2024 21:52:11.069 * Configuration loaded
12:S 04 Jul 2024 21:52:11.070 * Increased maximum number of open files to 10032 (it was originally set to 1024).
12:S 04 Jul 2024 21:52:11.070 * monotonic clock: POSIX clock_gettime
12:S 04 Jul 2024 21:52:11.070 * Running mode=standalone, port=6379.
12:S 04 Jul 2024 21:52:11.070 * Server initialized
12:S 04 Jul 2024 21:52:11.070 # Can't open or create append-only dir appendonlydir: Permission denied
17:C 04 Jul 2024 21:52:11.904 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
17:C 04 Jul 2024 21:52:11.904 * Redis version=7.2.5, bits=64, commit=00000000, modified=0, pid=17, just started
root@redis-k8s-1:/# ls -ld /var/lib/redis/
drwxr-xr-x 3 root root 4096 Jul  4 21:52 /var/lib/redis/
root@redis-k8s-1:/# ls -l /var/lib/redis/
total 28
-rw------- 1 redis redis  1895 Jul  4 21:52 ca.crt
drwx------ 2 root  root  16384 Jul  4 21:51 lost+found
-rw------- 1 redis redis  1407 Jul  4 21:52 redis.crt
-rw------- 1 redis redis  1679 Jul  4 21:52 redis.key
root@redis-k8s-1:/# _

The Juju-created storage volume is mounted on top of the image's own /var/lib/redis, so the chown performed during the image build has no effect on the final state of the deployment.
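Because the build-time chown is hidden by the mount, ownership can only be checked once the storage is attached (the first hook that failed above was "storage-attached"). A hypothetical sketch of such a runtime check, assuming `container` is an `ops.model.Container`. It relies on `Container.list_files(path, itself=True)`, which returns a single `FileInfo` for the directory itself rather than its contents. The helper name `data_dir_needs_chown` is made up:

```python
"""Hypothetical sketch: check mount-point ownership at runtime, after storage attaches."""

DATA_DIR = "/var/lib/redis"


def data_dir_needs_chown(container, user: str = "redis", group: str = "redis") -> bool:
    """Return True if the mounted data directory is not owned by user:group.

    `container` is assumed to be an ops.model.Container; list_files() with
    itself=True describes DATA_DIR itself, not the files inside it.
    """
    (info,) = container.list_files(DATA_DIR, itself=True)
    # FileInfo.user / FileInfo.group are the owner names reported by Pebble.
    return info.user != user or info.group != group
```

On the Canonical Kubernetes volume shown above (owned by root:root) this would return True. On a directory already owned by redis:redis it would return False, so a corrective chown only runs when it is needed.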

mthaddon commented 4 days ago

Reopening the issue, per the above comment.

reneradoi commented 4 days ago

Hi @faebd7, thank you for the feedback. I've now changed the charm itself to make sure the required directories are in place, so hopefully it works this time. The new revision #34 has been published to Charmhub. Please let me know if it works.

faebd7 commented 2 days ago

@reneradoi Looks good in my testing, thank you!