In some circumstances, when restarting NS8 the systemd unit of the crowdsec module fails to start.
The problem is related to the state of the systemd unit of redis service: if crowdsec finds redis stopped, it cannot complete its configuration provisioning (smarthost) and fails.
Steps to reproduce
Install and configure crowdsec module on NS8
Restart the system
Expected behavior
crowdsec systemd unit should run flawlessly at boot.
Actual behavior
crowdsec systemd unit trys to start but it fails with this output:
May 19 19:32:57 ns crowdsec1[1240]: Traceback (most recent call last):
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/connection.py", line 264, in connect
May 19 19:32:57 ns crowdsec1[1240]: sock = self.retry.call_with_retry(
May 19 19:32:57 ns crowdsec1[1240]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/retry.py", line 46, in call_with_retry
May 19 19:32:57 ns crowdsec1[1240]: return do()
May 19 19:32:57 ns crowdsec1[1240]: ^^^^
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/connection.py", line 265, in <lambda>
May 19 19:32:57 ns crowdsec1[1240]: lambda: self._connect(), lambda error: self.disconnect(error)
May 19 19:32:57 ns crowdsec1[1240]: ^^^^^^^^^^^^^^^
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/connection.py", line 627, in _connect
May 19 19:32:57 ns crowdsec1[1240]: raise err
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/connection.py", line 615, in _connect
May 19 19:32:57 ns crowdsec1[1240]: sock.connect(socket_address)
May 19 19:32:57 ns crowdsec1[1240]: ConnectionRefusedError: [Errno 111] Connection refused
May 19 19:32:57 ns crowdsec1[1240]: During handling of the above exception, another exception occurred:
May 19 19:32:57 ns crowdsec1[1240]: Traceback (most recent call last):
May 19 19:32:57 ns crowdsec1[1240]: File "/var/lib/nethserver/crowdsec1/bin/expand-configuration", line 82, in <module>
May 19 19:32:57 ns crowdsec1[1240]: smtp = agent.get_smarthost_settings(agent.redis_connect())
May 19 19:32:57 ns crowdsec1[1240]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pypkg/agent/__init__.py", line 509, in get_smarthost_settings
May 19 19:32:57 ns crowdsec1[1240]: conf = rdb.hgetall('cluster/smarthost')
May 19 19:32:57 ns crowdsec1[1240]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/commands/core.py", line 4954, in hgetall
May 19 19:32:57 ns crowdsec1[1240]: return self.execute_command("HGETALL", name)
May 19 19:32:57 ns crowdsec1[1240]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/client.py", line 533, in execute_command
May 19 19:32:57 ns crowdsec1[1240]: conn = self.connection or pool.get_connection(command_name, **options)
May 19 19:32:57 ns crowdsec1[1240]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/connection.py", line 1086, in get_connection
May 19 19:32:57 ns crowdsec1[1240]: connection.connect()
May 19 19:32:57 ns crowdsec1[1240]: File "/usr/local/agent/pyenv/lib64/python3.11/site-packages/redis/connection.py", line 270, in connect
May 19 19:32:57 ns crowdsec1[1240]: raise ConnectionError(self._error_message(e))
May 19 19:32:57 ns crowdsec1[1240]: redis.exceptions.ConnectionError: Error 111 connecting to cluster-leader:6379. Connection refused.
...
May 19 19:32:58 ns systemd[1]: crowdsec1.service: Control process exited, code=exited, status=1/FAILURE
After 5 attempts done quickly, the start of the unit definitively fails:
May 19 19:33:01 ns systemd[1]: crowdsec1.service: Control process exited, code=exited, status=1/FAILURE
May 19 19:33:01 ns systemd[1]: crowdsec1.service: Failed with result 'exit-code'.
May 19 19:33:01 ns systemd[1]: Failed to start crowdsec server.
May 19 19:33:01 ns systemd[1]: crowdsec1.service: Scheduled restart job, restart counter is at 5.
May 19 19:33:01 ns systemd[1]: Stopped crowdsec server.
May 19 19:33:01 ns systemd[1]: crowdsec1.service: Start request repeated too quickly.
May 19 19:33:01 ns systemd[1]: crowdsec1.service: Failed with result 'exit-code'.
May 19 19:33:01 ns systemd[1]: Failed to start crowdsec server.
A few seconds later, the startup of redis is traced in the log:
May 19 19:33:13 ns redis[2284]: 1:C 19 May 2024 19:33:13.375 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
May 19 19:33:13 ns redis[2284]: 1:C 19 May 2024 19:33:13.375 * Redis version=7.2.3, bits=64, commit=00000000, modified=0, pid=1, just started
May 19 19:33:13 ns redis[2284]: 1:C 19 May 2024 19:33:13.375 * Configuration loaded
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.376 * monotonic clock: POSIX clock_gettime
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.377 * Running mode=standalone, port=6379.
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.379 * Server initialized
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.380 * Loading RDB produced by version 7.2.3
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.380 * RDB age 160 seconds
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.380 * RDB memory usage when created 3.14 Mb
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.382 * Done loading RDB, keys loaded: 544, keys expired: 0.
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.382 * DB loaded from disk: 0.003 seconds
May 19 19:33:13 ns redis[2284]: 1:M 19 May 2024 19:33:13.382 * Ready to accept connections tcp
Adding to crowdsec unit a dependence to redis.service workarounded the issue:
~]# systemctl cat crowdsec1.service
# /etc/systemd/system/crowdsec1.service
#
# Copyright (C) 2022 Nethesis S.r.l.
# SPDX-License-Identifier: GPL-3.0-or-later
#
#
# This systemd unit starts a crowdsec instance using Podman.
# Most parts of this file come from podman-generate-systemd.
#
[Unit]
Description=crowdsec server
After=network.target redis.service
In some circumstances, when restarting NS8 the systemd unit of the
crowdsec
module fails to start. The problem is related to the state of the systemd unit ofredis
service: ifcrowdsec
findsredis
stopped, it cannot complete its configuration provisioning (smarthost) and fails.Steps to reproduce
crowdsec
module on NS8Expected behavior
crowdsec
systemd unit should run flawlessly at boot.Actual behavior
crowdsec
systemd unit trys to start but it fails with this output:After 5 attempts done quickly, the start of the unit definitively fails:
A few seconds later, the startup of
redis
is traced in the log:Adding to
crowdsec
unit a dependence toredis.service
workarounded the issue:Components