Open MaisonGF opened 2 days ago
As a workaround on my supervised installation, I created a systemd service running a Python script that monitors the add-on's Docker logs and restarts the container whenever an ERROR line appears:
You can set it up with this bash script (thanks, ChatGPT):
```bash
#!/bin/bash
# Stop at the first failing command
set -e

# Variables
SERVICE_NAME="monitor_tydom"
SCRIPT_PATH="/usr/local/bin/monitor_tydom.py"
SERVICE_FILE="/etc/systemd/system/${SERVICE_NAME}.service"

# Step 1: create the Python script
cat << 'EOF' > "$SCRIPT_PATH"
#!/usr/bin/env python3
import subprocess
import time

# Configuration
SEARCH_TERM = "tydom2mqtt"
ERROR_KEYWORD = "ERROR"


def get_container_name(search_term):
    """Return the first running container whose name contains search_term."""
    try:
        result = subprocess.run(
            ["docker", "ps", "--format", "{{.Names}}"],
            stdout=subprocess.PIPE,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError as e:
        print(f"Error retrieving container list: {e}")
        return None
    for container in result.stdout.splitlines():
        if search_term in container:
            return container
    return None


def restart_container(container_name):
    try:
        print(f"Restarting container {container_name}...")
        subprocess.run(["docker", "restart", container_name], check=True)
        print(f"Container {container_name} restarted successfully.")
    except subprocess.CalledProcessError as e:
        print(f"Failed to restart container: {e}")


def monitor_logs(container_name):
    """Follow the container logs and restart the container on the first error."""
    process = None
    try:
        # --tail 0 skips the log history, so errors logged before a restart
        # are not replayed and cannot trigger a restart loop.
        # stderr is merged into stdout so that both streams are checked.
        process = subprocess.Popen(
            ["docker", "logs", "--tail", "0", "-f", container_name],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        for line in process.stdout:
            if ERROR_KEYWORD in line:
                print(f"Error detected: {line.strip()}")
                restart_container(container_name)
                # One restart per burst of errors; the caller reopens the stream
                break
    except Exception as e:
        print(f"Exception occurred: {e}")
    finally:
        # Stop following the old log stream
        if process is not None:
            process.terminate()
            process.wait()


if __name__ == "__main__":
    container_name = get_container_name(SEARCH_TERM)
    if container_name:
        print(f"Monitoring logs for container: {container_name}")
        while True:
            monitor_logs(container_name)
            # Pause to avoid a tight loop if the log stream keeps failing
            time.sleep(5)
    else:
        print(f"No container found with search term: {SEARCH_TERM}")
EOF
chmod +x "$SCRIPT_PATH"

# Step 2: create the systemd unit file
cat << EOF > "$SERVICE_FILE"
[Unit]
Description=Monitor Tydom2MQTT Docker logs and restart on errors
After=docker.service
Requires=docker.service

[Service]
ExecStart=$SCRIPT_PATH
# Unbuffered output so print() lines reach the journal immediately
Environment=PYTHONUNBUFFERED=1
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
User=root

[Install]
WantedBy=multi-user.target
EOF

# Step 3: enable and start the service
systemctl daemon-reload
systemctl enable "${SERVICE_NAME}.service"
systemctl start "${SERVICE_NAME}.service"
echo "Service ${SERVICE_NAME} created and started successfully!"
```
I got these errors last night after a brief power outage:
2024-11-16 23:54:18,280 - Starting tydom2mqtt
I restarted the add-on manually and have had no issues since.
The add-on should either crash on error so that HA can restart it, or restart itself; otherwise it isn't robust. In the original mrwiwi version, the add-on restarted itself (forever.py relaunched the main script after a crash). IMHO that was a lot more resilient (but that version doesn't work anymore).
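To illustrate the kind of wrapper I mean, here is a rough sketch of a relaunch loop. It is not the actual forever.py code, just my guess at the general idea; `run_forever` and its parameters (`delay`, `max_restarts`, `sleep`) are names I made up for the example.

```python
import subprocess
import sys
import time


def run_forever(cmd, delay=5.0, max_restarts=None, sleep=time.sleep):
    """Run cmd and relaunch it each time it exits with a non-zero code.

    Returns the number of relaunches once cmd exits cleanly or once
    max_restarts relaunches have been used up (None means no limit).
    """
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode
        if code == 0:
            return restarts
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        print(f"exit code {code}, relaunch #{restarts} in {delay}s",
              file=sys.stderr, flush=True)
        sleep(delay)


# Hypothetical usage; "main.py" stands in for the add-on's real entry point:
# run_forever([sys.executable, "main.py"])
```

This only helps if the main script actually exits with a non-zero code when it hits an unrecoverable error, which is the first half of what I'm asking for.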
Could you please make the add-on restart itself after, for example, 2 errors?
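For the "2 errors" part, something along these lines might be enough. It is only a sketch: I don't know how the add-on's error handling is structured, so `ErrorBudget`, `limit` and `window` are hypothetical names, and the 5-minute window is an arbitrary default I picked.

```python
import time
from collections import deque


class ErrorBudget:
    """Trip once `limit` errors have been seen within `window` seconds."""

    def __init__(self, limit=2, window=300.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.stamps = deque()

    def record(self):
        """Record one error; return True when the limit has been reached."""
        now = self.clock()
        self.stamps.append(now)
        # Forget errors that are older than the window
        while self.stamps and now - self.stamps[0] > self.window:
            self.stamps.popleft()
        return len(self.stamps) >= self.limit
```

The error handler could then call `record()` on each error and exit with a non-zero code (for example `sys.exit(1)`) when it returns True, so that HA or a wrapper script restarts the add-on.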
Thanks in advance, and thanks for the good work!