The Graylog Sidecar systemctl is displaying wrong status.

hyderaliva commented 2 months ago

Problem description

The Graylog Sidecar systemctl status always shows 'running', even when there is a connectivity issue between the Graylog API server and the Graylog Sidecar agent. As a result, we lose critical event logs in the Graylog web console. We currently use systemctl status to monitor the Graylog Sidecar agent, but this approach does not detect the connectivity failure.

Please suggest a suitable method to monitor the graylog-sidecar agent, so that issues are addressed promptly and critical events are not missed in the Graylog console.

The graylog-sidecar config and systemd unit files are as follows:

sidecar.yml

    server_url: "http://graylog-server-ip:9000/api"
    server_api_token: "api-token"
    collector_id: "file:/etc/graylog/sidecar/node-id"
    node_id: "file:/etc/graylog/sidecar/node-id"
    node_name: "$HOSTNAME"
    send_status: true
    cache_path: "/var/cache/graylog-sidecar"
    log_path: "/var/log/graylog-sidecar"
    list_log_files: "/var/log/"

graylog-sidecar.service

    [Unit]
    Description=Wrapper service for Graylog controlled collector
    ConditionFileIsExecutable=/usr/bin/graylog-sidecar

    [Service]
    StartLimitInterval=5
    StartLimitBurst=10
    ExecStart=/usr/bin/graylog-sidecar
    Restart=always
    RestartSec=120
    EnvironmentFile=-/etc/sysconfig/graylog-sidecar

    [Install]
    WantedBy=multi-user.target

Environment

Sidecar Version: 1.0.0
Graylog Version: 4.1.14
Operating System: Ubuntu 20.04
Elasticsearch Version: 7.10.2
MongoDB Version: 4.2

Thanks, Hyder

sethgraylog commented 2 months ago

Can you validate this on a current version of Graylog (v6.0)?

drewmiranda-gl commented 1 month ago

Took a quick look at this and can confirm that the graylog-sidecar service does stay in an active (running) state even if it loses connectivity with its Graylog cluster. This appears to be working as designed, though, as the service itself remains running so that it can continue to retry communication with its Graylog cluster.

I suggest 2 things to monitor Graylog cluster health:

1. Use an uptime or monitoring software solution to alert on connectivity issues with the Graylog cluster, specifically the Graylog web interface.
2. Use the contents of Sidecar's log file, /var/log/graylog-sidecar/sidecar.log (for example, with a script running as a cron job that checks for errors). When Sidecar is unable to connect to its Graylog cluster, it logs the following:
   - `level=error msg="Error fetching server version Get \"\": dial tcp : connect: connection refused"`
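As a rough illustration of both suggestions, here is a minimal Python sketch that a cron job could run. It is not an official Graylog tool. The function names (`find_connection_errors`, `server_reachable`, `check`) and the 200-line window are hypothetical, and the match pattern assumes failures are logged at `level=error` with the text "connection refused", as in the sample line above. The log path and server URL are the ones from the sidecar.yml posted in this issue.

```python
import re
import socket
from collections import deque
from urllib.parse import urlparse

# Assumption: connection failures appear as level=error lines that mention
# "connection refused", matching the sample message quoted in this thread.
ERROR_PATTERN = re.compile(r"level=error\b.*connection refused")


def find_connection_errors(lines):
    """Return the log lines that look like Sidecar connection failures."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


def server_reachable(server_url, timeout=5.0):
    """Return True if a TCP connection to the host and port in server_url succeeds."""
    parsed = urlparse(server_url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check(log_file, server_url, tail=200):
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    try:
        with open(log_file, encoding="utf-8", errors="replace") as fh:
            # Only inspect the last `tail` lines so old errors do not alert forever.
            recent = deque(fh, maxlen=tail)
        errors = find_connection_errors(recent)
        if errors:
            problems.append(f"{len(errors)} connection error(s) in {log_file}")
    except OSError as exc:
        problems.append(f"cannot read {log_file}: {exc}")
    if not server_reachable(server_url):
        problems.append(f"cannot open a TCP connection to {server_url}")
    return problems
```

A wrapper script could call `check("/var/log/graylog-sidecar/sidecar.log", "http://graylog-server-ip:9000/api")` and exit non-zero or send an alert when the returned list is not empty. The TCP probe only shows that the port accepts connections, not that the Graylog API is healthy, so it complements a proper uptime monitor rather than replacing one.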

Graylog2 / collector-sidecar

The Graylog Sidecar systemctl is displaying wrong status. #500
