Boerderij / Varken

Standalone application to aggregate data from the Plex ecosystem into InfluxDB using Grafana as a frontend
MIT License

Varken docker fails to connect to influx with critical error but returns with exit code 0 #174

Closed n1nj4888 closed 3 years ago

n1nj4888 commented 4 years ago

Hi there,

When the varken container starts before InfluxDB is ready and cannot contact InfluxDB, varken fails with the following log entries:

2020-04-09 09:17:02 : INFO : Varken : Starting Varken...
2020-04-09 09:17:02 : INFO : Varken : Data folder is "/config"
2020-04-09 09:17:02 : INFO : Varken : Linux 5.3.0-46-generic (#38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 - Alpine Linux 3.10.0)
2020-04-09 09:17:02 : INFO : Varken : Python 3.7.3 (default, Jun 27 2019, 22:53:21) [GCC 8.3.0]
2020-04-09 09:17:02 : INFO : Varken : Varken v1.7.6-master
2020-04-09 09:17:02 : INFO : helpers : SONARR_SERVER_IDS : [1]
2020-04-09 09:17:02 : INFO : helpers : RADARR_SERVER_IDS : [1]
2020-04-09 09:17:02 : INFO : iniparser : LIDARR_SERVER_IDS disabled.
2020-04-09 09:17:02 : INFO : iniparser : OMBI_SERVER_IDS disabled.
2020-04-09 09:17:02 : INFO : helpers : TAUTULLI_SERVER_IDS : [1, 2]
2020-04-09 09:17:02 : INFO : iniparser : SICKCHILL_SERVER_IDS disabled.
2020-04-09 09:17:02 : INFO : iniparser : UNIFI_SERVER_IDS disabled.
2020-04-09 09:17:02 : CRITICAL : dbmanager : Error testing connection to InfluxDB. Please check your url/hostname

Although the error is logged as "CRITICAL", the container exits with an exit code of 0, as shown in the following Portainer Inspect details for the stopped container:

State
  Dead        false
  Error
  ExitCode    0
  FinishedAt  2020-04-09T01:17:02.371494764Z
  OOMKilled   false
  Paused      false
  Pid         0
  Restarting  false
  Running     false
  StartedAt   2020-04-09T01:17:01.096798522Z
  Status      exited

The issue here is that, as far as I know, an ExitCode of 0 is not treated as a failure. Therefore, if the varken service is set up with a swarm restart_policy of "on-failure", the docker swarm managers will not attempt to restart the container ...

samwiseg0 commented 4 years ago

This is intended as influx is critical for the operation. Please ensure that influx is started before varken.

n1nj4888 commented 4 years ago

I understand that influx is critical for varken, but consider the scenario where the physical Docker node reboots. The Influx container takes FAR longer to start than varken, so varken simply tries once and then exits with an exit code of 0 (ok). Since it exits with exit code 0, the Docker engine does not attempt to restart the container.

The issue is that varken should be exiting with a non-zero exit code if it exits with a critical error so that the Docker engine/swarm orchestrator can action it accordingly. This is how many other containers work...
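
To illustrate the point from the orchestrator's side, here is a minimal Python sketch, not Varken or Docker code. It assumes a simplified model of an "on-failure" policy in which only a non-zero exit code counts as a failure; the helper names should_restart_on_failure and run_and_get_exit_code are made up for this illustration.

```python
import subprocess
import sys


def should_restart_on_failure(exit_code: int) -> bool:
    """Simplified model of an "on-failure" policy: restart only on a non-zero exit code."""
    return exit_code != 0


def run_and_get_exit_code(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-c", snippet]).returncode


# A process that prints a critical error but exits with 0 looks successful from outside.
silent_failure = run_and_get_exit_code(
    "import sys; print('CRITICAL: database unreachable'); sys.exit(0)"
)

# The same failure reported with a non-zero code is visible to the supervisor.
reported_failure = run_and_get_exit_code(
    "import sys; print('CRITICAL: database unreachable'); sys.exit(1)"
)

print("restart after silent failure:", should_restart_on_failure(silent_failure))
print("restart after reported failure:", should_restart_on_failure(reported_failure))
```

Whatever the log says, the supervisor in this model only ever sees the integer exit code, so the first case is indistinguishable from a clean shutdown.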

dirtycajunrice commented 4 years ago

@samwiseg0 He is right about the exit code.
Offending line: https://github.com/Boerderij/Varken/blob/master/varken/dbmanager.py#L23
Documentation: https://docs.python.org/3/library/sys.html#sys.exit
Relevant snippet from documentation:

The optional argument arg can be an integer giving the exit status (defaulting to zero)

Proposed resolution: exit(1)
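
A minimal sketch of what the proposed resolution could look like in a startup check. This is not the actual dbmanager.py code: verify_connection, unreachable_database and exit_code_of are hypothetical names, and the ping callable stands in for whatever client call tests the InfluxDB connection.

```python
import logging
import sys

logger = logging.getLogger("dbmanager")


def verify_connection(ping) -> None:
    """Call ping(); if it raises a connection error, log it and exit with status 1."""
    try:
        ping()
    except ConnectionError as error:
        logger.critical("Error testing connection to InfluxDB: %s", error)
        sys.exit(1)  # non-zero status marks the run as failed


def unreachable_database() -> None:
    """Hypothetical stand-in for a client call against a database that is down."""
    raise ConnectionError("connection refused")


def exit_code_of(ping) -> int:
    """Return the exit status verify_connection would produce, or 0 if it returns normally."""
    try:
        verify_connection(ping)
    except SystemExit as stop:
        return stop.code
    return 0
```

Calling exit_code_of(unreachable_database) yields 1, while a ping that succeeds yields 0. sys.exit raises SystemExit, which is why the status can be inspected in-process here without terminating the interpreter.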

samwiseg0 commented 4 years ago

Yep. I will look at fixing it in develop

n1nj4888 commented 4 years ago

Hi @samwiseg0. Any update on this issue since it doesn’t seem to have been fixed yet in develop? Thanks!