AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

[BUG] Race condition in startup #695

Status: Closed (closed by bjornfor 2 months ago)

bjornfor commented 2 months ago

Describe the bug: Scrutiny sometimes fails to start due to what looks like a race condition. It tries to reach InfluxDB before InfluxDB is accepting connections, and then panics:

22:28:27 srv1 systemd[1]: Started Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds.
22:28:27 srv1 scrutiny[2380]: 2024/09/20 22:28:27 No configuration file found at /opt/scrutiny/config/scrutiny.yaml. Using Defaults.
22:28:27 srv1 scrutiny[2380]:  ___   ___  ____  __  __  ____  ____  _  _  _  _
22:28:27 srv1 scrutiny[2380]: / __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
22:28:27 srv1 scrutiny[2380]: \__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
22:28:27 srv1 scrutiny[2380]: (___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
22:28:27 srv1 scrutiny[2380]: github.com/AnalogJ/scrutiny                             dev-0.8.1
22:28:27 srv1 scrutiny[2380]: Start the scrutiny server
22:28:27 srv1 scrutiny[2380]: 2024/09/20 22:28:27 Loading configuration file: /nix/store/yyq8crcc4acshpkb66kigdwwh0afhyaa-scrutiny.yaml
22:28:27 srv1 scrutiny[2380]: time="2024-09-20T22:28:27+02:00" level=info msg="Trying to connect to scrutiny sqlite db: /var/lib/scrutiny/scrutiny.db\n" type=web
22:28:27 srv1 scrutiny[2380]: time="2024-09-20T22:28:27+02:00" level=info msg="Successfully connected to scrutiny sqlite db: /var/lib/scrutiny/scrutiny.db\n" type=web
22:28:27 srv1 scrutiny[2380]: time="2024-09-20T22:28:27+02:00" level=info msg="InfluxDB certificate verification: true\n" type=web
22:28:27 srv1 scrutiny[2380]: panic: failed to check influxdb setup status - Get "http://0.0.0.0:8086/api/v2/setup": dial tcp 0.0.0.0:8086: connect: connection refused
22:28:27 srv1 scrutiny[2380]: goroutine 1 [running]:
22:28:27 srv1 scrutiny[2380]: github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware({0x11abba8?, 0xc000092200?}, {0x11b0250?, 0xc00038fce0?})
22:28:27 srv1 scrutiny[2380]:         github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:15 +0xcb
22:28:27 srv1 scrutiny[2380]: github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc00019a360, 0xc00038fce0)
22:28:27 srv1 scrutiny[2380]:         github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0x99
22:28:27 srv1 scrutiny[2380]: github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc00019a360)
22:28:27 srv1 scrutiny[2380]:         github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:82 +0x13e
22:28:27 srv1 scrutiny[2380]: main.main.func2(0xc000766900)
22:28:27 srv1 scrutiny[2380]:         github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:133 +0x36b
22:28:27 srv1 scrutiny[2380]: github.com/urfave/cli/v2.(*Command).Run(0xc0005a18c0, 0xc0005ade80)
22:28:27 srv1 scrutiny[2380]:         github.com/urfave/cli/v2@v2.2.0/command.go:164 +0x583
22:28:27 srv1 scrutiny[2380]: github.com/urfave/cli/v2.(*App).RunContext(0xc0001baf00, {0x11a1e60, 0x18f2c00}, {0xc000034080, 0x4, 0x4})
22:28:27 srv1 scrutiny[2380]:         github.com/urfave/cli/v2@v2.2.0/app.go:306 +0xb4c
22:28:27 srv1 scrutiny[2380]: github.com/urfave/cli/v2.(*App).Run(...)
22:28:27 srv1 scrutiny[2380]:         github.com/urfave/cli/v2@v2.2.0/app.go:215
22:28:27 srv1 scrutiny[2380]: main.main()
22:28:27 srv1 scrutiny[2380]:         github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:158 +0x794
22:28:27 srv1 systemd[1]: scrutiny.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
22:28:27 srv1 systemd[1]: scrutiny.service: Failed with result 'exit-code'.

With retries enabled in the systemd service definition, Scrutiny starts successfully after a few attempts.

Tested on NixOS.

Expected behavior: Scrutiny starts successfully every time.

0-a-9-5-6 commented 2 months ago

This repo doesn't include any systemd code, so this is probably a nixpkgs issue, which has already been filed as https://github.com/NixOS/nixpkgs/issues/317017

bjornfor commented 2 months ago

Oops, sorry for not realizing that, and thanks for pointing me in the right direction!