XoniBlue opened 4 months ago
What do the (web & collector) logs say?
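For example (assuming default container names), something like docker logs scrutiny for the web container, and the equivalent for your collector container.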
Hi, same here: no disks displayed, although smartctl sees them when I run the command inside the docker container.
I don't know if it's related, but I recently set up something similar.
networks:
  monitoring: # A common network for all monitoring services to communicate into
  notifications: # To Gotify or another Notification service

services:
  influxdb:
    container_name: influxdb
    image: influxdb:2.7-alpine
    ports:
      - 8086:8086
    volumes:
      - ${DIR_CONFIG}/influxdb2/db:/var/lib/influxdb2
      - ${DIR_CONFIG}/influxdb2/config:/etc/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=Admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=${PASSWORD}
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=scrutiny
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=REDACTED
      - TZ=Europe/Stockholm
    restart: unless-stopped
    networks:
      - monitoring
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8086/health"]
      interval: 5s
      timeout: 10s
      retries: 20

  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-web
    ports:
      - 8080:8080
    volumes:
      - ${DIR_CONFIG}/config:/opt/scrutiny/config
    environment:
      - SCRUTINY_WEB_INFLUXDB_HOST=influxdb
      - SCRUTINY_WEB_INFLUXDB_PORT=8086
      - SCRUTINY_WEB_INFLUXDB_TOKEN=REDACTED
      - SCRUTINY_WEB_INFLUXDB_ORG=homelab
      - SCRUTINY_WEB_INFLUXDB_BUCKET=scrutiny
      # Optional but highly recommended to notify you in case of a problem
      - SCRUTINY_NOTIFY_URLS=REDACTED
      - TZ=Europe/Stockholm
    depends_on:
      influxdb:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - notifications
      - monitoring
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 5s
      timeout: 10s
      retries: 20
      start_period: 10s
The Scrutiny container just dies randomly, which in turn makes it impossible for the collectors (on the same local network) to report anything to the web UI. The health check doesn't help either, since it doesn't restart the container on failure. I tried InfluxDB 2.1, 2.2 and now 2.7; no difference, and the DB itself always seems stable.
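Since Docker doesn't restart a container just because its health check turns unhealthy, one workaround I'm considering is an autoheal sidecar. A rough, untested sketch (the willfarrell/autoheal image restarts any unhealthy container; note that under rootless Docker the socket path is not /var/run/docker.sock):

  autoheal:
    container_name: autoheal
    image: willfarrell/autoheal
    restart: unless-stopped
    environment:
      # Watch all containers that define a health check; can be limited via labels instead
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      # Needs access to the Docker API so it can restart unhealthy containers
      - /var/run/docker.sock:/var/run/docker.sock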
Host OS: Ubuntu 24.04 (running rootless Docker)
ii docker-buildx-plugin 0.16.2-1~ubuntu.24.04~noble amd64 Docker Buildx cli plugin.
ii docker-ce 5:27.2.0-1~ubuntu.24.04~noble amd64 Docker: the open-source application container engine
ii docker-ce-cli 5:27.2.0-1~ubuntu.24.04~noble amd64 Docker CLI: the open-source application container engine
ii docker-ce-rootless-extras 5:27.2.0-1~ubuntu.24.04~noble amd64 Rootless support for Docker.
ii docker-compose-plugin 2.29.2-1~ubuntu.24.04~noble amd64 Docker Compose (V2) plugin for the Docker CLI.
Scrutiny version: 0.8.1 (master-web)
InfluxDB version: 2.7 (also tested 2.1 and 2.2)
user@scrutiny:~/scrutiny$ ls -l
total 12
drwxr-xr-x 2 user user 4096 Sep 8 16:37 config
-rw-rw-r-- 1 user user 2361 Sep 8 10:36 docker-compose.yaml
drwxr-xr-x 4 user user 4096 Sep 7 22:58 influxdb2
For now it seems to be running fine. I'll get back with some logs if it happens again.
OK, so here are some logs:
time="2024-09-08T18:45:32+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T18:45:32+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:18:45:32 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (2ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=2 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
2024/09/08 22:35:58 No configuration file found at /opt/scrutiny/config/scrutiny.yaml. Using Defaults.
___ ___ ____ __ __ ____ ____ _ _ _ _
/ __) / __)( _ \( )( )(_ _)(_ _)( \( )( \/ )
\__ \( (__ ) / )(__)( )( _)(_ ) ( \ /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny dev-0.8.1
Start the scrutiny server
time="2024-09-08T22:35:58+02:00" level=info msg="Trying to connect to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:58+02:00" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:58+02:00" level=info msg="InfluxDB certificate verification: true\n" type=web
panic: failed to check influxdb setup status - Get "http://influxdb:8086/api/v2/setup": dial tcp 172.19.0.2:8086: connect: connection refused
goroutine 1 [running]:
github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware({0x11c62a8?, 0xc000014d90?}, {0x11ca9b0?, 0xc000407880?})
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:15 +0xd6
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000013320, 0x1044847?)
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xa5
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000013320)
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:82 +0x12c
main.main.func2(0xc000313f40)
/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:133 +0x39c
github.com/urfave/cli/v2.(*Command).Run(0xc0003d5200, 0xc000313dc0)
/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/command.go:164 +0x5c8
github.com/urfave/cli/v2.(*App).RunContext(0xc0002d6600, {0x11bd6c8?, 0xc000046040}, {0xc000036040, 0x2, 0x2})
/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:306 +0xbac
github.com/urfave/cli/v2.(*App).Run(...)
/go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:215
main.main()
/go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:158 +0x774
___ ___ ____ __ __ ____ ____ _ _ _ _
/ __) / __)( _ \( )( )(_ _)(_ _)( \( )( \/ )
\__ \( (__ ) / )(__)( )( _)(_ ) ( \ /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny dev-0.8.1
Start the scrutiny server
2024/09/08 22:35:59 No configuration file found at /opt/scrutiny/config/scrutiny.yaml. Using Defaults.
time="2024-09-08T22:35:59+02:00" level=info msg="Trying to connect to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="InfluxDB certificate verification: true\n" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="Database migration starting. Please wait, this process may take a long time...." type=web
time="2024-09-08T22:35:59+02:00" level=info msg="Database migration completed successfully" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="SQLite global configuration migrations starting. Please wait...." type=web
time="2024-09-08T22:35:59+02:00" level=info msg="SQLite global configuration migrations completed successfully" type=web
time="2024-09-08T22:36:04+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:04+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:04 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:10+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:10+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:10 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (4ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=4 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:15+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:15+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:15 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:20+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:20+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:20 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (2ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=2 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:25+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:25+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:25 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:30+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:30+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:30 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:35+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:35+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:35 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
ts=2024-09-08T16:11:55.098457Z lvl=info msg="Pruning shard groups after retention check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_name=retention_prune_shard_groups op_event=end op_elapsed=0.135ms
ts=2024-09-08T16:11:55.098522Z lvl=info msg="Retention policy deletion check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_event=end op_elapsed=0.502ms
ts=2024-09-08T16:20:13.098918Z lvl=info msg="Cache snapshot (start)" log_id=0rWJSyNW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
ts=2024-09-08T16:20:13.147114Z lvl=info msg="Snapshot for path written" log_id=0rWJSyNW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1 duration=48.299ms
ts=2024-09-08T16:20:13.147214Z lvl=info msg="Cache snapshot (end)" log_id=0rWJSyNW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=48.402ms
ts=2024-09-08T16:41:55.099150Z lvl=info msg="Retention policy deletion check (start)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_event=start
ts=2024-09-08T16:41:55.099452Z lvl=info msg="Pruning shard groups after retention check (start)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_name=retention_prune_shard_groups op_event=start
ts=2024-09-08T16:41:55.099550Z lvl=info msg="Pruning shard groups after retention check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_name=retention_prune_shard_groups op_event=end op_elapsed=0.114ms
ts=2024-09-08T16:41:55.099706Z lvl=info msg="Retention policy deletion check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_event=end op_elapsed=0.693ms
2024-09-08T20:35:58. info found existing boltdb file, skipping setup wrapper {"system": "docker", "bolt_path": "/var/lib/influxdb2/influxd.bolt"}
2024-09-08T20:35:58. info found existing boltdb file, skipping setup wrapper {"system": "docker", "bolt_path": "/var/lib/influxdb2/influxd.bolt"}
ts=2024-09-08T20:35:59.350614Z lvl=info msg="Welcome to InfluxDB" log_id=0rWqSWyG000 version=v2.7.10 commit=f302d9730c build_date=2024-08-16T20:19:28Z log_level=info
ts=2024-09-08T20:35:59.350859Z lvl=warn msg="nats-port argument is deprecated and unused" log_id=0rWqSWyG000
ts=2024-09-08T20:35:59.355792Z lvl=info msg="Resources opened" log_id=0rWqSWyG000 service=bolt path=/var/lib/influxdb2/influxd.bolt
ts=2024-09-08T20:35:59.356016Z lvl=info msg="Resources opened" log_id=0rWqSWyG000 service=sqlite path=/var/lib/influxdb2/influxd.sqlite
ts=2024-09-08T20:35:59.374536Z lvl=info msg="Checking InfluxDB metadata for prior version." log_id=0rWqSWyG000 bolt_path=/var/lib/influxdb2/influxd.bolt
ts=2024-09-08T20:35:59.375041Z lvl=info msg="Using data dir" log_id=0rWqSWyG000 service=storage-engine service=store path=/var/lib/influxdb2/engine/data
ts=2024-09-08T20:35:59.375103Z lvl=info msg="Compaction settings" log_id=0rWqSWyG000 service=storage-engine service=store max_concurrent_compactions=1 throughput_bytes_per_second=50331648 throughput_bytes_per_second_burst=50331648
ts=2024-09-08T20:35:59.375125Z lvl=info msg="Open store (start)" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open op_event=start
ts=2024-09-08T20:35:59.433280Z lvl=info msg="index opened with 8 partitions" log_id=0rWqSWyG000 service=storage-engine index=tsi
ts=2024-09-08T20:35:59.439619Z lvl=info msg="loading changes (start)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=start
ts=2024-09-08T20:35:59.439798Z lvl=info msg="loading changes (end)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=end op_elapsed=0.166ms
ts=2024-09-08T20:35:59.440515Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/d2cb8fefbb0e6d58/autogen/2/000000001-000000001.tsm id=0 duration=0.283ms
ts=2024-09-08T20:35:59.440864Z lvl=info msg="Opened shard" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open index_version=tsi1 path=/var/lib/influxdb2/engine/data/d2cb8fefbb0e6d58/autogen/2 duration=58.126ms
ts=2024-09-08T20:35:59.466214Z lvl=info msg="index opened with 8 partitions" log_id=0rWqSWyG000 service=storage-engine index=tsi
ts=2024-09-08T20:35:59.469318Z lvl=info msg="loading changes (start)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=start
ts=2024-09-08T20:35:59.469464Z lvl=info msg="loading changes (end)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=end op_elapsed=0.205ms
ts=2024-09-08T20:35:59.472423Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000005-000000002.tsm id=0 duration=1.503ms
ts=2024-09-08T20:35:59.477915Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000014-000000002.tsm id=1 duration=4.616ms
ts=2024-09-08T20:35:59.478595Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000017-000000001.tsm id=4 duration=0.372ms
ts=2024-09-08T20:35:59.480224Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000018-000000001.tsm id=5 duration=1.487ms
ts=2024-09-08T20:35:59.480700Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000015-000000001.tsm id=2 duration=0.348ms
ts=2024-09-08T20:35:59.476600Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000016-000000001.tsm id=3 duration=1.608ms
ts=2024-09-08T20:35:59.481277Z lvl=info msg="Reading file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=cacheloader path=/var/lib/influxdb2/engine/wal/fd7bf29911cfbc2a/autogen/1/_00013.wal size=14266
ts=2024-09-08T20:35:59.487029Z lvl=info msg="Opened shard" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open index_version=tsi1 path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1 duration=88.776ms
ts=2024-09-08T20:35:59.487738Z lvl=info msg="Open store (end)" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open op_event=end op_elapsed=112.614ms
ts=2024-09-08T20:35:59.489881Z lvl=info msg="Starting retention policy enforcement service" log_id=0rWqSWyG000 service=retention check_interval=30m
ts=2024-09-08T20:35:59.490018Z lvl=info msg="Starting precreation service" log_id=0rWqSWyG000 service=shard-precreation check_interval=10m advance_period=30m
ts=2024-09-08T20:35:59.496753Z lvl=info msg="Starting query controller" log_id=0rWqSWyG000 service=storage-reads concurrency_quota=1024 initial_memory_bytes_quota_per_query=9223372036854775807 memory_bytes_quota_per_query=9223372036854775807 max_memory_bytes=0 queue_size=1024
ts=2024-09-08T20:35:59.614105Z lvl=info msg="Configuring InfluxQL statement executor (zeros indicate unlimited)." log_id=0rWqSWyG000 max_select_point=0 max_select_series=0 max_select_buckets=0
ts=2024-09-08T20:35:59.630557Z lvl=info msg=Starting log_id=0rWqSWyG000 service=telemetry interval=8h
ts=2024-09-08T20:35:59.633322Z lvl=info msg=Listening log_id=0rWqSWyG000 service=tcp-listener transport=http addr=:8086 port=8086
This happens when I try to connect to the web UI after being disconnected for a while. @AnalogJ, do you need anything else to debug? (Made a donation through PayPal.)
OK, so I think I found it! https://forums.docker.com/t/linux-rootless-docker-only-starting-after-user-login/141505/2
It's because I run rootless Docker.
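If I understand that thread correctly, the rootless daemon only runs while my user has an active session, so something along these lines should keep it running after logout (not yet verified on my side):

# Allow the user's services (including the rootless Docker daemon) to run without an active login session
sudo loginctl enable-linger $USER
# Make sure the rootless daemon is enabled as a user service and started now
systemctl --user enable --now docker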
I'm also running rootless Docker and came up with a workaround, assuming we trust this program to run as root on the host. I'm interested in hearing about improvements to this method to lock it down as much as possible (a dedicated user with just smartctl and HTTP access?).
Set up a hub/spoke deployment with just InfluxDB and the web UI. Expose the web UI port directly only to localhost.
services:
  influxdb:
    image: influxdb:2.2
    volumes:
      - ./influxdb:/var/lib/influxdb2
      - ./influxconfig:/etc/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=Admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=${PASSWORD}
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=scrutiny
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=${TOKEN}
    restart: unless-stopped
    networks:
      - scrutiny

  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-web
    ports:
      - 127.0.0.1:8901:8080
    environment:
      - SCRUTINY_WEB_INFLUXDB_HOST=influxdb
      - SCRUTINY_WEB_INFLUXDB_PORT=8086
      - SCRUTINY_WEB_INFLUXDB_TOKEN=${TOKEN}
      - SCRUTINY_WEB_INFLUXDB_ORG=homelab
      - SCRUTINY_WEB_INFLUXDB_BUCKET=scrutiny
    depends_on:
      - influxdb
    volumes:
      - ./config:/opt/scrutiny/config
    restart: unless-stopped
    networks:
      - scrutiny
      - npm-network

networks:
  scrutiny: null
  npm-network:
    external: true
Install the collector on the host system as described here: https://github.com/AnalogJ/scrutiny/blob/master/docs/INSTALL_MANUAL.md#collector
Set up the cronjob as root:
*/15 * * * * . /etc/profile; /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://localhost:8901"
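It can help to run the collector once by hand (as root) before relying on the cronjob, to confirm it can reach the web UI and actually detects the drives:

# One-off manual run, same binary and endpoint as the cron line above
sudo /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://localhost:8901"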
Hey all, I need some help. I've got Scrutiny master-web and master-collector with InfluxDB installed. It took some time to get everything working with the InfluxDB bucket/org/token, but all is well now: Scrutiny has created the extra buckets and the web UI launches. BUT I've got no drives showing up at all in the web UI. I ran the smartctl scan inside and outside of Docker, and it did in fact show the drives. Below is the yml for scrutiny and the collector. Any help is appreciated.
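Not sure if it applies to your setup, but when smartctl can see the drives yet nothing shows up in the web UI, the usual suspects are the collector container missing the device passthrough or the SYS_RAWIO capability. A minimal collector service sketch along those lines (device names and paths here are just examples, adjust to your disks):

  collector:
    container_name: scrutiny-collector
    image: ghcr.io/analogj/scrutiny:master-collector
    cap_add:
      - SYS_RAWIO   # required for smartctl to read SMART data
      # - SYS_ADMIN # additionally required for NVMe drives
    volumes:
      - /run/udev:/run/udev:ro
      - ./collector-config:/opt/scrutiny/config
    environment:
      # Point the collector at the web container
      - COLLECTOR_API_ENDPOINT=http://scrutiny:8080
    devices:
      # Pass through every drive the collector should monitor
      - /dev/sda
      - /dev/sdb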