AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

[BUG] Nothing showing up in web ui #660

Open · XoniBlue opened this issue 4 months ago

XoniBlue commented 4 months ago

Hey all, I need some help. I've got Scrutiny master-web and master-collector with InfluxDB installed. It took some time to get everything working with the InfluxDB bucket/org/token, but all is well now: Scrutiny has created the extra buckets and the web UI launches. BUT I've got no drives showing up at all in the web UI. I ran the smartctl scan both inside and outside of Docker, and it did in fact show the drives. Below is the YAML for scrutiny and the collector. Any help is appreciated.

services:
  scrutiny:
    image: ghcr.io/analogj/scrutiny:master-web
    container_name: scrutiny
    security_opt:
      - no-new-privileges:true
    environment:
     # - SCRUTINY_API_ENDPOINT=http://localhost:8080
      - SCRUTINY_WEB_INFLUXDB_HOST=192.168.5.105
      - SCRUTINY_WEB_INFLUXDB_PORT=8086
      - SCRUTINY_WEB_INFLUXDB_TOKEN=##INFLUX TOKEN##
      - SCRUTINY_WEB_INFLUXDB_ORG=scrutiny
      - SCRUTINY_WEB_INFLUXDB_BUCKET=scrutiny
      - DEBUG=true
      # Optional but highly recommended to notify you in case of a problem
      #- SCRUTINY_NOTIFY_URLS=["http://gotify:80/message?token=a-gotify-token"]
    restart: unless-stopped
    #cap_add:
    #  - SYS_RAWIO
    #  - SYS_ADMIN #optional
    #secrets:
    #  - scrutiny_token
    networks:
      t3_proxy:
        ipv4_address: 192.168.5.120 # You can specify a static IP
      socket_proxy:
    volumes:
      - $DOCKERDIR/appdata/scrutiny:/opt/scrutiny/config
    labels:
      - "traefik.enable=true"
      # HTTP Routers
      - "traefik.http.routers.scrutiny-rtr.entrypoints=websecure"
      - "traefik.http.routers.scrutiny-rtr.rule=Host(`scrutiny.$DOMAINNAME_HS`)"
      # Middlewares
      - "traefik.http.routers.scrutiny-rtr.middlewares=chain-authelia@file"
      # HTTP Services
      - "traefik.http.routers.scrutiny-rtr.service=scrutiny-svc"
      - "traefik.http.services.scrutiny-svc.loadbalancer.server.port=8080"

  collector:
    container_name: scrutiny_collector
    image: 'ghcr.io/analogj/scrutiny:master-collector'
    cap_add:
      - SYS_RAWIO
      #- SYS_ADMIN #optional
    networks:
      t3_proxy:
    #    #ipv4_address: 192.168.5.120 # You can specify a static IP
      socket_proxy:
    volumes:
      - '/run/udev:/run/udev:ro'
     # - /dev/sda:/dev/sda               # Device mappings
     # - /dev/sdb:/dev/sdb
     # - /dev/sdc:/dev/sdc
     # - /dev/sdd:/dev/sdd
     # - /dev/sde:/dev/sde
     # - /dev/sdf:/dev/sdf
    environment:
      COLLECTOR_API_ENDPOINT: 'http://192.168.5.120:8080'
      COLLECTOR_HOST_ID: 'scrutiny'
    #depends_on:
    #  web:
    #    condition: service_healthy
    devices:
      - "/dev/sda"                # Device mappings
      - "/dev/sdb"
      - "/dev/sdc"
      - "/dev/sdd"
      - "/dev/sde"
      - "/dev/sdf"
     # - "/dev/md/0"
AnalogJ commented 4 months ago

what do the (web & collector) logs say?
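
For reference, both sets of logs can be pulled with plain docker logs; a minimal sketch, assuming the container_name values from the compose above (scrutiny and scrutiny_collector):

# Web UI logs: look for errors after "Start the scrutiny server"
docker logs --tail 200 scrutiny

# Collector logs: device detection and attempts to post results to the API
docker logs --tail 200 scrutiny_collector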

eralumin commented 2 months ago

Hi, same here: no disks are displayed, although smartctl sees them when I run the command inside the Docker container.
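
One quick check is whether the collector can actually reach the web API it posts to; a minimal sketch, using the COLLECTOR_API_ENDPOINT from the compose at the top of this issue (http://192.168.5.120:8080) and assuming curl is available on the Docker host:

# From the Docker host: the health endpoint should return HTTP 200
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.5.120:8080/api/health

# From inside the collector container, if curl happens to be present in the image
docker exec scrutiny_collector curl -s http://192.168.5.120:8080/api/health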

enoch85 commented 2 months ago

I don't know if it's related, but I recently set up something similar.

networks:
  monitoring: # A common network for all monitoring services to communicate into
  notifications: # To Gotify or another Notification service

services:
  influxdb:
    container_name: influxdb
    image: influxdb:2.7-alpine
    ports:
      - 8086:8086
    volumes:
      - ${DIR_CONFIG}/influxdb2/db:/var/lib/influxdb2
      - ${DIR_CONFIG}/influxdb2/config:/etc/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=Admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=${PASSWORD}
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=scrutiny
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=REDACTED
      - TZ=Europe/Stockholm
    restart: unless-stopped
    networks:
      - monitoring
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8086/health"]
      interval: 5s
      timeout: 10s
      retries: 20

  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-web
    ports:
      - 8080:8080
    volumes:
      - ${DIR_CONFIG}/config:/opt/scrutiny/config
    environment:
      - SCRUTINY_WEB_INFLUXDB_HOST=influxdb
      - SCRUTINY_WEB_INFLUXDB_PORT=8086
      - SCRUTINY_WEB_INFLUXDB_TOKEN=REDACTED
      - SCRUTINY_WEB_INFLUXDB_ORG=homelab
      - SCRUTINY_WEB_INFLUXDB_BUCKET=scrutiny
      # Optional but highly recommended to notify you in case of a problem
      - SCRUTINY_NOTIFY_URLS=REDACTED
      - TZ=Europe/Stockholm
    depends_on:
      influxdb:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - notifications
      - monitoring
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 5s
      timeout: 10s
      retries: 20
      start_period: 10s

The Scrutiny container just dies randomly, which in turn makes it impossible for the collectors (on the same local network) to report anything to the web UI. The health check doesn't help either, since it doesn't restart the container. I tried InfluxDB 2.1, 2.2 and now 2.7 with no difference, and the DB itself always seems stable.
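
The exit code and health status of the dying container can be read straight from Docker; a minimal sketch, assuming the container_name scrutiny from the compose above (the Health field exists because the compose defines a healthcheck):

# Last exit code and current health status of the web container
docker inspect -f 'status={{.State.Status}} exit={{.State.ExitCode}} health={{.State.Health.Status}}' scrutiny

# Watch die/restart events for the container in real time
docker events --filter container=scrutiny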

OS

Ubuntu 24.04

Docker version

(running rootless)

ii  docker-buildx-plugin                 0.16.2-1~ubuntu.24.04~noble       amd64        Docker Buildx cli plugin.
ii  docker-ce                            5:27.2.0-1~ubuntu.24.04~noble     amd64        Docker: the open-source application container engine
ii  docker-ce-cli                        5:27.2.0-1~ubuntu.24.04~noble     amd64        Docker CLI: the open-source application container engine
ii  docker-ce-rootless-extras            5:27.2.0-1~ubuntu.24.04~noble     amd64        Rootless support for Docker.
ii  docker-compose-plugin                2.29.2-1~ubuntu.24.04~noble       amd64        Docker Compose (V2) plugin for the Docker CLI.

Scrutiny version

0.8.1 (master-web)

InfluxDB version

2.7 (also tested 2.1 and 2.2)

Folder structure

user@scrutiny:~/scrutiny$ ls -l
total 12
drwxr-xr-x 2 user user 4096 Sep  8 16:37 config
-rw-rw-r-- 1 user user 2361 Sep  8 10:36 docker-compose.yaml
drwxr-xr-x 4 user user 4096 Sep  7 22:58 influxdb2

EDIT 2 hours later

Now it seems to have been running fine. Will get back with some logs in case it happens again.

enoch85 commented 2 months ago

OK, so here are some logs:

Scrutiny

time="2024-09-08T18:45:32+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T18:45:32+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:18:45:32 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (2ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=2 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
2024/09/08 22:35:58 No configuration file found at /opt/scrutiny/config/scrutiny.yaml. Using Defaults.

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.8.1

Start the scrutiny server
time="2024-09-08T22:35:58+02:00" level=info msg="Trying to connect to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:58+02:00" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:58+02:00" level=info msg="InfluxDB certificate verification: true\n" type=web
panic: failed to check influxdb setup status - Get "http://influxdb:8086/api/v2/setup": dial tcp 172.19.0.2:8086: connect: connection refused

goroutine 1 [running]:
github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware.RepositoryMiddleware({0x11c62a8?, 0xc000014d90?}, {0x11ca9b0?, 0xc000407880?})
    /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:15 +0xd6
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Setup(0xc000013320, 0x1044847?)
    /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:26 +0xa5
github.com/analogj/scrutiny/webapp/backend/pkg/web.(*AppEngine).Start(0xc000013320)
    /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/server.go:82 +0x12c
main.main.func2(0xc000313f40)
    /go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:133 +0x39c
github.com/urfave/cli/v2.(*Command).Run(0xc0003d5200, 0xc000313dc0)
    /go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/command.go:164 +0x5c8
github.com/urfave/cli/v2.(*App).RunContext(0xc0002d6600, {0x11bd6c8?, 0xc000046040}, {0xc000036040, 0x2, 0x2})
    /go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:306 +0xbac
github.com/urfave/cli/v2.(*App).Run(...)
    /go/src/github.com/analogj/scrutiny/vendor/github.com/urfave/cli/v2/app.go:215
main.main()
    /go/src/github.com/analogj/scrutiny/webapp/backend/cmd/scrutiny/scrutiny.go:158 +0x774

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.8.1

Start the scrutiny server
2024/09/08 22:35:59 No configuration file found at /opt/scrutiny/config/scrutiny.yaml. Using Defaults.
time="2024-09-08T22:35:59+02:00" level=info msg="Trying to connect to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="InfluxDB certificate verification: true\n" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="Database migration starting. Please wait, this process may take a long time...." type=web
time="2024-09-08T22:35:59+02:00" level=info msg="Database migration completed successfully" type=web
time="2024-09-08T22:35:59+02:00" level=info msg="SQLite global configuration migrations starting. Please wait...." type=web
time="2024-09-08T22:35:59+02:00" level=info msg="SQLite global configuration migrations completed successfully" type=web
time="2024-09-08T22:36:04+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:04+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:04 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:10+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:10+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:10 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (4ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=4 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:15+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:15+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:15 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:20+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:20+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:20 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (2ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=2 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:25+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:25+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:25 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:30+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:30+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:30 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-09-08T22:36:35+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-09-08T22:36:35+02:00" level=info msg="127.0.0.1 - c51c2c16a23e [08/Sep/2024:22:36:35 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (3ms)" clientIP=127.0.0.1 hostname=c51c2c16a23e latency=3 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1

InfluxDB

ts=2024-09-08T16:11:55.098457Z lvl=info msg="Pruning shard groups after retention check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_name=retention_prune_shard_groups op_event=end op_elapsed=0.135ms
ts=2024-09-08T16:11:55.098522Z lvl=info msg="Retention policy deletion check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_event=end op_elapsed=0.502ms
ts=2024-09-08T16:20:13.098918Z lvl=info msg="Cache snapshot (start)" log_id=0rWJSyNW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
ts=2024-09-08T16:20:13.147114Z lvl=info msg="Snapshot for path written" log_id=0rWJSyNW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1 duration=48.299ms
ts=2024-09-08T16:20:13.147214Z lvl=info msg="Cache snapshot (end)" log_id=0rWJSyNW000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=48.402ms
ts=2024-09-08T16:41:55.099150Z lvl=info msg="Retention policy deletion check (start)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_event=start
ts=2024-09-08T16:41:55.099452Z lvl=info msg="Pruning shard groups after retention check (start)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_name=retention_prune_shard_groups op_event=start
ts=2024-09-08T16:41:55.099550Z lvl=info msg="Pruning shard groups after retention check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_name=retention_prune_shard_groups op_event=end op_elapsed=0.114ms
ts=2024-09-08T16:41:55.099706Z lvl=info msg="Retention policy deletion check (end)" log_id=0rWJSyNW000 service=retention op_name=retention_delete_check op_event=end op_elapsed=0.693ms
2024-09-08T20:35:58.    info    found existing boltdb file, skipping setup wrapper  {"system": "docker", "bolt_path": "/var/lib/influxdb2/influxd.bolt"}
2024-09-08T20:35:58.    info    found existing boltdb file, skipping setup wrapper  {"system": "docker", "bolt_path": "/var/lib/influxdb2/influxd.bolt"}
ts=2024-09-08T20:35:59.350614Z lvl=info msg="Welcome to InfluxDB" log_id=0rWqSWyG000 version=v2.7.10 commit=f302d9730c build_date=2024-08-16T20:19:28Z log_level=info
ts=2024-09-08T20:35:59.350859Z lvl=warn msg="nats-port argument is deprecated and unused" log_id=0rWqSWyG000
ts=2024-09-08T20:35:59.355792Z lvl=info msg="Resources opened" log_id=0rWqSWyG000 service=bolt path=/var/lib/influxdb2/influxd.bolt
ts=2024-09-08T20:35:59.356016Z lvl=info msg="Resources opened" log_id=0rWqSWyG000 service=sqlite path=/var/lib/influxdb2/influxd.sqlite
ts=2024-09-08T20:35:59.374536Z lvl=info msg="Checking InfluxDB metadata for prior version." log_id=0rWqSWyG000 bolt_path=/var/lib/influxdb2/influxd.bolt
ts=2024-09-08T20:35:59.375041Z lvl=info msg="Using data dir" log_id=0rWqSWyG000 service=storage-engine service=store path=/var/lib/influxdb2/engine/data
ts=2024-09-08T20:35:59.375103Z lvl=info msg="Compaction settings" log_id=0rWqSWyG000 service=storage-engine service=store max_concurrent_compactions=1 throughput_bytes_per_second=50331648 throughput_bytes_per_second_burst=50331648
ts=2024-09-08T20:35:59.375125Z lvl=info msg="Open store (start)" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open op_event=start
ts=2024-09-08T20:35:59.433280Z lvl=info msg="index opened with 8 partitions" log_id=0rWqSWyG000 service=storage-engine index=tsi
ts=2024-09-08T20:35:59.439619Z lvl=info msg="loading changes (start)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=start
ts=2024-09-08T20:35:59.439798Z lvl=info msg="loading changes (end)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=end op_elapsed=0.166ms
ts=2024-09-08T20:35:59.440515Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/d2cb8fefbb0e6d58/autogen/2/000000001-000000001.tsm id=0 duration=0.283ms
ts=2024-09-08T20:35:59.440864Z lvl=info msg="Opened shard" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open index_version=tsi1 path=/var/lib/influxdb2/engine/data/d2cb8fefbb0e6d58/autogen/2 duration=58.126ms
ts=2024-09-08T20:35:59.466214Z lvl=info msg="index opened with 8 partitions" log_id=0rWqSWyG000 service=storage-engine index=tsi
ts=2024-09-08T20:35:59.469318Z lvl=info msg="loading changes (start)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=start
ts=2024-09-08T20:35:59.469464Z lvl=info msg="loading changes (end)" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 op_name="field indices" op_event=end op_elapsed=0.205ms
ts=2024-09-08T20:35:59.472423Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000005-000000002.tsm id=0 duration=1.503ms
ts=2024-09-08T20:35:59.477915Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000014-000000002.tsm id=1 duration=4.616ms
ts=2024-09-08T20:35:59.478595Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000017-000000001.tsm id=4 duration=0.372ms
ts=2024-09-08T20:35:59.480224Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000018-000000001.tsm id=5 duration=1.487ms
ts=2024-09-08T20:35:59.480700Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000015-000000001.tsm id=2 duration=0.348ms
ts=2024-09-08T20:35:59.476600Z lvl=info msg="Opened file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1/000000016-000000001.tsm id=3 duration=1.608ms
ts=2024-09-08T20:35:59.481277Z lvl=info msg="Reading file" log_id=0rWqSWyG000 service=storage-engine engine=tsm1 service=cacheloader path=/var/lib/influxdb2/engine/wal/fd7bf29911cfbc2a/autogen/1/_00013.wal size=14266
ts=2024-09-08T20:35:59.487029Z lvl=info msg="Opened shard" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open index_version=tsi1 path=/var/lib/influxdb2/engine/data/fd7bf29911cfbc2a/autogen/1 duration=88.776ms
ts=2024-09-08T20:35:59.487738Z lvl=info msg="Open store (end)" log_id=0rWqSWyG000 service=storage-engine service=store op_name=tsdb_open op_event=end op_elapsed=112.614ms
ts=2024-09-08T20:35:59.489881Z lvl=info msg="Starting retention policy enforcement service" log_id=0rWqSWyG000 service=retention check_interval=30m
ts=2024-09-08T20:35:59.490018Z lvl=info msg="Starting precreation service" log_id=0rWqSWyG000 service=shard-precreation check_interval=10m advance_period=30m
ts=2024-09-08T20:35:59.496753Z lvl=info msg="Starting query controller" log_id=0rWqSWyG000 service=storage-reads concurrency_quota=1024 initial_memory_bytes_quota_per_query=9223372036854775807 memory_bytes_quota_per_query=9223372036854775807 max_memory_bytes=0 queue_size=1024
ts=2024-09-08T20:35:59.614105Z lvl=info msg="Configuring InfluxQL statement executor (zeros indicate unlimited)." log_id=0rWqSWyG000 max_select_point=0 max_select_series=0 max_select_buckets=0
ts=2024-09-08T20:35:59.630557Z lvl=info msg=Starting log_id=0rWqSWyG000 service=telemetry interval=8h
ts=2024-09-08T20:35:59.633322Z lvl=info msg=Listening log_id=0rWqSWyG000 service=tcp-listener transport=http addr=:8086 port=8086

This happens when I try to connect to the web UI after being disconnected for a while. @AnalogJ, do you need anything else to debug? (Made a donation through PayPal.)
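
Since the panic is a refused connection to influxdb:8086 at startup, one way to narrow it down is to test that route from inside the web container when it happens; a minimal sketch, using curl, which the healthchecks in the compose above already assume is present in both images:

# Can the scrutiny web container resolve and reach InfluxDB?
docker exec scrutiny curl -sf http://influxdb:8086/health && echo "influxdb reachable"

# InfluxDB's own view of its health endpoint
docker exec influxdb curl -sf http://localhost:8086/health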

enoch85 commented 2 months ago

OK, so I think I found it! https://forums.docker.com/t/linux-rootless-docker-only-starting-after-user-login/141505/2

It's because I run rootless docker.
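
For anyone else hitting this: with rootless Docker the daemon runs under the user's systemd instance, which by default only starts at login. A common fix (a sketch to run on the host) is to enable lingering so it also starts at boot:

# Let the user's systemd instance (and rootless dockerd) start at boot
sudo loginctl enable-linger "$USER"

# Confirm the rootless docker service is running in the user session
systemctl --user status docker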

oc013 commented 2 months ago

I'm also running rootless Docker and came up with a workaround, assuming we trust this program to run as root on the host. I'm interested in hearing about improvements to this method to lock it down as much as possible (a different user with just smartctl and HTTP access?).

Set up a hub/spoke deployment with just InfluxDB and the web UI, and expose the web UI port directly only on localhost.

services:
  influxdb:
    image: influxdb:2.2
    volumes:
      - ./influxdb:/var/lib/influxdb2
      - ./influxconfig:/etc/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=Admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=${PASSWORD}
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=scrutiny
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=${TOKEN}
    restart: unless-stopped
    networks:
      - scrutiny
  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-web
    ports:
      - 127.0.0.1:8901:8080
    environment:
      - SCRUTINY_WEB_INFLUXDB_HOST=influxdb
      - SCRUTINY_WEB_INFLUXDB_PORT=8086
      - SCRUTINY_WEB_INFLUXDB_TOKEN=${TOKEN}
      - SCRUTINY_WEB_INFLUXDB_ORG=homelab
      - SCRUTINY_WEB_INFLUXDB_BUCKET=scrutiny
    depends_on:
      - influxdb
    volumes:
      - ./config:/opt/scrutiny/config
    restart: unless-stopped
    networks:
      - scrutiny
      - npm-network
networks:
  scrutiny:
  npm-network:
    external: true

Install the collector on the host system as described here: https://github.com/AnalogJ/scrutiny/blob/master/docs/INSTALL_MANUAL.md#collector

Set up the cronjob as root:

*/15 * * * * . /etc/profile; /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://localhost:8901"
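
Before relying on the cron entry, the collector can be run once by hand with the same command to confirm it reaches the loopback-only published port; a minimal sketch, assuming the 127.0.0.1:8901 mapping from the compose above:

# The web API should answer on the loopback-only port
curl -s http://localhost:8901/api/health

# One manual collector run (same command the cron entry uses), as root
sudo /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --api-endpoint "http://localhost:8901"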