AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds

[BUG] runtime error: invalid memory address or nil pointer dereference #591

Open ovizii opened 8 months ago

ovizii commented 8 months ago

Describe the bug

Trying to access the omnibus Web GUI results in a perpetual loading state.

Additional Info

This instance of scrutiny has been working for more than a few years already. Prior to this error, I migrated the existing physical machine to new hardware by transplanting the OS disks, along with all other disks and peripherals, into a new case, mainboard and CPU, without reinstalling. 99% of the system has kept working, so I am unsure what I could have changed to trigger this error in scrutiny.

Btw, there was a similar issue which has since disappeared: https://github.com/AnalogJ/scrutiny/issues/523

Expected behaviour

I was expecting to access the GUI as usual.

Screenshots

(image: the web GUI stuck in a perpetual loading state)

Log Files

scrutiny  | time="2024-02-27T17:54:23+01:00" level=info msg="10.10.10.1 - scrutiny [27/Feb/2024:17:54:23 +0100] \"GET /web/android-icon-192x192.png\" 200 7467 \"https://smart.domain.tld/web/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36\" (1ms)" clientIP=10.10.10.1 hostname=scrutiny latency=1 method=GET path=/web/android-icon-192x192.png referer="https://smart.domain.tld/web/" respLength=7467 statusCode=200 type=web userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"
scrutiny  |
scrutiny  |
scrutiny  | 2024/02/27 17:54:23 [Recovery] 2024/02/27 - 17:54:23 panic recovered:
scrutiny  | runtime error: invalid memory address or nil pointer dereference
scrutiny  | /usr/local/go/src/runtime/panic.go:260 (0x44d01c)
scrutiny  | /usr/local/go/src/runtime/signal_unix.go:841 (0x44cfec)
scrutiny  | /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository.go:436 (0xc615f4)
scrutiny  | /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/handler/get_devices_summary.go:14 (0xe0be88)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xdcd09a)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/recovery.go:83 (0xdcd086)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xe0fc1e)
scrutiny  | /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/config.go:11 (0xe0fc05)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xe1117e)
scrutiny  | /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:29 (0xe11165)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xe10173)
scrutiny  | /go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/logger.go:56 (0xe1014e)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xdcbfe9)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/gin.go:409 (0xdcbc37)
scrutiny  | /go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/gin.go:367 (0xdcb753)
scrutiny  | /usr/local/go/src/net/http/server.go:2936 (0x7961f5)
scrutiny  | /usr/local/go/src/net/http/server.go:1995 (0x791711)
scrutiny  | /usr/local/go/src/runtime/asm_amd64.s:1598 (0x469280)
scrutiny  |
scrutiny  | time="2024-02-27T17:54:23+01:00" level=error msg="10.10.10.1 - scrutiny [27/Feb/2024:17:54:23 +0100] \"GET /api/summary\" 500 0 \"https://smart.domain.tld/web/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36\" (50ms)" clientIP=10.10.10.1 hostname=scrutiny latency=50 method=GET path=/api/summary referer="https://smartdomain.tld/web/" respLength=0 statusCode=500 type=web userAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"
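
The trace points at the device-summary path (scrutiny_repository.go:436, reached from get_devices_summary.go:14). As a purely illustrative sketch, and not the actual scrutiny code, the hypothetical Go snippet below shows how a summary handler can hit exactly this panic when per-device metadata and the latest SMART records get out of sync (for example after a device row is removed); all type and field names here are invented.

```go
// Hypothetical sketch, NOT the actual scrutiny code: illustrates how a
// summary endpoint can panic with "invalid memory address or nil pointer
// dereference" when device metadata and SMART data get out of sync.
package main

import "fmt"

type Device struct {
	WWN   string
	Title string
}

type SmartResult struct {
	Temp int64
}

type Summary struct {
	Device    *Device
	LastSmart *SmartResult
}

// buildSummary joins each device against its latest SMART record; a missing
// record leaves LastSmart nil.
func buildSummary(devices []Device, latest map[string]*SmartResult) map[string]*Summary {
	out := map[string]*Summary{}
	for i := range devices {
		d := devices[i]
		out[d.WWN] = &Summary{Device: &d, LastSmart: latest[d.WWN]}
	}
	return out
}

func main() {
	devices := []Device{{WWN: "0x5000c500aaaaaaaa", Title: "sda"}}
	latest := map[string]*SmartResult{} // no SMART data for this device

	for wwn, s := range buildSummary(devices, latest) {
		// Without this guard, dereferencing s.LastSmart unconditionally
		// panics exactly like the stack trace above.
		if s.LastSmart == nil {
			fmt.Printf("%s: no SMART data yet\n", wwn)
			continue
		}
		fmt.Printf("%s: temp=%d\n", wwn, s.LastSmart.Temp)
	}
}
```

If scrutiny's real handler does something similar, a deleted or half-migrated device entry could leave one of these lookups returning nil, which would be consistent with both reports in this thread.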

Please also provide the output of docker info

Client: Docker Engine - Community
 Version:    25.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 36
  Running: 36
  Paused: 0
  Stopped: 0
 Images: 32
 Server Version: 25.0.3
 Storage Driver: zfs
  Zpool: rpool
  Zpool Health: ONLINE
  Parent Dataset: rpool/docker
  Space Used By Parent: 90206208
  Space Available: 761865920512
  Parent Quota: no
  Compression: on
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 6.5.11-8-pve
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 62.53GiB
 Name: nas
 ID: a9b276ef-51a5-47fe-b188-38ff9edb9e79
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: ovizii
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 172.16.0.0/12, Size: 28

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Here is my docker-compose.yml:

services:
  scrutiny:
    image: ghcr.io/analogj/scrutiny:master-omnibus
    container_name: scrutiny
    hostname: scrutiny
    cap_add:
      - SYS_RAWIO
      - SYS_ADMIN
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Berlin
      - COLLECTOR_API_ENDPOINT=http://localhost:8080
    volumes:
      - ./config:/opt/scrutiny/config
      - ./influxdb:/opt/scrutiny/influxdb
      - /run/udev:/run/udev:ro
    devices:
      - /dev/sda:/dev/sda
      - /dev/sdb:/dev/sdb
      - /dev/sdc:/dev/sdc
      - /dev/sdd:/dev/sdd
      - /dev/sde:/dev/sde
      - /dev/nvme0:/dev/nvme0
      - /dev/nvme1:/dev/nvme1
    cpus: 1
    mem_limit: 1G
    restart: "no"
    networks:
      scrutiny:
        ipv4_address: 192.168.192.86
      traefik_scrutiny:
#    mac_address: 02:42:c0:a8:c0:56
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik_scrutiny"
      - "traefik.http.routers.scrutiny.tls=true"
      - "traefik.http.routers.scrutiny.entrypoints=websecure"
      - "traefik.http.routers.scrutiny.rule=Host(`smart.domain.tld`)"
      - "traefik.http.routers.scrutiny.middlewares=secHeaders@file,localIPsOnly@file,authentik@docker"
      - "traefik.http.routers.scrutiny.service=scrutiny"
      - "traefik.http.services.scrutiny.loadbalancer.server.port=8080"

networks:

  scrutiny:
    name: scrutiny
    driver: macvlan
    ipam:
      config:
        - subnet: 192.168.192.84/30
          gateway: 192.168.192.85
    driver_opts:
      parent: vmbr1.1019
    external: false

  traefik_scrutiny:
    external: true
    internal: true
    name: traefik_scrutiny
ovizii commented 8 months ago

I did some digging and can add some more info about this incident:

As I mentioned, I moved disks between PCs, so when I was last able to access the scrutiny interface, some disks were listed multiple times under different names (/dev/sda, /dev/sdb, etc.). I then tried cleaning up by deleting each duplicate except the entry with the most recent report date; after I clicked "delete device" and eventually restarted the scrutiny container, the problem occurred.

BTW, I restored my scrutiny DB from a backup and the GUI is back up and running again.

Please let me know how to safely delete duplicate disks.
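
Before deleting anything again, it may help to review which device rows actually exist. Below is a hedged sketch, assuming (nothing in this thread confirms it) that the omnibus image keeps device metadata in a SQLite file at /opt/scrutiny/config/scrutiny.db with a devices table containing wwn and device_name columns; it only lists rows, deletes nothing, and uses the pure-Go modernc.org/sqlite driver.

```go
// Hedged sketch: list device rows in the scrutiny metadata DB so duplicates
// can be reviewed before deleting. The DB path and the table/column names
// are assumptions; adjust them to what .schema shows on your own copy.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "modernc.org/sqlite" // pure-Go SQLite driver, registers driver name "sqlite"
)

func main() {
	db, err := sql.Open("sqlite", "/opt/scrutiny/config/scrutiny.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(`SELECT wwn, device_name FROM devices ORDER BY device_name`)
	if err != nil {
		log.Fatal(err) // likely means the assumed table/column names are wrong
	}
	defer rows.Close()

	for rows.Next() {
		var wwn string
		var name sql.NullString
		if err := rows.Scan(&wwn, &name); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-12s %s\n", name.String, wwn)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```

Running this against a copy of the DB (not the live file) makes it easier to see which WWNs appear more than once before touching the web UI's delete button.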

jgwehr commented 7 months ago

I'm having a similar issue after upgrading to v0.8.1.

I had just removed a disk, but prior to the upgrade it had continued to show up with null / 0 values for everything. Given that the OP went through something similar, and given the stack trace below, I get the sense the database is corrupted in some way.

Expected behaviour

  1. The web interface should load, instead of spinning endlessly.
  2. The summary API should return valid content instead of an HTTP error.

Debug / Logs

Chrome console:

ERROR Error: Uncaught (in promise): Ut: {"headers":{"normalizedNames":{},"lazyUpdate":null},"status":500,"statusText":"Internal Server Error","url":"http://192.168.1.203:9990/api/summary","ok":false,"name":"HttpErrorResponse","message":"Http failure response for http://192.168.1.203:9990/api/summary: 500 Internal Server Error","error":null}
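
To take the browser out of the picture, the failing request can be reproduced directly. A minimal, stdlib-only Go sketch (the URL is copied from the console error above; adjust host and port to your own deployment):

```go
// Minimal sketch to reproduce the failing summary request outside the browser.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://192.168.1.203:9990/api/summary")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// A healthy backend returns 200 with a JSON summary; the bug in this
	// thread shows up as a 500 with an empty body.
	fmt.Printf("status=%d bodyLen=%d\n", resp.StatusCode, len(body))
}
```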

Docker Logs

[Recovery] 2024/04/09 - 15:35:33 panic recovered:
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:260 (0x44cffc)
/usr/local/go/src/runtime/signal_unix.go:841 (0x44cfcc)
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/database/scrutiny_repository.go:436 (0xc615d4)
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/handler/get_devices_summary.go:14 (0xe0be68)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xdcd07a)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/recovery.go:83 (0xdcd066)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xe0fbfe)
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/config.go:11 (0xe0fbe5)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xe1115e)
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/repository.go:29 (0xe11145)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xe10153)
/go/src/github.com/analogj/scrutiny/webapp/backend/pkg/web/middleware/logger.go:56 (0xe1012e)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/context.go:161 (0xdcbfc9)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/gin.go:409 (0xdcbc17)
/go/src/github.com/analogj/scrutiny/vendor/github.com/gin-gonic/gin/gin.go:367 (0xdcb733)
/usr/local/go/src/net/http/server.go:2936 (0x7961d5)
/usr/local/go/src/net/http/server.go:1995 (0x7916f1)
/usr/local/go/src/runtime/asm_amd64.s:1598 (0x469260)

clientIP=192.168.96.1   hostname=0abfb89a7def   latency=30  level=error method=GET  msg=192.168.96.1 - 0abfb89a7def [09/Apr/2024:15:35:33 +0000] "GET /api/summary" 500 0 "" "" (30ms)  path=/api/summary   referer=resp    Length=0    statusCode=500  time=2024-04-09T15:35:33Z   type=webuser    Agent=

docker-compose

https://github.com/jgwehr/homelab-docker/blob/main/services/monitor/docker-compose.yml


  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-omnibus

    ports:
      - ${PORT_SCRUTINY}:8080 # webapp
      - ${PORT_SCRUTINY_DB}:8086 # influxDB admin

    labels:
      - diun.enable=true
      - homepage.group=System
      - homepage.name=Scrutiny
      - homepage.icon=scrutiny
      - homepage.href=http://${SERVER_URL}:${PORT_SCRUTINY}
      - homepage.description=Harddrive Health Monitoring
      - homepage.widget.type=scrutiny
      - homepage.widget.url=http://${SERVER_URL}:${PORT_SCRUTINY}

    devices:
      - /dev/sda
      - /dev/sdb
      - /dev/sdc

    volumes:
      - /run/udev:/run/udev:ro
      - ${CONFIGDIR}/scrutiny:/opt/scrutiny/config
      - ${DBDIR}/scrutiny:/opt/scrutiny/influxdb

    cap_add:
      - SYS_RAWIO #necessary to allow smartctl permission to query your device SMART data 
      - SYS_ADMIN #necessary for NVMe drives

    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 256M
    restart: unless-stopped

Docker Info

Client: Docker Engine - Community
 Version:    26.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.25.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 38
  Running: 37
  Paused: 0
  Stopped: 1
 Images: 48
 Server Version: 26.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.0-26-generic
 Operating System: Ubuntu 22.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.15GiB
 Name: redacted
 ID: redacted
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false