AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

STATUS Failed #310

Closed derekcentrico closed 2 years ago

derekcentrico commented 2 years ago

Describe the bug: I am running the latest omnibus image available (top right says master#e364e48). The release notes make me think the issue where all of my Seagate drives show as FAILED should be fixed. They didn't have this issue on the LinuxServer variants.

Expected behavior: Properly read the data and not falsely report all Seagate drives as failed.

Screenshots: https://postimg.cc/G9JHpCCD

Log Files: Not sure where to run this DEBUG thing.

  scrutinyanalogj:
    ports:
      - '86:80'
      - '8886:8080'
    volumes:
      - '/home/docker/scrutinyanalogj:/opt/scrutiny/config'
      - '/home/docker/influxdb2:/opt/scrutiny/influxdb'
      - '/run/udev:/run/udev:ro'
    restart: always
    logging:
      options:
        max-size: 1g
    container_name: scrutinyanalogj
    environment:
      - PUID=1000
      - PGID=996
      - TZ=America/New_York
    devices:
      - '/dev/nvme0n1p1:/dev/nvme0n1p1'
      - '/dev/nvme1n1p1:/dev/nvme1n1p1'
      - '/dev/sde:/dev/sde'
      - '/dev/sdd:/dev/sdd'
      - '/dev/sdc:/dev/sdc'
      - '/dev/sdg:/dev/sdg'
      - '/dev/sdb:/dev/sdb'
      - '/dev/sdf:/dev/sdf'
      - '/dev/sda:/dev/sda'
    cap_add:
      - SYS_ADMIN
      - SYS_RAWIO
    image: ghcr.io/analogj/scrutiny:master-omnibus
    networks:
      vpnsys_net:
        ipv4_address: '172.22.0.109'

in another terminal trigger the collector

2022/06/21 13:05:02 No configuration file found at /opt/scrutiny/config/collector.yaml. Using Defaults.

 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
AnalogJ/scrutiny/metrics                               dev-0.4.13

time="2022-06-21T13:05:02-04:00" level=info msg="Verifying required tools" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --scan --json" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --info --json /dev/sdg" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Generating WWN" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --info --json /dev/sda" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Generating WWN" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --info --json /dev/sdb" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Generating WWN" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --info --json /dev/sdc" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Generating WWN" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --info --json /dev/sdd" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Generating WWN" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --info --json /dev/sde" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Generating WWN" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --info --json /dev/sdf" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Generating WWN" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Collecting smartctl results for sdg\n" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --xall --json /dev/sdg" type=metrics
time="2022-06-21T13:05:02-04:00" level=error msg="smartctl returned an error code (4) while processing sdg\n" type=metrics
time="2022-06-21T13:05:02-04:00" level=error msg="smartctl detected a checksum error" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Publishing smartctl results for 0x5000c500c75a912d\n" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Collecting smartctl results for sda\n" type=metrics
time="2022-06-21T13:05:02-04:00" level=info msg="Executing command: smartctl --xall --json /dev/sda" type=metrics
time="2022-06-21T13:05:03-04:00" level=error msg="smartctl returned an error code (32) while processing sda\n" type=metrics
time="2022-06-21T13:05:03-04:00" level=error msg="smartctl detected a disk close to failure" type=metrics
time="2022-06-21T13:05:03-04:00" level=info msg="Publishing smartctl results for 0x5000c500b2750d1c\n" type=metrics
time="2022-06-21T13:05:03-04:00" level=info msg="Collecting smartctl results for sdb\n" type=metrics
time="2022-06-21T13:05:03-04:00" level=info msg="Executing command: smartctl --xall --json /dev/sdb" type=metrics
time="2022-06-21T13:05:14-04:00" level=info msg="Publishing smartctl results for 0x5000c500b1c427ab\n" type=metrics
time="2022-06-21T13:05:14-04:00" level=info msg="Collecting smartctl results for sdc\n" type=metrics
time="2022-06-21T13:05:14-04:00" level=info msg="Executing command: smartctl --xall --json /dev/sdc" type=metrics
time="2022-06-21T13:05:14-04:00" level=info msg="Publishing smartctl results for 0x5000c500b27fd75a\n" type=metrics
time="2022-06-21T13:05:14-04:00" level=info msg="Collecting smartctl results for sdd\n" type=metrics
time="2022-06-21T13:05:14-04:00" level=info msg="Executing command: smartctl --xall --json /dev/sdd" type=metrics
time="2022-06-21T13:05:15-04:00" level=error msg="smartctl returned an error code (4) while processing sdd\n" type=metrics
time="2022-06-21T13:05:15-04:00" level=error msg="smartctl detected a checksum error" type=metrics
time="2022-06-21T13:05:15-04:00" level=info msg="Publishing smartctl results for 0x50014ee2638f6a21\n" type=metrics
time="2022-06-21T13:05:15-04:00" level=info msg="Collecting smartctl results for sde\n" type=metrics
time="2022-06-21T13:05:15-04:00" level=info msg="Executing command: smartctl --xall --json /dev/sde" type=metrics
time="2022-06-21T13:05:32-04:00" level=error msg="smartctl returned an error code (32) while processing sde\n" type=metrics
time="2022-06-21T13:05:32-04:00" level=error msg="smartctl detected a disk close to failure" type=metrics
time="2022-06-21T13:05:32-04:00" level=info msg="Publishing smartctl results for 0x5000c500938ab940\n" type=metrics
time="2022-06-21T13:05:32-04:00" level=info msg="Collecting smartctl results for sdf\n" type=metrics
time="2022-06-21T13:05:32-04:00" level=info msg="Executing command: smartctl --xall --json /dev/sdf" type=metrics
time="2022-06-21T13:05:43-04:00" level=info msg="Publishing smartctl results for 0x5000c500b1c48619\n" type=metrics
time="2022-06-21T13:05:43-04:00" level=info msg="Main: Completed" type=metrics
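
For anyone reading the collector log above: the error codes (4 and 32) are smartctl exit statuses, which smartctl documents as a bitmask rather than a single error number. The sketch below pulls the per-device codes out of log lines like these and decodes them with bit meanings paraphrased from the smartctl(8) man page. The function names and the regular expression are my own, not part of Scrutiny. Going by the man page, bit 5 (value 32) means the SMART status check returned OK but some attribute was at or below its threshold at some point in the past, which is milder than the collector's "close to failure" wording suggests.

```python
import re

# Bit meanings of smartctl's exit status, paraphrased from the smartctl(8) man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or other ATA command failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "SMART status is OK, but some attributes were at or below threshold in the past",
    6: "the device error log contains records of errors",
    7: "the device self-test log contains records of errors",
}

# Matches collector lines such as:
#   msg="smartctl returned an error code (4) while processing sdg\n"
ERROR_LINE = re.compile(
    r"smartctl returned an error code \((\d+)\) while processing (\w+)"
)


def decode_exit_status(code):
    """Return the meaning of every bit set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if code & (1 << bit)]


def exit_codes_by_device(log_text):
    """Map device name -> smartctl exit status found in collector log text."""
    return {dev: int(code) for code, dev in ERROR_LINE.findall(log_text)}


# Two lines copied from the log above (raw string keeps the literal \n).
sample = r'''
time="2022-06-21T13:05:02-04:00" level=error msg="smartctl returned an error code (4) while processing sdg\n" type=metrics
time="2022-06-21T13:05:03-04:00" level=error msg="smartctl returned an error code (32) while processing sda\n" type=metrics
'''

for dev, code in exit_codes_by_device(sample).items():
    print(dev, code, decode_exit_status(code))
```

Because the status is a bitmask, a single run can report several conditions at once (for example, 36 would mean bits 2 and 5 together), so decoding every set bit is safer than matching on the whole number.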

The log files will be available on your host in the config directory. Please attach them to this issue.

Please also provide the output of docker info

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 25
  Running: 25
  Paused: 0
  Stopped: 0
 Images: 196
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.13.0-41-generic
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.64GiB
 Name: homeserver
 ID: PPXQ:MWXG:F62C:XE5P:2WKH:I3CQ:GCQG:UH3O:OO5X:KZN5:A4CF:HV7F
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

AnalogJ commented 2 years ago

Please take a look at this doc: https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md#seagate-drives-failing

You can globally reset your drive statuses by following these steps:

https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md#device-failed-but-smart--scrutiny-passed

derekcentrico commented 2 years ago

Ah, I thought it was a bug that would've been repaired in the update. Ran those commands and currently all show green.

AnalogJ commented 2 years ago

Unfortunately, it's an issue with data in the database, and I can't fix that without completely blowing away the device status for everyone.

Closing this issue as we have a workaround :)
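
For later readers, here is a rough sketch of what a global status reset amounts to: clearing the stored status for every device row so that the next collector run can set it again. The table name `devices`, the column `device_status`, the toy schema, and the status value `1` are all assumptions made for illustration, and the demo runs against a throwaway in-memory SQLite database rather than a real Scrutiny database. The linked troubleshooting doc remains the authoritative source for the actual steps.

```python
import sqlite3


def reset_device_status(conn):
    """Clear the stored status of every device; return the number of rows touched.

    Assumes a `devices` table with a `device_status` column (hypothetical schema).
    """
    cur = conn.execute("UPDATE devices SET device_status = NULL")
    conn.commit()
    return cur.rowcount


# Demo on an in-memory database with an invented, minimal schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (wwn TEXT PRIMARY KEY, device_status INTEGER)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [
        ("0x5000c500c75a912d", 1),  # 1 is an illustrative "failed" marker
        ("0x5000c500b2750d1c", 1),
    ],
)

print(reset_device_status(conn))
print(conn.execute("SELECT wwn, device_status FROM devices").fetchall())
```

The update has no `WHERE` clause, so it touches every device. That matches the point above about why this is left as a manual workaround instead of being applied to everyone's database automatically.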