AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

[BUG] smartctl returning errors 64, 4 and 36 for 4x of my 27 drives #316

Closed iamwoz closed 2 years ago

iamwoz commented 2 years ago

Describe the bug

Getting error messages for a subset of my drives visible to Scrutiny.

I have removed 4 drives and replaced them with 3 new drives in the last week (so /dev assignments have likely changed). I'm unsure whether I need to inform Scrutiny of this, or how.

time="2022-06-29T10:25:30+12:00" level=error msg="smartctl returned an error code (64) while processing sdu\n" type=metrics
time="2022-06-29T10:25:34+12:00" level=error msg="smartctl returned an error code (4) while processing sdl\n" type=metrics
time="2022-06-29T10:25:37+12:00" level=error msg="smartctl returned an error code (36) while processing sda\n" type=metrics
time="2022-06-29T10:25:39+12:00" level=error msg="smartctl returned an error code (4) while processing sdf\n" type=metrics

Expected behavior

These drives should be detected like the others.

Collector log

Attached: collector.log

Docker compose file

version: '3.3'
services:
  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:beta-omnibus
    privileged: true
    cap_add:
    #    - SYS_RAWIO
    #    - SYS_ADMIN
    ports:
        - 8484:8080
        - 8086:8086
    environment:
        - PUID=99
        - PGID=100
        - TZ=Pacific/Auckland
        - DEBUG=true
        - COLLECTOR_LOG_FILE=/opt/scrutiny/logs/collector.log
    volumes:
        - /run/udev:/run/udev:ro
        - /mnt/app/appdata/scrutiny/config:/opt/scrutiny/config
        - /mnt/app/appdata/scrutiny/logs:/opt/scrutiny/logs
        - /mnt/app/appdata/scrutiny/influxdb:/opt/scrutiny/influxdb
    devices:
        - /dev/nvme0
        - /dev/nvme1
        - /dev/nvme2
        - /dev/sda
        - /dev/sdb
        - /dev/sdc
        - /dev/sdd
        - /dev/sde
        - /dev/sdf
        - /dev/sdg
        - /dev/sdh
        - /dev/sdi
        - /dev/sdj
        - /dev/sdk
        - /dev/sdl
        - /dev/sdm
        - /dev/sdn
        - /dev/sdo
        - /dev/sdp
        - /dev/sdq
        - /dev/sdr
        - /dev/sds
        - /dev/sdt
        - /dev/sdu
        - /dev/sdv
        - /dev/sdw
        - /dev/sdx
        - /dev/sdy
        - /dev/sdz
    restart: unless-stopped

OTHER INFO

Scrutiny is currently reporting FAILED status for two drives. One of them, sdu, also appears in the smartctl errors above, but I don't believe this is related.


AnalogJ commented 2 years ago

Please look at https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md#exit-codes

In general, collector issues are due to failures in the smartctl command. Sometimes it's related to permissions, other times it's related to missing CLI flags.
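For context, the smartctl exit status is a bitmask, so a code like 36 is really two conditions combined (32 + 4). A minimal Python sketch for decoding the codes from the log above follows. The bit descriptions are paraphrased from the "EXIT STATUS" section of the smartctl(8) man page; the names `SMARTCTL_EXIT_BITS` and `decode_smartctl_exit` are made up for this illustration and are not part of Scrutiny or smartmontools.

```python
# Bit meanings paraphrased from the smartctl(8) man page, "EXIT STATUS".
# The dictionary and function names are hypothetical helpers for this example.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE structure, or device in low-power mode",
    2: "a SMART or other ATA command failed, or a SMART data structure had a checksum error",
    3: "SMART status check returned 'DISK FAILING'",
    4: "prefail attributes found at or below threshold",
    5: "SMART status is 'DISK OK' but some attributes were at or below threshold in the past",
    6: "the device error log contains records of errors",
    7: "the device self-test log contains records of errors",
}


def decode_smartctl_exit(code):
    """Return the description of every bit that is set in a smartctl exit code."""
    return [desc for bit, desc in SMARTCTL_EXIT_BITS.items() if code & (1 << bit)]


# The three distinct codes reported in this issue.
for code in (64, 4, 36):
    print(code, decode_smartctl_exit(code))
```

Read this way, 64 sets only bit 6, 4 sets only bit 2, and 36 sets bits 2 and 5. Only bit 2 indicates that a command to the drive actually failed; bits 5 and 6 describe what smartctl found in the drive's attributes and error log.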

See if you can get smartctl working successfully with your device first, then you can update the scrutiny config with the relevant parameters.
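One way to test a drive outside the collector is to run smartctl by hand and look at its exit status. The sketch below is an assumption-laden illustration, not the collector's actual code: `build_smartctl_cmd` and `probe` are hypothetical helper names, and the device type passed with `-d` (for example `sat`) is only an example, since the correct value depends on your controller. It uses the standard smartctl flags `-x` (all information), `-j` (JSON output, smartmontools 7.0 or later) and `-d TYPE`.

```python
import subprocess


def build_smartctl_cmd(device, device_type=None):
    """Build a smartctl argument list; device_type maps to the -d flag if given."""
    cmd = ["smartctl", "-x", "-j"]
    if device_type:
        cmd += ["-d", device_type]
    cmd.append(device)
    return cmd


def probe(device, device_type=None):
    """Run smartctl and return its exit status (a bitmask; 0 means no flags set).

    Requires smartctl on PATH and usually root privileges; raises
    FileNotFoundError if the binary is missing.
    """
    result = subprocess.run(
        build_smartctl_cmd(device, device_type),
        capture_output=True,
        text=True,
    )
    return result.returncode
```

For example, calling `probe("/dev/sdl")` and then `probe("/dev/sdl", "sat")` would show whether an explicit device type changes the exit status. If it does, that type is the kind of parameter to carry over into the Scrutiny collector config.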