AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

[BUG] collector ignores some settings in config file on macOS #565

Open phirestalker opened 5 months ago

phirestalker commented 5 months ago

I have the server portion running in Docker on my Linux machine. The problem system is an M1 Mac.

Describe the bug

I have set the host id and some device settings. When the collector runs, it picks up the api and command settings but does not use the host id. It is also collecting stats for a device that I have set to ignore. There is only one real device (other than USB). smartctl is picking up the container as a drive, which is why I was trying to ignore a device.

Expected behavior

I expected the host id and device settings to be honored like they are on other platforms. I also expected it to show me stats for disk0 since I set that up in the config file.

Screenshots

Screenshot 2024-01-12 at 4 02 00 PM

Config file for collector

# Commented Scrutiny Configuration File
#
# The default location for this file is /opt/scrutiny/config/collector.yaml.
# In some cases to improve clarity default values are specified,
# uncommented. Other example values are commented out.
#
# When this file is parsed by Scrutiny, all configuration file keys are
# lowercased automatically. As such, Configuration keys are case-insensitive,
# and should be lowercase in this file to be consistent with usage.

######################################################################
# Version
#
# version specifies the version of this configuration file schema, not
# the scrutiny binary. There is only 1 version available at the moment
version: 1

# The host id is a label used for identifying groups of disks running on the same host
# Primarily used for hub/spoke deployments (can be left empty if using all-in-one image).
host:
  id: "Neal's Mac Mini"

# This block allows you to override/customize the settings for devices detected by
# Scrutiny via `smartctl --scan`
# See the "--device=TYPE" section of https://linux.die.net/man/8/smartctl
# type can be a 'string' or a 'list'
devices:
  - device: /dev/disk0
    type: 'auto'

  - device: /dev/disk3
    ignore: true

#  # example for forcing device type detection for a single disk
#  - device: /dev/sda
#    type: 'sat'
#
#  # example for using `-d sat,auto`, notice the square brackets (workaround for #418)
#  - device: /dev/sda
#    type: ['sat,auto']
#
#  # example to show how to ignore a specific disk/device.
#  - device: /dev/sda
#    ignore: true
#
#  # examples showing how to force smartctl to detect disks inside a raid array/virtual disk
#  - device: /dev/bus/0
#    type:
#      - megaraid,14
#      - megaraid,15
#      - megaraid,18
#      - megaraid,19
#      - megaraid,20
#      - megaraid,21
#
#  - device: /dev/twa0
#    type:
#      - 3ware,0
#      - 3ware,1
#      - 3ware,2
#      - 3ware,3
#      - 3ware,4
#      - 3ware,5
#
#  # example to show how to override the smartctl command args (per device), see below for how to override these globally.
#  - device: /dev/sda
#    commands:
#      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
#      metrics_smart_args: '--xall --json -T permissive' # used to retrieve smart data for each device.

#log:
#  file: '' #absolute or relative paths allowed, eg. web.log
#  level: INFO
#
api:
  endpoint: 'http://192.168.133.138:8080'
#  endpoint: 'http://localhost:8080/custombasepath'
# if you need to use a custom base path (for a reverse proxy), you can add a suffix to the endpoint.
#  See docs/TROUBLESHOOTING_REVERSE_PROXY.md for more info,

# example to show how to override the smartctl command args globally
commands:
  metrics_smartctl_bin: '/opt/homebrew/bin/smartctl' # change to provide custom `smartctl` binary path, eg. `/usr/sbin/smartctl`
#  metrics_scan_args: '--scan --json' # used to detect devices
#  metrics_info_args: '--info --json' # used to determine device unique ID & register device with Scrutiny
#  metrics_smart_args: '--xall --json' # used to retrieve smart data for each device.

########################################################################################################################
# FEATURES COMING SOON
#
# The following commented out sections are a preview of additional configuration options that will be available soon.
#
########################################################################################################################

#collect:
#  long:
#    enable: false
#    command: ''
#  short:
#    enable: false
#    command: ''

Log Files

I tried to modify my setup to get the logs, but I couldn't get it to work. Using the plain docker image and running it like the instructions say would not diagnose my issue since it is only one host with this problem. If there is a way to set debug mode on just the collector, or through my compose file for the server, I will be happy to try that.

AnalogJ commented 5 months ago

strange, the host.id is set in a common file: https://github.com/AnalogJ/scrutiny/blob/240178d742a5fe84b5b61952897a855f9425b790/collector/pkg/detect/detect.go#L126

It's executed the same way irrespective of the operating system or architecture. The only thing I can think of is that the ' quote is confusing it somehow? Can you try with a simpler host name?

phirestalker commented 5 months ago

I tried with just Mac Mini and it still didn't use it. I have a collector running on another mac (Intel) that works fine with an apostrophe.

AnalogJ commented 5 months ago

This might be a stupid question, but are you 100% sure it's reading the config file? It's optional and the collector will happily run without it.

phirestalker commented 5 months ago

Not stupid. I thought that too at first, but it is hitting the API that is configured later in that config file. I am trying it now with single quotes like the rest of the file has to see what happens.

EDIT: Ugh, still not working

AnalogJ commented 5 months ago

Hmm, so YAML is also whitespace sensitive. Any chance there's a space before "host", or the id key is misaligned?

phirestalker commented 5 months ago

no spaces before host and there are 2 spaces before id. What is the best way to share it so it preserves any formatting mistakes in my file? Do I need to share the file or will copy/paste work?
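One way to share a file without losing formatting mistakes is to print each line through `repr()`, which makes tabs, non-breaking spaces, and trailing whitespace visible. The sketch below is only an illustration and is not part of Scrutiny. The function names are made up for this example, and it uses nothing beyond the Python standard library.

```python
from pathlib import Path


def show_whitespace(text: str) -> list[str]:
    """Return one repr() string per line so tabs and odd spaces become visible."""
    return [repr(line) for line in text.splitlines()]


def suspicious_lines(text: str) -> list[tuple[int, str]]:
    """Flag lines whose indentation contains anything other than plain spaces."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-whitespace character.
        indent = line[: len(line) - len(line.lstrip())]
        if indent.strip(" "):  # something other than U+0020 is present (tab, NBSP, ...)
            flagged.append((number, repr(line)))
    return flagged


def report(path: str) -> None:
    """Print every line of the file with repr(), then list suspicious indentation."""
    text = Path(path).read_text(encoding="utf-8")
    for shown in show_whitespace(text):
        print(shown)
    for number, shown in suspicious_lines(text):
        print(f"line {number}: indentation is not plain spaces: {shown}")
```

Calling `report("/opt/scrutiny/config/collector.yaml")` (the default location named at the top of the config file) prints output that can be pasted into a comment as-is. A tab before `id:` would show up as `\t` rather than as invisible whitespace.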

AnalogJ commented 5 months ago

Doh, I just remembered that debug mode should print the config options.

Just add a "--debug" flag when you run the collector

phirestalker commented 5 months ago

here is the top part that shows the config settings loaded.

2024/01/23 14:41:20 Loading configuration file: /opt/scrutiny/config/collector.yaml
DEBU[0000] {
    "api": {
        "endpoint": "http://192.168.133.138:8080"
    },
    "commands": {
        "metrics_info_args": "--info --json",
        "metrics_scan_args": "--scan --json",
        "metrics_smart_args": "--xall --json",
        "metrics_smartctl_bin": "/opt/homebrew/bin/smartctl"
    },
    "devices": [
        {
            "device": "/dev/disk0",
            "type": "auto"
        },
        {
            "device": "/dev/disk3",
            "ignore": true
        }
    ],
    "host": {
        "id": "My Mac Mini"
    },
    "log": {
        "file": "",
        "level": "DEBUG"
    },
    "version": 1
}<nil>  type=metrics

EDIT: forgot to mention before. I have 3 other systems that also use the same scrutiny web endpoint (In case there is some limit)

AnalogJ commented 5 months ago

this is super strange. That all looks correct, and the host_id should be updated on the scrutiny side on every run

Are you seeing your devices in the Web UI? is the collection date being updated? Last Updated on January 22, 2024 - 16:00 etc?

AnalogJ commented 5 months ago

> forgot to mention before. I have 3 other systems that also use the same scrutiny web endpoint (In case there is some limit)

nope, no limit like that.

phirestalker commented 5 months ago

> Are you seeing your devices in the Web UI? is the collection date being updated? Last Updated on January 22, 2024 - 16:00 etc?

Yep, on both counts. It just shows the one device (the wrong one) with no host label. It is the one on top of the screenshot attached to the first post.

AnalogJ commented 5 months ago

Alright, let's try to confirm this isn't a bug in the frontend code. Open up the Scrutiny web UI and replace the URL path with

https://SCRUTINY_HOSTNAME_OR_IP/api/summary

Then look through the JSON until you find your disk. Then look for the host_id key for the disk.
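If reading the raw JSON by eye is tedious, the same check can be scripted. This is a rough sketch, not part of Scrutiny. It assumes the response is shaped like the `/api/summary` output pasted later in this thread (`data` -> `summary` -> one entry per WWN, each with a `device` object), and the function names are invented for the example.

```python
import json
from urllib.request import urlopen


def devices_missing_host_id(payload: dict) -> list[str]:
    """Return device names (or WWNs) whose host_id is empty in a summary payload."""
    # Assumed shape: {"data": {"summary": {"<wwn>": {"device": {...}}}}}
    summary = payload.get("data", {}).get("summary", {})
    missing = []
    for wwn, entry in summary.items():
        device = entry.get("device", {})
        if not device.get("host_id"):
            # Fall back to the WWN key if the device has no name.
            missing.append(device.get("device_name") or wwn)
    return missing


def fetch_summary(base_url: str) -> dict:
    """GET <base_url>/api/summary and decode the JSON body."""
    with urlopen(f"{base_url.rstrip('/')}/api/summary", timeout=10) as response:
        return json.load(response)
```

For example, `devices_missing_host_id(fetch_summary("http://SCRUTINY_HOSTNAME_OR_IP:8080"))` lists the disks that were stored without a host id.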

phirestalker commented 5 months ago

That one seems to just constantly load. It is capitalizing API after I hit enter.

AnalogJ commented 5 months ago

wait, that's not loading?

From your config file, it should be: http://192.168.133.138:8080/api/summary

phirestalker commented 5 months ago

D'oh! I usually access the web interface through my reverse proxy.

{"device":{"CreatedAt":"2024-01-23T15:15:25.316072847-07:00","UpdatedAt":"2024-01-23T16:00:26.955538073-07:00","DeletedAt":null,"wwn":"0ba0122c40b0ba26","device_name":"disk3","device_uuid":"","device_serial_id":"","device_label":"","manufacturer":"","model_name":"APPLE SSD AP1024Q","interface_type":"","interface_speed":"","serial_number":"0ba0122c40b0ba26","firmware":"359.60.3","rotational_speed":0,"capacity":0,"form_factor":"","smart_support":false,"device_protocol":"NVMe","device_type":"","label":"","host_id":"","device_status":0},"smart":{"collector_date":"2024-01-23T23:00:26Z","temp":34,"power_on_hours":5802},"temp_history":[{"date":"2024-01-23T23:00:00Z","temp":33},{"date":"2024-01-23T23:10:46.356704575Z","temp":34}]}

AnalogJ commented 5 months ago

ok well, it seems like there's an issue in the scrutiny web backend then, for some reason it's not populating the host_id that the collector is sending...

You could run the scrutiny app with --debug or add a DEBUG=true environment variable to the container to see the debug logs and confirm the collector is sending it. Might also give us an idea of why the data isn't being stored
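If the server debug logs don't show the request bodies, another option is to point the collector's api endpoint at a throwaway capture server for one run and look at what it actually sends. The sketch below is a generic stand-in, not the Scrutiny API. It assumes the collector submits JSON via POST with a Content-Length header, it answers every request with an empty JSON object, and all names in it are hypothetical. The collector will most likely log errors after its first request because the stub doesn't return what the real server would.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # (path, decoded JSON body or raw text) for every POST received


def find_host_ids(value) -> list:
    """Collect every "host_id" value found anywhere in a decoded JSON body."""
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            if key == "host_id":
                found.append(item)
            else:
                found.extend(find_host_ids(item))
    elif isinstance(value, list):
        for item in value:
            found.extend(find_host_ids(item))
    return found


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8", errors="replace")
        try:
            body = json.loads(raw)
        except json.JSONDecodeError:
            body = raw  # keep non-JSON bodies as plain text
        captured.append((self.path, body))
        print(self.path, "host_id values:", find_host_ids(body))
        # Stub reply only; the real API returns more than an empty object.
        payload = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def make_server(port: int = 8080) -> HTTPServer:
    """Bind the capture server; call serve_forever() on the result to run it."""
    return HTTPServer(("0.0.0.0", port), CaptureHandler)
```

Run `make_server().serve_forever()` on a spare machine, temporarily set the collector's `api.endpoint` to that machine's address, and run the collector once. If the printed `host_id` values are empty, the problem is on the collector side. If they are populated, the server is dropping them.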

phirestalker commented 5 months ago

I added debug to the environment and then ran docker logs to try to see it, but it looks like it's not the full debug logs

phirestalker commented 5 months ago

If I put DEBUG=true in the environment, shouldn't the extra logs show up in docker logs? It seems to be minimum request info, no JSON or anything. Even the ones that work do not show the host id in the log. I will try to spin up a container on the same machine with debug to try to get some better logs. Also, I tried specifying host id on the command line with no config file. It still doesn't get through.

vercas commented 3 months ago

I am also encountering this issue.

stoneobscurity commented 2 months ago

i'm getting this same issue on darwin.arm64-0.8.1

stoneobscurity commented 2 months ago

i did the /api/summary

looks like the host_id comes in as blank on that drive.

{"data":{"summary":{"0ba0160b82380828":{"device":{"CreatedAt":"2024-04-28T19:21:16.878872528-05:00","UpdatedAt":"2024-04-29T03:10:23.266775328-05:00","DeletedAt":null,"wwn":"0ba0160b82380828","device_name":"disk3","device_uuid":"","device_serial_id":"","device_label":"","manufacturer":"","model_name":"APPLE SSD AP0512Q","interface_type":"","interface_speed":"","serial_number":"0ba0160b82380828","firmware":"373.100.","rotational_speed":0,"capacity":0,"form_factor":"","smart_support":false,"device_protocol":"NVMe","device_type":"nvme","label":"","host_id":"","device_status":0},"smart":{"collector_date":"2024-04-29T08:10:23Z","temp":32,"power_on_hours":2475},"temp_history":[{"date":"2024-04-29T01:00:00Z","temp":39},{"date":"2024-04-29T04:00:00Z","temp":37}, ...

works fine for all the other drives, and works on darwin-amd64. darwin-arm64 seems to be the only one having issues.