alexbelgium / hassio-addons

My homeassistant addons
MIT License

🐛 [Scrutiny] FAILED: SMART #880

Closed: dm82m closed this issue 1 year ago

dm82m commented 1 year ago

Description

I'm using an Intel NUC with an M.2 SSD and Scrutiny-FA with protection mode disabled. The device /dev/sda is detected, but it only shows "SMART: FAILED", even after changing the device type from auto to sat.


Reproduction steps

As described above.

Addon Logs

-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums
-----------------------------------------------------------
 Provided by: https://github.com/alexbelgium/hassio-addons 
-----------------------------------------------------------
[cont-init.d] 00-banner.sh: exited 0.
[cont-init.d] 01-configuration.sh: executing... 
Updating folders structure
[19:36:20] INFO: Hourly updates
[cont-init.d] 01-configuration.sh: exited 0.
[cont-init.d] 01-custom_script.sh: executing... 
[19:36:20] INFO: Execute /config/addons_autoscripts/scrutiny-fa.sh if existing
[19:36:20] INFO: ... no script found
[cont-init.d] 01-custom_script.sh: exited 0.
[cont-init.d] 01-timezone: executing... 
[cont-init.d] 01-timezone: exited 0.
[cont-init.d] 32-nginx_ingress.sh: executing... 
[cont-init.d] 32-nginx_ingress.sh: exited 0.
[cont-init.d] 50-cron-config: executing... 
[cont-init.d] 50-cron-config: exited 0.
[cont-init.d] 90-run.sh: executing... 
[cont-init.d] 90-run.sh: exited 0.
[cont-init.d] done.
[services.d] starting services
waiting for influxdb
starting cron
waiting for scrutiny service to start
influxdb config file already exists. skipping.
starting influxdb
influxdb not ready
[services.d] done.
scrutiny api not ready
ts=2023-06-26T17:36:21.626359Z lvl=info msg="Welcome to InfluxDB" log_id=0ifCICUW000 version=v2.2.0 commit=a2f8538837 build_date=2022-04-06T17:36:40Z
ts=2023-06-26T17:36:21.628406Z lvl=info msg="Resources opened" log_id=0ifCICUW000 service=bolt path=/opt/scrutiny/influxdb/influxd.bolt
ts=2023-06-26T17:36:21.628499Z lvl=info msg="Resources opened" log_id=0ifCICUW000 service=sqlite path=/opt/scrutiny/influxdb/influxd.sqlite
ts=2023-06-26T17:36:21.634606Z lvl=info msg="Checking InfluxDB metadata for prior version." log_id=0ifCICUW000 bolt_path=/opt/scrutiny/influxdb/influxd.bolt
ts=2023-06-26T17:36:21.634820Z lvl=info msg="Using data dir" log_id=0ifCICUW000 service=storage-engine service=store path=/opt/scrutiny/influxdb/engine/data
ts=2023-06-26T17:36:21.634856Z lvl=info msg="Compaction settings" log_id=0ifCICUW000 service=storage-engine service=store max_concurrent_compactions=2 throughput_bytes_per_second=50331648 throughput_bytes_per_second_burst=50331648
ts=2023-06-26T17:36:21.634997Z lvl=info msg="Open store (start)" log_id=0ifCICUW000 service=storage-engine service=store op_name=tsdb_open op_event=start
ts=2023-06-26T17:36:21.645411Z lvl=info msg="index opened with 8 partitions" log_id=0ifCICUW000 service=storage-engine index=tsi
ts=2023-06-26T17:36:21.646533Z lvl=info msg="Reading file" log_id=0ifCICUW000 service=storage-engine engine=tsm1 service=cacheloader path=/opt/scrutiny/influxdb/engine/wal/419d63b44b7ea380/autogen/1/_00001.wal size=196
ts=2023-06-26T17:36:21.646678Z lvl=info msg="Opened shard" log_id=0ifCICUW000 service=storage-engine service=store op_name=tsdb_open index_version=tsi1 path=/opt/scrutiny/influxdb/engine/data/419d63b44b7ea380/autogen/1 duration=9.181ms
ts=2023-06-26T17:36:21.646744Z lvl=info msg="Open store (end)" log_id=0ifCICUW000 service=storage-engine service=store op_name=tsdb_open op_event=end op_elapsed=11.749ms
ts=2023-06-26T17:36:21.646776Z lvl=info msg="Starting retention policy enforcement service" log_id=0ifCICUW000 service=retention check_interval=30m
ts=2023-06-26T17:36:21.646800Z lvl=info msg="Starting precreation service" log_id=0ifCICUW000 service=shard-precreation check_interval=10m advance_period=30m
ts=2023-06-26T17:36:21.647883Z lvl=info msg="Starting query controller" log_id=0ifCICUW000 service=storage-reads concurrency_quota=1024 initial_memory_bytes_quota_per_query=9223372036854775807 memory_bytes_quota_per_query=9223372036854775807 max_memory_bytes=0 queue_size=1024
ts=2023-06-26T17:36:21.657866Z lvl=info msg="Configuring InfluxQL statement executor (zeros indicate unlimited)." log_id=0ifCICUW000 max_select_point=0 max_select_series=0 max_select_buckets=0
ts=2023-06-26T17:36:21.668440Z lvl=info msg=Listening log_id=0ifCICUW000 service=tcp-listener transport=http addr=:8086 port=8086
starting scrutiny
scrutiny api not ready
2023/06/26 19:36:26 No configuration file found at /opt/scrutiny/config/scrutiny.yaml. Using Defaults.
 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
github.com/AnalogJ/scrutiny                             dev-0.7.1
Start the scrutiny server
time="2023-06-26T19:36:26+02:00" level=info msg="Trying to connect to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2023-06-26T19:36:26+02:00" level=info msg="Successfully connected to scrutiny sqlite db: /opt/scrutiny/config/scrutiny.db\n" type=web
time="2023-06-26T19:36:26+02:00" level=info msg="InfluxDB certificate verification: true\n" type=web
time="2023-06-26T19:36:26+02:00" level=info msg="Database migration starting. Please wait, this process may take a long time...." type=web
time="2023-06-26T19:36:26+02:00" level=info msg="Database migration completed successfully" type=web
time="2023-06-26T19:36:26+02:00" level=info msg="SQLite global configuration migrations starting. Please wait...." type=web
time="2023-06-26T19:36:26+02:00" level=info msg="SQLite global configuration migrations completed successfully" type=web
[19:36:26] INFO: Starting NGinx...
time="2023-06-26T19:36:29+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2023-06-26T19:36:29+02:00" level=info msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:29 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.74.0\" (1ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=1 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.74.0
time="2023-06-26T19:36:31+02:00" level=info msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:31 +0200] \"HEAD /api/health\" 200 0 \"\" \"curl/7.74.0\" (10ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=10 method=HEAD path=/api/health referer= respLength=0 statusCode=200 type=web userAgent=curl/7.74.0
starting scrutiny collector (run-once mode. subsequent calls will be triggered via cron service)
2023/06/26 19:36:31 No configuration file found at /opt/scrutiny/config/collector.yaml. Using Defaults.
 ___   ___  ____  __  __  ____  ____  _  _  _  _
/ __) / __)(  _ \(  )(  )(_  _)(_  _)( \( )( \/ )
\__ \( (__  )   / )(__)(   )(   _)(_  )  (  \  /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
AnalogJ/scrutiny/metrics                                dev-0.7.1
time="2023-06-26T19:36:31+02:00" level=info msg="Verifying required tools" type=metrics
time="2023-06-26T19:36:31+02:00" level=info msg="Executing command: smartctl --scan --json" type=metrics
time="2023-06-26T19:36:31+02:00" level=info msg="Executing command: smartctl --info --json /dev/sda" type=metrics
time="2023-06-26T19:36:31+02:00" level=info msg="Using WWN Fallback" type=metrics
time="2023-06-26T19:36:31+02:00" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics
time="2023-06-26T19:36:31+02:00" level=info msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:31 +0200] \"POST /api/devices/register\" 200 543 \"\" \"Go-http-client/1.1\" (2ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=2 method=POST path=/api/devices/register referer= respLength=543 statusCode=200 type=web userAgent=Go-http-client/1.1
time="2023-06-26T19:36:31+02:00" level=info msg="Collecting smartctl results for sda\n" type=metrics
time="2023-06-26T19:36:31+02:00" level=info msg="Executing command: smartctl --xall --json /dev/sda" type=metrics
time="2023-06-26T19:36:31+02:00" level=info msg="Publishing smartctl results for 0x57c35481f82a7a9c\n" type=metrics
time="2023-06-26T19:36:31+02:00" level=error msg="An error occurred while saving smartctl metrics unprocessable entity: failure writing points to database: partial write: points beyond retention policy dropped=1" type=web
time="2023-06-26T19:36:31+02:00" level=error msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:31 +0200] \"POST /api/device/0x57c35481f82a7a9c/smart\" 500 17 \"\" \"Go-http-client/1.1\" (11ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=11 method=POST path=/api/device/0x57c35481f82a7a9c/smart referer= respLength=17 statusCode=500 type=web userAgent=Go-http-client/1.1
time="2023-06-26T19:36:31+02:00" level=info msg="Main: Completed" type=metrics
time="2023-06-26T19:36:34+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2023-06-26T19:36:34+02:00" level=info msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:34 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.74.0\" (1ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=1 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.74.0
time="2023-06-26T19:36:39+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2023-06-26T19:36:39+02:00" level=info msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:39 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.74.0\" (1ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=1 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.74.0
time="2023-06-26T19:36:44+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2023-06-26T19:36:44+02:00" level=info msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:44 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.74.0\" (1ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=1 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.74.0
time="2023-06-26T19:36:49+02:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2023-06-26T19:36:49+02:00" level=info msg="127.0.0.1 - db21ed7f-scrutiny-fa [26/Jun/2023:19:36:49 +0200] \"GET /api/health\" 200 16 \"\" \"curl/7.74.0\" (1ms)" clientIP=127.0.0.1 hostname=db21ed7f-scrutiny-fa latency=1 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.74.0

Architecture

aarch64

OS

HAos

dm82m commented 1 year ago

I think the error is this line: An error occurred while saving smartctl metrics unprocessable entity

Found a reference here: https://github.com/AnalogJ/scrutiny/issues/305

alexbelgium commented 1 year ago

Hi, you're using Scrutiny full access, so SYS_RAWIO is not needed: all permissions should already be granted... Have you tried the normal Scrutiny (not FA)? Also, are you sure your SSD supports SMART? Most SSDs actually don't.

dm82m commented 1 year ago

Yes, I tested without FA, but then no device is shown. The SSD I am using does provide SMART.

dm82m commented 1 year ago

@alexbelgium any ideas? Should I raise the issue with Scrutiny directly, since it seems more related to it?

alexbelgium commented 1 year ago

Hi, perhaps that's better... Honestly, I looked into it a bit but have no idea. I was wondering whether accessing InfluxDB through its admin interface could help, but I don't know what to look for.

dm82m commented 1 year ago

I tried the same, but I don't know the user/password. Is it a default one or the HA user/password? I found the user and password, but it doesn't help me.

dm82m commented 1 year ago

@alexbelgium how can I put Scrutiny into debug mode? I just want to check what the smartctl commands are returning. It seems that I need to run Scrutiny in debug mode, but I have no idea how.
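One way to see what the smartctl commands return, without a debug mode, is to re-run by hand the three commands the collector logged above (`smartctl --scan --json`, `smartctl --info --json /dev/sda`, `smartctl --xall --json /dev/sda`) and print only smartctl's own exit status and messages. This is a minimal sketch, assuming Python 3 is available inside the add-on container (it may not be) and that the output follows smartctl 7.x's JSON layout (a top-level `smartctl` object with `exit_status` and `messages`, plus `smart_status.passed`); `COLLECTOR_COMMANDS`, `summarize` and `run_all` are hypothetical names.

```python
import json
import subprocess
import sys

# The three commands the Scrutiny collector logged ("Executing command: ...").
# "{dev}" is replaced with the device path passed on the command line.
COLLECTOR_COMMANDS = [
    ["smartctl", "--scan", "--json"],
    ["smartctl", "--info", "--json", "{dev}"],
    ["smartctl", "--xall", "--json", "{dev}"],
]


def summarize(raw_json: str) -> dict:
    """Pull smartctl's own exit status, messages and SMART verdict from its JSON.

    Assumes the smartctl 7.x layout: a top-level "smartctl" object holding
    "exit_status" and an optional "messages" list, plus "smart_status.passed".
    """
    data = json.loads(raw_json)
    meta = data.get("smartctl", {})
    return {
        "exit_status": meta.get("exit_status"),
        "messages": [m.get("string", "") for m in meta.get("messages", [])],
        "smart_passed": data.get("smart_status", {}).get("passed"),
    }


def run_all(device: str) -> None:
    for template in COLLECTOR_COMMANDS:
        cmd = [part.format(dev=device) for part in template]
        try:
            # No check=True: smartctl's exit code is a bitmask, not a plain error flag.
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            print("smartctl is not installed or not on PATH")
            return
        print("$", " ".join(cmd), f"(exit code {proc.returncode})")
        try:
            print(json.dumps(summarize(proc.stdout), indent=2))
        except json.JSONDecodeError:
            # Not JSON (e.g. empty output): show whatever smartctl printed.
            print(proc.stdout.strip() or proc.stderr.strip())


if __name__ == "__main__":
    run_all(sys.argv[1] if len(sys.argv) > 1 else "/dev/sda")
```

Run it as `python3 check_smart.py /dev/sda` (the file name is arbitrary). Printing the summary rather than the full `--xall` output keeps the part that matters for this issue, smartctl's own error messages, readable.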

dm82m commented 1 year ago

root@db21ed7f-scrutiny-fa:/opt/scrutiny# lsblk -f
NAME   FSTYPE   FSVER LABEL          UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda
|-sda1 vfat     FAT16 hassos-boot    91F7-85ED
|-sda2 squashfs 4.0
|-sda3 squashfs 4.0
|-sda4
|-sda5
|-sda6
|-sda7 ext4     1.0   hassos-overlay 4e60f2da-1b5f-41ad-a4f7-1f4e943fb154
`-sda8 ext4     1.0   hassos-data    bcaf2507-38ab-4426-9160-82e691be1ca3    193G     8% /etc/hosts
zram1
zram2

dm82m commented 1 year ago

root@db21ed7f-scrutiny-fa:/opt/scrutiny# smartctl -i /dev/sda -T permissive -d sat
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.1.34] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: Operation not permitted

=== START OF INFORMATION SECTION ===
Device Model:     [No Information Found]
Serial Number:    [No Information Found]
Firmware Version: [No Information Found]
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   [No Information Found]
Local Time is:    Tue Jun 27 17:40:30 2023 CEST
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
SMART support is: Unknown - Try option -s with argument 'on' to enable it.
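The run above does not show smartctl's exit code. smartctl reports problems as a bitmask in its exit status (documented in the RETURN VALUES section of the smartctl(8) man page), so running `echo $?` right after the command and decoding the value could help narrow down which step failed. A minimal sketch of that decoding; the descriptions paraphrase the man page, and `EXIT_STATUS_BITS` and `decode_exit_status` are hypothetical names.

```python
# Bits of smartctl's exit status, paraphrasing the RETURN VALUES section
# of the smartctl(8) man page.
EXIT_STATUS_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, device did not return an IDENTIFY DEVICE "
       "structure, or device is in a low-power mode",
    2: "Some SMART or other ATA command to the disk failed, or there was "
       "a checksum error in a SMART data structure",
    3: "SMART status check returned 'DISK FAILING'",
    4: "Prefail attributes <= threshold were found",
    5: "SMART status check returned 'DISK OK', but some attributes have "
       "been <= threshold at some time in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> list:
    """Return the description of every bit that is set in `status`."""
    return [
        text
        for bit, text in sorted(EXIT_STATUS_BITS.items())
        if status & (1 << bit)
    ]
```

For example, `decode_exit_status(2)` returns only the bit 1 description ("Device open failed, ..."), while a status of 0 returns an empty list.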

dm82m commented 1 year ago

SMART is also enabled in the BIOS. If I install Proxmox on that device, smartctl works, but on HAOS with your add-on it doesn't.

dm82m commented 1 year ago

I can finally say that the problem is definitely not the hardware. The add-on https://github.com/Draggon/hassio-hdd-tools works on the same machine, and I get all the SMART data with it. But neither Scrutiny nor Scrutiny-FA gives me any data for the disk.

alexbelgium commented 1 year ago

That's super helpful, thanks for your investigations! I'll compare both addons

dm82m commented 1 year ago

I needed to disable protection mode, but after that it just worked. I first tested directly within the container, the same way as I tested your Scrutiny add-ons; but as I said, with both of those I had no luck getting the SMART data.

dm82m commented 1 year ago

hdd-tools uses both:

  "privileged": ["SYS_ADMIN", "SYS_RAWIO"],
  "full_access": true,

whereas for scrutiny-fa you only use this:

  "full_access": true,

and this for scrutiny:

  "privileged": [
    "SYS_ADMIN",
    "SYS_RAWIO",
    "DAC_READ_SEARCH"
  ],

Not sure if it is really related, but if you want to test it, just release an update and I will try it...

alexbelgium commented 1 year ago

Well, having both puts an alert message in the logs (saying it is not justified to declare both full_access and privileges) that made people remove the repo... However, if you are using Portainer, this can be mimicked by manually switching on the permissions there.

dm82m commented 1 year ago

but it works ...

alexbelgium commented 1 year ago

Really? By adding permissions in addition to full access? Crazy... I'll check the Supervisor logs to see if an error shows up.

dm82m commented 1 year ago

Yes, really. Just put this:

  "full_access": true,
  "privileged": [
    "SYS_ADMIN",
    "SYS_RAWIO",
    "DAC_READ_SEARCH"
  ],  

and now it works ...
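The difference between the working and non-working setups can be checked mechanically. A minimal sketch, assuming the add-on configuration is a JSON file using the `full_access` and `privileged` keys quoted in this thread; the default path `config.json` and the names `REQUIRED_PRIVILEGES` and `check_addon_config` are hypothetical.

```python
import json
import sys
from pathlib import Path

# Capabilities from the working Scrutiny-FA configuration quoted in this thread.
REQUIRED_PRIVILEGES = {"SYS_ADMIN", "SYS_RAWIO", "DAC_READ_SEARCH"}


def check_addon_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config matches."""
    problems = []
    if config.get("full_access") is not True:
        problems.append('"full_access" is not true')
    missing = REQUIRED_PRIVILEGES - set(config.get("privileged", []))
    if missing:
        problems.append("missing privileged entries: " + ", ".join(sorted(missing)))
    return problems


if __name__ == "__main__":
    # Hypothetical default path; pass the add-on's real config file as an argument.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "config.json")
    if not path.exists():
        print(f"{path} not found")
    else:
        cfg = json.loads(path.read_text())
        for line in check_addon_config(cfg) or ["config matches the working setup"]:
            print(line)
```

Run against the earlier scrutiny-fa config (only `"full_access": true`), it would report all three capabilities as missing; run against the scrutiny config, it would report that `full_access` is not set.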

alexbelgium commented 1 year ago

Thanks! Then I'll push a new official version (not just a test); it will take 5 minutes to build.

alexbelgium commented 1 year ago

Thanks for this great troubleshooting

dm82m commented 1 year ago

It works now. Thanks for the great collaboration and the fast change and release!