AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

[BUG] NVMe does not show capacity when using namespaces #466

Status: Closed (KyleSanderson closed 6 months ago)

KyleSanderson commented 1 year ago

Describe the bug
nvme0

Expected behavior
nvme0n

Screenshots
image

Log Files
If related to missing devices or SMART data, please run the collector in DEBUG mode, and attach the log file. See /docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md for other troubleshooting tips.

docker run -it --rm -p 8080:8080 \
-v `pwd`/config:/opt/scrutiny/config \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
-e DEBUG=true \
-e COLLECTOR_LOG_FILE=/opt/scrutiny/config/collector.log \
-e SCRUTINY_LOG_FILE=/opt/scrutiny/config/web.log \
--name scrutiny \
ghcr.io/analogj/scrutiny:master-omnibus

# in another terminal trigger the collector
docker exec scrutiny scrutiny-collector-metrics run

The log files will be available on your host in the config directory. Please attach them to this issue.

Please also provide the output of docker info

AnalogJ commented 1 year ago

Sorry, I'm not sure I understand what you're proposing here:

Describe the bug
nvme0

Expected behavior
nvme0n

I do see that the detected capacity is 0 B. Can you attach the debug logs so I can tell whether the issue is with Scrutiny or smartctl?

Tirarex commented 1 year ago

I have the same problem with a SAMSUNG MZQL21T9HCJR-00A07 (1.7 TB model). There are no errors in the log or after "collector_1 | starting cron". My other SATA SSDs and HDDs work fine.

KyleSanderson commented 1 year ago

Yeah, it's pretty trivial to see. NVMe uses namespaces, which the software doesn't support.

Tirarex commented 1 year ago

It works fine with my other SSDs: Samsung 980 Pro, ADATA Legend S70, ADATA Legend 960, and an NVMe drive from IRDM.

It looks like only enterprise-grade SSDs are a problem.

KyleSanderson commented 1 year ago

It depends on whether namespaces are exposed. Some bad controllers don't follow the spec, and others expose 1 as 0. I believe the limit is 255 per device.
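
For readers unfamiliar with the nvme0 versus nvme0n distinction in this thread: on Linux, the controller is normally exposed as /dev/nvmeX and each namespace as a block device /dev/nvmeXnY, optionally followed by pZ for a partition. The sketch below only illustrates that naming convention. The `parse_nvme_name` helper and the `NvmeName` type are hypothetical names made up for this example and are not part of Scrutiny or smartctl.

```python
import re
from typing import NamedTuple, Optional


class NvmeName(NamedTuple):
    """Parts of a Linux NVMe device name (hypothetical helper type)."""
    controller: int
    namespace: Optional[int]   # None for a controller node such as /dev/nvme0
    partition: Optional[int]   # None unless the name ends in pZ


# nvmeX, nvmeXnY, or nvmeXnYpZ, with an optional /dev/ prefix.
_NVME_RE = re.compile(r"^(?:/dev/)?nvme(\d+)(?:n(\d+)(?:p(\d+))?)?$")


def parse_nvme_name(path: str) -> Optional[NvmeName]:
    """Split an NVMe device name into its parts, or return None if it is not one."""
    match = _NVME_RE.match(path)
    if match is None:
        return None
    ctrl, ns, part = match.groups()
    return NvmeName(
        controller=int(ctrl),
        namespace=int(ns) if ns is not None else None,
        partition=int(part) if part is not None else None,
    )


if __name__ == "__main__":
    for name in ("/dev/nvme0", "/dev/nvme0n1", "/dev/nvme4n1p2", "/dev/sda"):
        print(name, parse_nvme_name(name))
```

A tool that is handed only the controller node (namespace is None here) has no single namespace to read a size from, which is one plausible reason a capacity could come back empty.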

Hr46ph commented 8 months ago

Does this require any more info, like debug logs or SMART output? I have a stack of INTEL SSDPE2KX040T8 drives that aren't reporting their size in Scrutiny. The SMART output reports it twice 😋.

uhthomas commented 7 months ago

I also see this on a KIOXIA CD6.

image

AnalogJ commented 7 months ago

@Hr46ph @uhthomas @KyleSanderson

If you can provide me with the raw smartctl json, I can take a look at why Scrutiny is unable to extract your capacity information:

smartctl --xall --json {DEVICE}

uhthomas commented 7 months ago

❯ k exec -it scrutiny-collector-94lbl -- smartctl --xall --json /dev/nvme4
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      2
    ],
    "svn_revision": "5155",
    "platform_info": "x86_64-linux-6.1.69-talos",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--xall",
      "--json",
      "/dev/nvme4"
    ],
    "exit_status": 0
  },
  "device": {
    "name": "/dev/nvme4",
    "info_name": "/dev/nvme4",
    "type": "nvme",
    "protocol": "NVMe"
  },
  "model_name": "KCD61LUL3T84",
  "serial_number": "61Q0A05UT7B8",
  "firmware_version": "8002",
  "nvme_pci_vendor": {
    "id": 7695,
    "subsystem_id": 7695
  },
  "nvme_ieee_oui_identifier": 9233294,
  "nvme_total_capacity": 3840755982336,
  "nvme_unallocated_capacity": 0,
  "nvme_controller_id": 1,
  "nvme_version": {
    "string": "1.4",
    "value": 66560
  },
  "nvme_number_of_namespaces": 16,
  "local_time": {
    "time_t": 1706043228,
    "asctime": "Tue Jan 23 20:53:48 2024 UTC"
  },
  "smart_status": {
    "passed": true,
    "nvme": {
      "value": 0
    }
  },
  "nvme_smart_health_information_log": {
    "critical_warning": 0,
    "temperature": 57,
    "available_spare": 100,
    "available_spare_threshold": 44,
    "percentage_used": 0,
    "data_units_read": 10648491,
    "data_units_written": 11648374,
    "host_reads": 89057132,
    "host_writes": 277683503,
    "controller_busy_time": 175,
    "power_cycles": 33,
    "power_on_hours": 2040,
    "unsafe_shutdowns": 0,
    "media_errors": 0,
    "num_err_log_entries": 3546,
    "warning_temp_time": 0,
    "critical_comp_time": 0
  },
  "temperature": {
    "current": 57
  },
  "power_cycle_count": 33,
  "power_on_time": {
    "hours": 2040
  }
}
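
The report above contains no `user_capacity` object. The only capacity fields present are the controller-level `nvme_total_capacity` and `nvme_unallocated_capacity`. If a collector reads capacity only from `user_capacity.bytes` (an assumption about the cause, not something confirmed in this thread), it would see 0 for this drive. A minimal sketch of a fallback, with `capacity_bytes` as a hypothetical helper name and not Scrutiny's actual code:

```python
import json


def capacity_bytes(report: dict) -> int:
    """Return the drive capacity in bytes, or 0 if no known field is present."""
    # Assumption: prefer user_capacity.bytes when smartctl includes it.
    user_capacity = report.get("user_capacity")
    if isinstance(user_capacity, dict) and user_capacity.get("bytes"):
        return int(user_capacity["bytes"])

    # Fallback: the controller-level total, which is the only size
    # reported in the KIOXIA CD6 output above.
    total = report.get("nvme_total_capacity")
    if isinstance(total, int) and total > 0:
        return total

    return 0


# Trimmed copy of the fields from the report above.
sample = json.loads("""
{
  "device": {"name": "/dev/nvme4", "type": "nvme", "protocol": "NVMe"},
  "model_name": "KCD61LUL3T84",
  "nvme_total_capacity": 3840755982336,
  "nvme_unallocated_capacity": 0,
  "nvme_number_of_namespaces": 16
}
""")

if __name__ == "__main__":
    print(capacity_bytes(sample))  # 3840755982336
```

Note that on a drive split into several namespaces, the controller total is larger than any single namespace, so this fallback is only a reasonable stand-in when one namespace covers the whole device.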

uhthomas commented 7 months ago

If you don't mind @AnalogJ, I'd like to take a try at fixing this!

AnalogJ commented 7 months ago

@uhthomas sure! Looking forward to the PR :)