AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

[BUG] ERROR: json: cannot unmarshal number into Go value of type models.DeviceWrapper #421

Open snyssen opened 1 year ago

snyssen commented 1 year ago

Describe the bug

The collector fails to parse the JSON data from smartctl and instead returns `ERROR: json: cannot unmarshal number into Go value of type models.DeviceWrapper`. Since the data cannot be parsed, it is never sent to the web instance.

Expected behavior

The JSON should be correctly parsed and sent to the web instance.

Log Files

time="2023-01-06T08:56:53Z" level=debug msg="{\n\t\"api\": {\n\t\t\"endpoint\": \"https://CENSORED/\"\n\t},\n\t\"commands\": {\n\t\t\"metrics_info_args\": \"--info --json\",\n\t\t\"metrics_scan_args\": \"--scan --json\",\n\t\t\"metrics_smart_args\": \"--xall --json\",\n\t\t\"metrics_smartctl_bin\": \"smartctl\"\n\t},\n\t\"devices\": [],\n\t\"host\": {\n\t\t\"id\": \"\"\n\t},\n\t\"log\": {\n\t\t\"file\": \"/tmp/collector.log\",\n\t\t\"level\": \"DEBUG\"\n\t}\n}" type=metrics time="2023-01-06T08:56:53Z" level=info msg="Verifying required tools" type=metrics time="2023-01-06T08:56:53Z" level=info msg="Executing command: smartctl --scan --json" type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 2 ], "svn_revision": "5155", "platform_info": "x86_64-linux-6.0.15-200.fc36.x86_64", "build_info": "(local build)", "argv": [ "smartctl", "--scan", "--json" ], "exit_status": 0 }, "devices": [ { "name": "/dev/sda", "info_name": "/dev/sda", "type": "scsi", "protocol": "SCSI" }, { "name": "/dev/sdb", "info_name": "/dev/sdb", "type": "scsi", "protocol": "SCSI" }, { "name": "/dev/sdc", "info_name": "/dev/sdc", "type": "scsi", "protocol": "SCSI" }, { "name": "/dev/sdd", "info_name": "/dev/sdd", "type": "scsi", "protocol": "SCSI" }, { "name": "/dev/sdf", "info_name": "/dev/sdf", "type": "scsi", "protocol": "SCSI" } ] } time="2023-01-06T08:56:53Z" level=info msg="Executing command: smartctl --info --json /dev/sdf" type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 2 ], "svn_revision": "5155", "platform_info": "x86_64-linux-6.0.15-200.fc36.x86_64", "build_info": "(local build)", "argv": [ "smartctl", "--info", "--json", "/dev/sdf" ], "exit_status": 0 }, "device": { "name": "/dev/sdf", "info_name": "/dev/sdf [SAT]", "type": "sat", "protocol": "ATA" }, "model_family": "Seagate IronWolf", "model_name": "ST2000VN004-2E4164", "serial_number": "Z524CEHK", "wwn": { "naa": 5, "oui": 3152, "id": 2952854872 }, "firmware_version": "SC60", 
"user_capacity": { "blocks": 3907029168, "bytes": 2000398934016 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5900, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "trim": { "supported": false }, "in_smartctl_database": true, "ata_version": { "string": "ACS-2, ACS-3 T13/2161-D revision 3b", "major_value": 1008, "minor_value": 31 }, "sata_version": { "string": "SATA 3.1", "value": 126 }, "interface_speed": { "max": { "sata_value": 14, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 }, "current": { "sata_value": 3, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1672995413, "asctime": "Fri Jan 6 08:56:53 2023 UTC" } } time="2023-01-06T08:56:53Z" level=info msg="Generating WWN" type=metrics time="2023-01-06T08:56:53Z" level=debug msg="NAA: 5 OUI: 3152 Id: 2952854872 => WWN: 0x5000c500b000fd58" type=metrics time="2023-01-06T08:56:53Z" level=info msg="Executing command: smartctl --info --json /dev/sda" type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 2 ], "svn_revision": "5155", "platform_info": "x86_64-linux-6.0.15-200.fc36.x86_64", "build_info": "(local build)", "argv": [ "smartctl", "--info", "--json", "/dev/sda" ], "exit_status": 0 }, "device": { "name": "/dev/sda", "info_name": "/dev/sda [SAT]", "type": "sat", "protocol": "ATA" }, "model_family": "Seagate IronWolf", "model_name": "ST4000VN008-2DR166", "serial_number": "ZDH9AT9F", "wwn": { "naa": 5, "oui": 3152, "id": 3355733740 }, "firmware_version": "SC60", "user_capacity": { "blocks": 7814037168, "bytes": 4000787030016 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5980, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "trim": { "supported": false }, "in_smartctl_database": true, "ata_version": { "string": "ACS-3 T13/2161-D revision 5", "major_value": 2032, "minor_value": 109 }, "sata_version": { "string": "SATA 3.1", "value": 127 
}, "interface_speed": { "max": { "sata_value": 14, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 }, "current": { "sata_value": 3, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1672995413, "asctime": "Fri Jan 6 08:56:53 2023 UTC" } } time="2023-01-06T08:56:53Z" level=info msg="Generating WWN" type=metrics time="2023-01-06T08:56:53Z" level=debug msg="NAA: 5 OUI: 3152 Id: 3355733740 => WWN: 0x5000c500c8046eec" type=metrics time="2023-01-06T08:56:53Z" level=info msg="Executing command: smartctl --info --json /dev/sdb" type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 2 ], "svn_revision": "5155", "platform_info": "x86_64-linux-6.0.15-200.fc36.x86_64", "build_info": "(local build)", "argv": [ "smartctl", "--info", "--json", "/dev/sdb" ], "exit_status": 0 }, "device": { "name": "/dev/sdb", "info_name": "/dev/sdb [SAT]", "type": "sat", "protocol": "ATA" }, "model_family": "Seagate Exos X14", "model_name": "ST12000NM0538-2K2101", "serial_number": "ZHZ12PTZ", "wwn": { "naa": 5, "oui": 3152, "id": 3016834062 }, "firmware_version": "CMA2", "user_capacity": { "blocks": 23437770752, "bytes": 12000138625024 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 7200, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "trim": { "supported": false }, "in_smartctl_database": true, "ata_version": { "string": "ACS-4 T13/BSR INCITS 529 revision 5", "major_value": 4064, "minor_value": 94 }, "sata_version": { "string": "SATA 3.3", "value": 511 }, "interface_speed": { "max": { "sata_value": 14, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 }, "current": { "sata_value": 3, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1672995413, "asctime": "Fri Jan 6 08:56:53 2023 UTC" } } time="2023-01-06T08:56:53Z" level=info msg="Generating WWN" type=metrics 
time="2023-01-06T08:56:53Z" level=debug msg="NAA: 5 OUI: 3152 Id: 3016834062 => WWN: 0x5000c500b3d13c0e" type=metrics time="2023-01-06T08:56:53Z" level=info msg="Executing command: smartctl --info --json /dev/sdc" type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 2 ], "svn_revision": "5155", "platform_info": "x86_64-linux-6.0.15-200.fc36.x86_64", "build_info": "(local build)", "argv": [ "smartctl", "--info", "--json", "/dev/sdc" ], "exit_status": 0 }, "device": { "name": "/dev/sdc", "info_name": "/dev/sdc [SAT]", "type": "sat", "protocol": "ATA" }, "model_family": "Seagate BarraCuda 3.5", "model_name": "ST2000DM008-2FR102", "serial_number": "ZFL123SX", "wwn": { "naa": 5, "oui": 3152, "id": 3272638760 }, "firmware_version": "0001", "user_capacity": { "blocks": 3907029168, "bytes": 2000398934016 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 7200, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "trim": { "supported": true, "deterministic": false, "zeroed": false }, "in_smartctl_database": true, "ata_version": { "string": "ACS-3 T13/2161-D revision 5", "major_value": 2032, "minor_value": 109 }, "sata_version": { "string": "SATA 3.1", "value": 127 }, "interface_speed": { "max": { "sata_value": 14, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 }, "current": { "sata_value": 3, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1672995413, "asctime": "Fri Jan 6 08:56:53 2023 UTC" } } time="2023-01-06T08:56:53Z" level=info msg="Generating WWN" type=metrics time="2023-01-06T08:56:53Z" level=debug msg="NAA: 5 OUI: 3152 Id: 3272638760 => WWN: 0x5000c500c3108128" type=metrics time="2023-01-06T08:56:53Z" level=info msg="Executing command: smartctl --info --json /dev/sdd" type=metrics { "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 2 ], "svn_revision": "5155", "platform_info": "x86_64-linux-6.0.15-200.fc36.x86_64", 
"build_info": "(local build)", "argv": [ "smartctl", "--info", "--json", "/dev/sdd" ], "exit_status": 0 }, "device": { "name": "/dev/sdd", "info_name": "/dev/sdd [SAT]", "type": "sat", "protocol": "ATA" }, "model_family": "Seagate IronWolf", "model_name": "ST2000VN004-2E4164", "serial_number": "Z524CF7S", "wwn": { "naa": 5, "oui": 3152, "id": 2952839804 }, "firmware_version": "SC60", "user_capacity": { "blocks": 3907029168, "bytes": 2000398934016 }, "logical_block_size": 512, "physical_block_size": 4096, "rotation_rate": 5900, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "trim": { "supported": false }, "in_smartctl_database": true, "ata_version": { "string": "ACS-2, ACS-3 T13/2161-D revision 3b", "major_value": 1008, "minor_value": 31 }, "sata_version": { "string": "SATA 3.1", "value": 126 }, "interface_speed": { "max": { "sata_value": 14, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 }, "current": { "sata_value": 3, "string": "6.0 Gb/s", "units_per_second": 60, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1672995413, "asctime": "Fri Jan 6 08:56:53 2023 UTC" } } time="2023-01-06T08:56:53Z" level=info msg="Generating WWN" type=metrics time="2023-01-06T08:56:53Z" level=debug msg="NAA: 5 OUI: 3152 Id: 2952839804 => WWN: 0x5000c500b000c27c" type=metrics time="2023-01-06T08:56:53Z" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics time="2023-01-06T08:56:53Z" level=debug msg="Detected devices: [{\"wwn\":\"0x5000c500b000fd58\",\"device_name\":\"sdf\",\"device_uuid\":\"\",\"device_serial_id\":\"\",\"device_label\":\"\",\"manufacturer\":\"\",\"model_name\":\"ST2000VN004-2E4164\",\"interface_type\":\"\",\"interface_speed\":\"6.0 Gb/s\",\"serial_number\":\"Z524CEHK\",\"firmware\":\"SC60\",\"rotational_speed\":5900,\"capacity\":2000398934016,\"form_factor\":\"3.5 
inches\",\"smart_support\":false,\"device_protocol\":\"ATA\",\"device_type\":\"scsi\",\"label\":\"\",\"host_id\":\"\"},{\"wwn\":\"0x5000c500c8046eec\",\"device_name\":\"sda\",\"device_uuid\":\"\",\"device_serial_id\":\"\",\"device_label\":\"\",\"manufacturer\":\"\",\"model_name\":\"ST4000VN008-2DR166\",\"interface_type\":\"\",\"interface_speed\":\"6.0 Gb/s\",\"serial_number\":\"ZDH9AT9F\",\"firmware\":\"SC60\",\"rotational_speed\":5980,\"capacity\":4000787030016,\"form_factor\":\"3.5 inches\",\"smart_support\":false,\"device_protocol\":\"ATA\",\"device_type\":\"scsi\",\"label\":\"\",\"host_id\":\"\"},{\"wwn\":\"0x5000c500b3d13c0e\",\"device_name\":\"sdb\",\"device_uuid\":\"\",\"device_serial_id\":\"\",\"device_label\":\"\",\"manufacturer\":\"\",\"model_name\":\"ST12000NM0538-2K2101\",\"interface_type\":\"\",\"interface_speed\":\"6.0 Gb/s\",\"serial_number\":\"ZHZ12PTZ\",\"firmware\":\"CMA2\",\"rotational_speed\":7200,\"capacity\":12000138625024,\"form_factor\":\"3.5 inches\",\"smart_support\":false,\"device_protocol\":\"ATA\",\"device_type\":\"scsi\",\"label\":\"\",\"host_id\":\"\"},{\"wwn\":\"0x5000c500c3108128\",\"device_name\":\"sdc\",\"device_uuid\":\"\",\"device_serial_id\":\"\",\"device_label\":\"\",\"manufacturer\":\"\",\"model_name\":\"ST2000DM008-2FR102\",\"interface_type\":\"\",\"interface_speed\":\"6.0 Gb/s\",\"serial_number\":\"ZFL123SX\",\"firmware\":\"0001\",\"rotational_speed\":7200,\"capacity\":2000398934016,\"form_factor\":\"3.5 inches\",\"smart_support\":false,\"device_protocol\":\"ATA\",\"device_type\":\"scsi\",\"label\":\"\",\"host_id\":\"\"},{\"wwn\":\"0x5000c500b000c27c\",\"device_name\":\"sdd\",\"device_uuid\":\"\",\"device_serial_id\":\"\",\"device_label\":\"\",\"manufacturer\":\"\",\"model_name\":\"ST2000VN004-2E4164\",\"interface_type\":\"\",\"interface_speed\":\"6.0 Gb/s\",\"serial_number\":\"Z524CF7S\",\"firmware\":\"SC60\",\"rotational_speed\":5900,\"capacity\":2000398934016,\"form_factor\":\"3.5 
inches\",\"smart_support\":false,\"device_protocol\":\"ATA\",\"device_type\":\"scsi\",\"label\":\"\",\"host_id\":\"\"}]" type=metrics 2023/01/06 08:56:53 ERROR: json: cannot unmarshal number into Go value of type models.DeviceWrapper
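
The `Generating WWN` debug lines in the log above pair each drive's `naa`, `oui` and `id` values with a 64-bit WWN. A minimal sketch of that packing, assuming the NAA 5 layout (4-bit NAA, 24-bit OUI, 36-bit vendor ID) and inputs that fit those widths; `wwnFromParts` is a hypothetical name, not a function taken from Scrutiny:

```go
package main

import "fmt"

// wwnFromParts packs the three fields smartctl reports under "wwn" into one
// 64-bit identifier: NAA in the top 4 bits, OUI in the next 24, ID in the
// low 36. The name is hypothetical; inputs are assumed to fit their widths.
func wwnFromParts(naa, oui, id uint64) string {
	v := naa<<60 | oui<<36 | id
	return fmt.Sprintf("%#x", v)
}

func main() {
	// Values taken from the /dev/sdf entry in the log.
	fmt.Println(wwnFromParts(5, 3152, 2952854872)) // 0x5000c500b000fd58
}
```

The result matches the WWN logged for `/dev/sdf`, so device detection itself appears to have worked as expected in this run.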


- Definition of collector in `docker-compose.yml`:
```yaml
version: "3"
services:
  scrutiny:
    image: ghcr.io/analogj/scrutiny:v0.5.0-collector
    container_name: scrutiny
    cap_add:
      - SYS_RAWIO
    volumes:
      - /run/udev:/run/udev:ro
    environment:
      COLLECTOR_API_ENDPOINT: https://CENSORED
    devices:
      - "/dev/sdb"
      - "/dev/sda"
      - "/dev/sdc"
      - "/dev/sdd"
      - "/dev/sdf"
```

snyssen commented 1 year ago

The issue still happens with v0.6.0. Is there any update on this?

Hyurt commented 1 year ago

@snyssen I've been running into the same issue since yesterday. I think this is because smartctl has changed its JSON output (the field is now rotation_rate, where it used to be rotational_speed).

However, the device model is still looking for rotational_speed: https://github.com/AnalogJ/scrutiny/blob/ee893cc360276aaf9684edb24c5fc0b61bd3a2e5/webapp/backend/pkg/models/device.go#L34

Hyurt commented 10 months ago

@snyssen I figured out how to make it work, thanks to this issue and some of the documentation: https://github.com/AnalogJ/scrutiny/issues/255 https://github.com/AnalogJ/scrutiny/blob/5bbd4c3b64f24139350e6d9c27548151a046d116/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md?plain=1#L253

I have IronWolf drives too, and editing my collector.yaml to this


devices:
  - device: /dev/sda
    type: 'sat'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '--vendorattribute=188,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.

  - device: /dev/sdb
    type: 'sat'
    commands:
      metrics_info_args: '--info --json -T permissive' # used to determine device unique ID & register device with Scrutiny
      metrics_smart_args: '--vendorattribute=188,raw48:54 --xall --json -T permissive' # used to retrieve smart data for each device.

did the trick. I still have some errors that I need to dig into, but

INFO[0001] Collecting smartctl results for sda           type=metrics
INFO[0001] Executing command: smartctl --vendorattribute=188,raw48:54 --xall --json -T permissive --device sat /dev/sda  type=metrics
ERRO[0002] smartctl returned an error code (4) while processing sda  type=metrics
ERRO[0002] smartctl detected a checksum error            type=metrics
INFO[0002] Publishing smartctl results for o6au0ny34iqbd0xdskdm  type=metrics
INFO[0005] Collecting smartctl results for sdb           type=metrics
INFO[0005] Executing command: smartctl --vendorattribute=188,raw48:54 --xall --json -T permissive --device sat /dev/sdb  type=metrics
ERRO[0005] smartctl returned an error code (64) while processing sdb  type=metrics
ERRO[0005] smartctl detected a error log with errors     type=metrics
INFO[0005] Publishing smartctl results for 0x5000c500c738f678  type=metrics
INFO[0012] Main: Completed

the data does reach my backend.

snyssen commented 10 months ago

@Hyurt It does not seem to work on my end:

time="2023-11-26T21:28:42Z" level=info msg="Verifying required tools" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Executing command: smartctl --scan --json" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Executing command: smartctl --info --json -T permissive --device sat /dev/sde" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Generating WWN" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Executing command: smartctl --info --json -T permissive --device sat /dev/sda" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Generating WWN" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Executing command: smartctl --info --json -T permissive --device sat /dev/sdb" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Generating WWN" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Executing command: smartctl --info --json -T permissive --device sat /dev/sdc" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Generating WWN" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Executing command: smartctl --info --json -T permissive --device sat /dev/sdd" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Generating WWN" type=metrics
time="2023-11-26T21:28:42Z" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics
2023/11/26 21:28:43 ERROR: json: cannot unmarshal number into Go value of type models.DeviceWrapper

But that's worth taking a better look at, thanks!

snyssen commented 9 months ago

Further debugging showed that this issue did not appear on the omnibus image. It seems there are significant differences between the build steps of the binary in the omnibus and the collector Dockerfiles. I wonder if the collector Dockerfile was kept up to date at all... Will have to check when I get the time.

Hyurt commented 9 months ago

To be honest, I gave up on the Docker images for the collector on my end: my RPi had some trouble with them, so I tried the binaries instead. Maybe you can give those a shot to find the right parameters and get it working at least once.

snyssen commented 9 months ago

I found the issue!

I had to debug the program line by line on my computer, but I found it! And it had nothing to do with the way it is compiled and packaged... The JSON unmarshal error does not even come from the JSON of the devices, but from a bug in the HTTP client itself. I had forgotten that I had set up an authentication middleware on the reverse proxy I use to access the Scrutiny web instance, and that the collector's requests were routed through that proxy, meaning they could never reach the API unless they were authenticated. All of that is on me and I should have spotted it, but the collector was really not helpful in debugging this: it does not treat HTTP error codes as actual errors, so it happily went along with the failed request and tried to unmarshal its body... which isn't JSON, so it could obviously never work. I have never really coded anything in Go, so I don't know if I'll be of any help, but I'll see if I can create a PR to fix this issue.