Closed louwrentius closed 9 years ago
Hmm, the error is in the pySMART library, which makes it hard to fix. Perhaps I can contact the original author, but I doubt he will fix it.
No, I think you need to put a try/except IndexError around your code to catch this error. I notice that Debian Wheezy is also using an older smartctl version.
I will put the try/except in, but that won't fix the underlying issue; it will only stop the code from crashing, and it still won't yield results.
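The trade-off described here can be shown with a minimal sketch. The names `read_device_safely` and `broken_parser` are hypothetical stand-ins, not part of pySMART or ZFSmond; `broken_parser` only imitates the failing expression from the traceback.

```python
def read_device_safely(make_device, name):
    """Return make_device(name), or None if the parser raises IndexError."""
    try:
        return make_device(name)
    except IndexError:
        # The crash is avoided, but there is still no SMART data for this disk.
        return None


def broken_parser(name):
    # Hypothetical stand-in for a library call that fails on an empty value.
    return "Serial Number: ".split(":")[1].split()[0]


print(read_device_safely(broken_parser, "/dev/sdd"))  # None
```

The caller keeps running, but it has to treat `None` as "no data for this disk", which is why the guard alone does not resolve the underlying problem.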
Hey guys, I'm the author of pySMART. I just happened to stumble into this thread while Googling, and I'd love to fix the issue you found. I can see where this would crash if smartctl prints "Serial Number:" and then only whitespace (or nothing) following. I've never seen that behavior before; I've seen smartctl skip printing some of the lines when a value doesn't exist, so it seemed logical to assume that if it printed a line there'd be a non-whitespace/non-null value to parse on that line...
Louwrentius, do you happen to have the output of smartctl for the device that was causing the crash? Do you know what version of smartctl you were using, and on which OS? From your post above it looked like maybe Debian Wheezy, which might have 5.41 by default? The minimum version I've tested with on Linux is 5.42, but I doubt that's related to this issue. It seems like maybe your device is actually reporting an all-whitespace serial number to smartctl (?), and my code crashes because I never expected to have to parse that. :) I just want to be sure this is what's really going on, and most likely I'll make the other line parsing more robust to prevent these kinds of problems in the future. Thanks!
Hi,
No problem: here is the requested output. This box is Ubuntu.
root@server:/usr/src/zfsmond# dpkg -l | grep -i smart
ii libatasmart4 0.18-3 ATA S.M.A.R.T. reading and parsing library
ii smartmontools 5.41+svn3365-1 control and monitor storage systems using S.M.A.R.T.
root@server:/usr/src/zfsmond# cat /etc/u
ucf.conf udev/ ufw/ updatedb.conf update-manager/ update-motd.d/
root@server:/usr/src/zfsmond# cat /etc/debian_version
wheezy/sid
root@server:/usr/src/zfsmond#
root@server:/usr/src/zfsmond# show disk -smp
-----------------------------------------------------------------------
| Dev | Model | GB | /dev/disk/by-path |
-----------------------------------------------------------------------
| sda | ST250LM004 HN-M250MBB | 250 | pci-0000:00:1f.2-scsi-0:0:0:0 |
| sdb | SAMSUNG HM250JI | 250 | pci-0000:00:1f.2-scsi-1:0:0:0 |
| sdc | OCZ-VERTEX2 | 60 | pci-0000:00:1f.2-scsi-2:0:0:0 |
| sdd | ST2000DM001-9YN164 | 2000 | pci-0000:03:04.0-scsi-0:0:0:0 |
| sde | ST2000DM001-1CH164 | 2000 | pci-0000:03:04.0-scsi-0:0:1:0 |
| sdf | ST2000DM001-9YN164 | 2000 | pci-0000:03:04.0-scsi-0:0:2:0 |
| sdg | ST2000DM001-9YN164 | 2000 | pci-0000:03:04.0-scsi-0:0:3:0 |
| sdh | ST2000DM001-1ER164 | 2000 | pci-0000:03:04.0-scsi-0:0:4:0 |
| sdi | ST2000DM001-9YN164 | 2000 | pci-0000:03:04.0-scsi-0:0:5:0 |
| zd0 | | 536 | |
| zd16 | | 536 | |
| zd32 | | 536 | |
-----------------------------------------------------------------------
root@server:/usr/src/zfsmond# smart -a -d ata /dev/disk/by-path/pci-0000:03:04.0-scsi-0:0:0:0
pci-0000:03:04.0-scsi-0:0:0:0 pci-0000:03:04.0-scsi-0:0:0:0-part1 pci-0000:03:04.0-scsi-0:0:0:0-part9
root@server:/usr/src/zfsmond# smart -a -d ata /dev/disk/by-path/pci-0000:03:04.0-scsi-0:0:0:0
The program 'smart' is currently not installed. You can install it by typing:
apt-get install smartpm-core
root@server:/usr/src/zfsmond# smartctl -a -d ata /dev/disk/by-path/pci-0000:03:04.0-scsi-0:0:0:0
smartctl 5.41 2011-06-09 r3365 x86_64-linux-3.2.0-68-generic
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: ST2000DM001-9YN164
Serial Number: Z1E0RR08
LU WWN Device Id: 5 000c50 04d49e7ef
Firmware Version: CC4C
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sat Jun 13 14:08:21 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
Self-test execution status: ( 0) The previous self-test routine completed
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
SMART capabilities: (0x0003) Saves SMART data before entering
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 224) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always
184 End-to-End_Error 0x0032 100 100 099 Old_age Always
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always
188 Command_Timeout 0x0032 100 100 000 Old_age Always
189 High_Fly_Writes 0x003a 099 099 000 Old_age Always
190 Airflow_Temperature_Cel 0x0022 069 057 045 Old_age Always
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always
193 Load_Cycle_Count 0x0032 056 056 000 Old_age Always
194 Temperature_Celsius 0x0022 031 043 000 Old_age Always
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
Selective self-test flags (0x0):
If Selective self-test is pending on power-up, resume after 0 minute delay.
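Since the question was which smartctl version is installed, here is a rough sketch of reading it from the banner line of the output above and comparing it with 5.42, the minimum version the pySMART author mentions having tested on Linux. The helper name `smartctl_version` and the constant `MIN_TESTED` are made up for this illustration, and the regular expression assumes the banner keeps the `smartctl <major>.<minor>` shape shown in this paste.

```python
import re

# Minimum smartctl version mentioned in this thread as tested with pySMART.
MIN_TESTED = (5, 42)


def smartctl_version(banner):
    """Extract (major, minor) from a smartctl banner line, or None."""
    match = re.search(r"smartctl\s+(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


banner = "smartctl 5.41 2011-06-09 r3365 x86_64-linux-3.2.0-68-generic"
version = smartctl_version(banner)
print(version, version < MIN_TESTED)  # (5, 41) True
```

Comparing tuples avoids the pitfalls of comparing version strings or floats (for example, "5.9" versus "5.41").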
Thank you for providing all of that information. The only thing that still has me confused is that your example doesn’t appear to show a scenario which the existing code can’t handle. As shown below, pySMART should parse the 'Z1E0RR08' serial number out of that file just fine. I searched the file for the word “Serial” to see if it came up on a second line, and maybe that was the line causing the crash, but it’s only in there once.
Here are the basic tests that I expect pySMART to pass that seem to include the file you provided:
>>> line = "Serial Number: Z1E0RR08"  # Test w/ space delimiter
>>> if 'Serial Number' in line or 'Serial number' in line:
...     serial = line.split(':')[1].split()[0].rstrip()
...
>>> serial
'Z1E0RR08'  # Correct
>>> line = "Serial Number:\tZ1E0RR08"  # Test w/ tab delimiter
>>> print(line)
Serial Number:	Z1E0RR08
>>> if 'Serial Number' in line or 'Serial number' in line:
...     serial = line.split(':')[1].split()[0].rstrip()
...
>>> serial
'Z1E0RR08'  # Correct
>>> line = "Serial Number: \t\t \t Z1E0RR08"  # Test crazy mixture of spaces & tabs
>>> if 'Serial Number' in line or 'Serial number' in line:
...     serial = line.split(':')[1].split()[0].rstrip()
...
>>> serial
'Z1E0RR08'  # Correct
Now here is a test that, based on your problem report, I expected to see fail. Specifically, a device reporting an all-whitespace (or null) value to the right of the colon:
>>> line = "Serial Number: "  # Test w/ all whitespace
>>> if 'Serial Number' in line or 'Serial number' in line:
...     serial = line.split(':')[1].split()[0].rstrip()
...
Traceback (most recent call last):
File "
IndexError: list index out of range # The crash you reported
In order to fix this, I could easily wrap all parsing statements in a try/except:
>>> serial = None  # serial is initialized to None in Device.__init__()
>>> line = "Serial Number: "  # Test whitespace again
>>> if 'Serial Number' in line or 'Serial number' in line:
...     try:  # easy fix?
...         serial = line.split(':')[1].split()[0].rstrip()
...     except IndexError:
...         pass  # No need to do anything, just don’t crash
...
>>> print serial
None  # Prints fine when used later
This is more robust regardless, so I’ll probably go through and do it anyway, but my concern is that for the example file you provided this doesn’t seem necessary? I want to be sure that I’m correcting the issue you experienced, as opposed to just what I assumed the issue might be. For example, if there’s a parseable serial number being printed, but somehow crashing my parsing statement, I’d rather fix the parsing statement to correctly extract it than just fall back on “None”. Ideally, the combination of both fixes would be best, but I’d need to see a “valid” serial number line that confuses my parser.
Thank you, Marc
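One possible way to get the "combination of both fixes" described above is to avoid indexing altogether, so that a missing value falls through to `None` without needing an exception handler. This is only a sketch under assumptions: `parse_serial` is a hypothetical helper, not pySMART's actual code, and unlike the original expression it keeps the whole stripped value rather than only the first whitespace-separated token.

```python
def parse_serial(line):
    """Return the serial number from a smartctl line, or None if absent."""
    # partition() always returns three parts, so nothing here can raise IndexError.
    label, sep, value = line.partition(":")
    if not sep or label.strip().lower() != "serial number":
        return None  # Not a serial number line at all
    value = value.strip()
    return value or None  # Empty or all-whitespace value becomes None


print(parse_serial("Serial Number:    Z1E0RR08"))  # Z1E0RR08
print(parse_serial("Serial Number: "))             # None
```

The same pattern could be applied to the other "Label: value" lines in the information section, assuming the label itself never contains a colon.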
  File "/usr/local/lib/python2.7/dist-packages/pySMART/device.py", line 479, in update
    self.serial = line.split(':')[1].split()[0].rstrip()
IndexError: list index out of range