Closed by danb35 2 years ago.
@danb35 200% is expected for some WD disks. As for the Seagate drives (there may be some oddness that requires a smartmontools update), I don't have any to test with, so I'll need some info from you: the output of `smartctl -AHijl selftest --log="devstat"` and `smartctl -AHil selftest --log="devstat"`. Given where the errors are, please include your SSDs as well (one set per drive model). Feel free to replace serial numbers with nonsense.
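Collecting one text file and one JSON file per drive, with the serial already masked, could look like the following. This is a hypothetical helper, not part of report.sh; it assumes `smartctl` is in PATH and that serials are plain alphanumerics that need no escaping inside the `sed` expression.

```shell
#!/bin/sh
# Hypothetical helper (not part of report.sh): save both smartctl views of one
# drive to separate files and mask its serial number before posting.
# Assumes smartctl (smartmontools) is in PATH and that serials are plain
# alphanumerics, so they need no escaping inside the sed expression.

# Print FILE with every occurrence of SERIAL replaced by a placeholder.
redact_serial() {
  local serial="$1" file="$2"
  sed "s|${serial}|XXXXXXXX|g" "$file"
}

# Usage: capture_smart /dev/ada0  -> ada0-redacted.json, ada0-redacted.txt
capture_smart() {
  local dev="$1" name serial ext
  name=$(basename "$dev")
  # ATA drives print "Serial Number:", SCSI drives "Serial number:".
  serial=$(smartctl -i "$dev" | awk -F': *' '/^Serial [Nn]umber/ {print $2}')
  if [ -z "$serial" ]; then
    echo "no serial found for $dev" >&2
    return 1
  fi
  smartctl -AHijl selftest --log=devstat "$dev" > "$name.json"
  smartctl -AHil selftest --log=devstat "$dev" > "$name.txt"
  for ext in json txt; do
    redact_serial "$serial" "$name.$ext" > "$name-redacted.$ext"
    rm -f "$name.$ext"   # keep only the redacted copy
  done
}
```

Writing the two outputs to separate files also avoids one of them being cut off when both are pasted into a single comment.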
OK, here's the output of those two commands for one of the Seagate Exos disks in the first system and for its SSD. Both are attached as files, since the JSON output is pretty verbose: seagate-smartlog.txt, ssd-smartlog.txt
@danb35 I've made some updates, so let me know how they work for you. Keep in mind that the script goes through all attached storage, boot pool included. As for Seek Error Health on Seagate drives, that seems to be another of their wacky values that doesn't make sense as a raw number.
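One commonly reported (but not Seagate-documented) explanation is that the raw Seek_Error_Rate value packs two counters into one number. A sketch of splitting it under that assumption, as a hypothetical helper that is not report.sh code:

```shell
#!/bin/sh
# Sketch, not report.sh code: split a Seagate Seek_Error_Rate (attribute 7)
# raw value into two counters. Assumes the widely reported but not
# vendor-documented layout: bits 32 and up hold the number of seek errors,
# the low 32 bits hold the total number of seeks.
decode_seagate_seek() {
  local raw="$1"
  local errors=$(( $raw >> 32 ))
  local seeks=$(( $raw & 0xFFFFFFFF ))
  echo "errors=$errors seeks=$seeks"
}

# Example: a raw value of 12884902888 decodes to 3 errors out of 1000 seeks.
decode_seagate_seek 12884902888
```

If that layout holds, a healthy drive's raw value is dominated by the seek count, which would explain why it looks meaningless when read as a single number.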
OK, the "integer expression expected" errors are now gone, but others remain. On the first system:
```
root@cbnas[~/FreeNAS-Report]# ./report.sh -c report.cfg
dc: divide by zero
```
...and on the second:
```
root@freenas2[...ank/ssd_backup/scripts/FreeNAS-Report]# ./report.sh -c report.cfg
jq: error (at <stdin>:575): Cannot iterate over null (null)
bc: stdin:1: syntax error: * unexpected
bc: stdin:1: syntax error: > unexpected
bc: stdin:1: syntax error: > unexpected
bc: stdin:1: syntax error: * unexpected
```
The reports themselves look pretty much the same.
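Both kinds of error on the second system are typical of a value that is missing for one drive: `jq` fails when asked to iterate over a section that is `null`, and `bc` rejects an expression like ` * 512` whose left operand expanded to nothing. A defensive sketch follows; the helper names are hypothetical, and the JSON paths are only examples of fields smartctl's JSON output can contain.

```shell
#!/bin/sh
# Sketch, not report.sh code: normalize a value pulled out of smartctl's JSON
# so later bc expressions always get two operands. An empty operand turns
# "$a * 512" into " * 512", which bc rejects with "syntax error: * unexpected".
num_or_zero() {
  case "$1" in
    ''|null) echo 0 ;;
    *) echo "$1" ;;
  esac
}

# Hypothetical usage, assuming jq and bc are in PATH. The "// []" and
# "// empty" alternatives make jq emit nothing (instead of failing with
# "Cannot iterate over null") when a drive lacks that section, e.g. an SSD
# without an ATA attribute table.
attr_ids() {
  jq -r '(.ata_smart_attributes.table // [])[] | .id' "$1"
}
power_on_hours() {
  num_or_zero "$(jq -r '.power_on_time.hours // empty' "$1")"
}
bytes_from_sectors() {
  echo "$(num_or_zero "$1") * 512" | bc
}
```

Defaulting to 0 keeps the arithmetic valid; whether a report should then show 0 or N/A for that drive is a separate display decision.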
@danb35 Can you execute the script with `bash -x` to get the line numbers and the identifier of the drive it's failing on?
Sure thing. Running `script report.log bash -x report.sh -c report.cfg` on the second system produced a 10 MB log file. All four errors occur within about 60 lines of each other, and all appear to involve ada0, a SanDisk SSD. Here's the `smartctl` output you requested above for that drive.
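Rather than paging through a 10 MB trace, the lines around each error can be pulled out with `grep`. A minimal sketch, assuming the log is `report.log` as above; the patterns are just the three messages reported in this thread and the context widths are arbitrary.

```shell
#!/bin/sh
# Sketch: print the trace lines surrounding each error in a large bash -x log.
# -n prefixes matches with "N:" and context lines with "N-".
show_error_context() {
  grep -n -B 5 -A 2 -E 'divide by zero|syntax error|Cannot iterate over null' "$1"
}

# Usage: show_error_context report.log
```

Because the `bash -x` trace is written to stderr, `bash -x report.sh -c report.cfg 2> trace.log` is an alternative that captures the trace and the error messages interleaved, without the rest of the terminal session.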
The first system gives the `divide by zero` error for its SSD, whose `smartctl` output I posted previously (ssd-smartlog.txt above). The surrounding lines in the log file are:
```
++ bc
++ sed -e 's:^\.:0.:'
+ local totalBW=0.0
++ bc -l
+ (( 0 ))
++ bc -l
+ (( 0 ))
+ local totalBWColor=#ffffff
+ '[' 0.0 = 0.0 ']'
+ totalBW=N/A
++ bc
++ sed -e 's:^\.:0.:'
dc: divide by zero
+ local bwPerDay=0.0
+ '[' 0.0 = 0.0 ']'
+ bwPerDay=N/A
+ '[' 0 -gt 5 ']'
```
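In this trace the error is printed while `bwPerDay` is computed, and the script then replaces the `0.0` result with `N/A` anyway. Checking the denominator before calling `bc` would produce the same report without the error message. A sketch, assuming the denominator is the drive's power-on time (the trace doesn't show which operand is zero); `total_written` and `power_on_hours` are hypothetical names, not variables from report.sh.

```shell
#!/bin/sh
# Sketch, not the actual report.sh fix: check the denominator before handing
# the division to bc, and fall back to the same "N/A" the script already
# uses. Assumes bc is in PATH.
bw_per_day() {
  local total_written="$1" power_on_hours="${2:-0}"
  # [ -gt ] only accepts integers; anything else (empty, "null", 0.0) counts as zero.
  if [ "$power_on_hours" -gt 0 ] 2>/dev/null; then
    # Multiply before dividing so a small hour count isn't truncated by scale.
    echo "scale=2; $total_written * 24 / $power_on_hours" | bc | sed -e 's:^\.:0.:'
  else
    echo "N/A"
  fi
}
```

Guarding the input rather than inspecting bc's output afterwards also keeps `dc`/`bc` diagnostics off the console when the script runs from cron.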
@danb35 For the SanDisk SSD, can you post the output of `smartctl -AHijl selftest --log="devstat"` and `smartctl -AHil selftest --log="devstat"` as two separate files? In your last post, the plain-text version was truncated.
@danb35 ping?
Sorry, things have been pretty busy with the holidays. I'll get back to it in a day or two.
@danb35 Also, please test again with the latest commit.
With the latest commit, the script runs on both systems without reporting any errors on the CLI. The generated reports don't appear to have changed noticeably from a week ago; they're clean, without dangling HTML tags and such. The issues noted above with "Seek Error Health" haven't changed, but I understand those relate to weird reporting from the drives themselves.
Fixed in 8bd7c14a28ee3cc6fe9b84a3849ed27260e3fc05.
I'm testing out the `topic/refactor` branch of this script on two different TrueNAS CORE 12 systems. It's generating the reports and emailing them, and the data looks correct, but I'm getting some errors when running the script on both of them. I'm not backing up the config file on either of these. The first system runs CORE 12.0-U6.1 with a single data pool consisting of two mirrored 16 TB disks, and no UPS. Output of the script is:
The report is generated and looks fine, except that "seek error health" reads 83-84% on three-month-old disks (Seagate Exos).
The second system runs CORE 12.0-U7 and has 28 disks installed in two pools: tank is 24 disks in four six-disk RAIDZ2 vdevs, and pve-images is four disks in striped mirrors. I've also enabled UPS statistics for this one. Once again, it emails the report, and the report looks fine: the table layout works, there are no stray HTML codes, etc. And as before, there's some wonkiness with the Seek Error Health column. The Seagate disks again show values below 100% (ranging from 86% to 95%), and some (but not all) of the WD disks show 200%. And again, the script outputs some errors when run: