dak180 / FreeNAS-Report

SMART & ZPool Status Report for FreeNAS/TrueNAS
GNU General Public License v3.0

"integer expression expected" #4

Closed: danb35 closed this issue 2 years ago

danb35 commented 2 years ago

I'm testing out the topic/refactor branch of this script on two different TrueNAS CORE 12 systems. It's generating the reports and emailing them, and the data looks correct, but I'm getting some errors when running the script on both of them. I'm not backing up the config file with either of these.

The first system runs CORE 12.0-U6.1 with a single data pool consisting of two mirrored 16 TB disks, and no UPS. Output of the script is:

root@cbnas[/mnt/tank/scripts]# ./report.sh -c report.cfg
./report.sh: line 846: [: : integer expression expected
./report.sh: line 848: [: : integer expression expected
dc: divide by zero
root@cbnas[/mnt/tank/scripts]#
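
For anyone hitting the same message: `[: : integer expression expected` is what `[` prints when one side of a numeric comparison such as `-gt` expands to an empty string. The empty text between the two colons is the missing value. The thread doesn't say which variable was empty on lines 846/848, so the sketch below assumes a SMART value that some drives simply don't report. `over_threshold` is a hypothetical helper, not a function in report.sh:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from report.sh: test whether a SMART value
# that may be missing exceeds a numeric threshold.
over_threshold() {
    local value="$1" threshold="$2"
    # An empty or non-integer value would make `[ "$value" -gt ... ]` print
    # "[: : integer expression expected", so report it as "not over" instead.
    case "$value" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$value" -gt "$threshold" ]
}

for v in "" 3 12; do
    if over_threshold "$v" 5; then
        echo "value '$v': over threshold"
    else
        echo "value '$v': under threshold or not reported"
    fi
done
```

Defaulting with `${value:-0}` would also silence the error, but it turns a missing attribute into a genuine-looking zero. Returning early keeps "not reported" distinguishable, which may or may not be what a given report wants.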

The report is generated and looks fine, except that "Seek Error Health" reads 83-84% on three-month-old disks (Seagate Exos).

The second system runs CORE 12.0-U7 and has 28 disks installed in two pools: tank is 24 disks in four six-disk RAIDZ2 vdevs, and pve-images is four disks in striped mirrors. I've also enabled UPS statistics on this one. Once again, it emails the report, and the report looks fine: the table layout works, there are no stray HTML codes, etc. As before, there's some wonkiness in the Seek Error Health column. The Seagate disks again show less than 100% (ranging from 86% to 95%), and some (but not all) of the WD disks show 200%. And again, the script prints some errors when run:

root@freenas2[...ank/ssd_backup/scripts/FreeNAS-Report]# ./report.sh -c report.cfg
jq: error (at <stdin>:575): Cannot iterate over null (null)
./report.sh: line 789: [: : integer expression expected
./report.sh: line 791: [: : integer expression expected
bc: stdin:1: syntax error: * unexpected
bc: stdin:1: syntax error: > unexpected
bc: stdin:1: syntax error: > unexpected
bc: stdin:1: syntax error: * unexpected
root@freenas2[...ank/ssd_backup/scripts/FreeNAS-Report]#
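
The `bc: stdin:1: syntax error: * unexpected` and `> unexpected` messages most likely come from the same root cause as the integer errors: when a shell variable interpolated into a bc expression is empty, bc receives an expression with its left operand missing, such as ` > 0`. The trace posted later in this thread shows the script feeding `bc -l` results into `(( ))`, so a comparison wrapper is one place a guard could go. This is a sketch only; `bc_gt` is a hypothetical name, and it refuses to call bc unless both operands look like decimal numbers:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from report.sh.
# Succeeds when $1 > $2, using bc so decimals compare correctly. Empty or
# non-numeric input fails before bc ever sees an expression like " > 0",
# which bc rejects as a syntax error.
bc_gt() {
    local a="$1" b="$2" re='^-?[0-9]+(\.[0-9]+)?$'
    [[ "$a" =~ $re && "$b" =~ $re ]] || return 1
    # bc prints 1 for true and 0 for false; (( )) turns that into a status.
    (( $(echo "$a > $b" | bc -l) ))
}

if bc_gt "" 0; then
    echo "positive"
else
    echo "missing or not positive"
fi
```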

dak180 commented 2 years ago

@danb35 200% is expected for some WD disks. As for the Seagate drives (there may be some oddness that requires a smartmontools update), I don't have any to test with, so I need some info from you: the output of `smartctl -AHijl selftest --log="devstat"` and `smartctl -AHil selftest --log="devstat"`. Given where the errors are, please include your SSDs as well (one set per drive model). Feel free to replace serial numbers with nonsense.
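
Since the request is to scrub serial numbers before posting, a small sed filter can do that on the way into the file. This is a sketch: `redact_serials` is a hypothetical name, it only handles the plain-text `Serial Number:` line and the JSON `serial_number` key that smartctl prints in its device-info section, and other identifiers such as the WWN pass through unchanged. `/dev/ada0` in the comments is a placeholder device.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace serial numbers in smartctl output with a fixed
# placeholder. Only the text "Serial Number:" line and the JSON
# "serial_number" key are touched; everything else passes through as-is.
redact_serials() {
    sed -E \
        -e 's/^(Serial Number:[[:space:]]+).*/\1XXXXXXXX/' \
        -e 's/("serial_number":[[:space:]]*")[^"]*"/\1XXXXXXXX"/'
}

# Example use (device name is a placeholder):
#   smartctl -AHil selftest --log="devstat" /dev/ada0 | redact_serials > ssd-smartlog.txt
#   smartctl -AHijl selftest --log="devstat" /dev/ada0 | redact_serials > ssd-smartlog.json
```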

danb35 commented 2 years ago

OK, here's the output of those two commands for one of the Seagate Exos disks in the first system and for that system's SSD. They're attached, since the JSON output is pretty verbose: seagate-smartlog.txt, ssd-smartlog.txt

dak180 commented 2 years ago

@danb35 I have made some updates, so let me know how they work for you, and keep in mind that the script goes through all attached storage, boot pool included. As for Seek Error Health on Seagate drives, that seems to be another of their wacky numbers that doesn't make sense raw.
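
On the Seagate numbers: one widely repeated explanation (community lore, not something confirmed in this thread or by report.sh) is that Seagate drives pack the 48-bit raw value of attribute 7 (Seek_Error_Rate), with the seek-error count in the upper 16 bits and the total number of seeks in the lower 32 bits. That would explain why the raw decimal looks nonsensical. Assuming that layout, the two counts can be split out with shell arithmetic; `decode_seagate_seek` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper. ASSUMPTION: the 48-bit raw Seek_Error_Rate on Seagate
# drives holds the seek-error count in bits 32-47 and the total seek count in
# bits 0-31. Not taken from report.sh or smartmontools.
decode_seagate_seek() {
    local raw="$1"
    local errors=$(( raw >> 32 ))          # upper 16 bits of the 48-bit value
    local seeks=$(( raw & 0xFFFFFFFF ))    # lower 32 bits
    echo "errors=$errors seeks=$seeks"
}

decode_seagate_seek 4294968296   # (1 << 32) + 1000, prints "errors=1 seeks=1000"
```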

danb35 commented 2 years ago

OK, the "integer expression expected" errors are now gone, but others remain. On the first system:

root@cbnas[~/FreeNAS-Report]# ./report.sh -c report.cfg
dc: divide by zero

...and on the second:

root@freenas2[...ank/ssd_backup/scripts/FreeNAS-Report]# ./report.sh -c report.cfg 
jq: error (at <stdin>:575): Cannot iterate over null (null)
bc: stdin:1: syntax error: * unexpected
bc: stdin:1: syntax error: > unexpected
bc: stdin:1: syntax error: > unexpected
bc: stdin:1: syntax error: * unexpected

The reports themselves look pretty much the same.

dak180 commented 2 years ago

@danb35 Can you execute the script with bash -x to get the line numbers and the drive identifier that it's failing with?
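
A plain `bash -x` trace prefixes each command with `+` but not with a line number. Bash expands the `PS4` prompt before every traced command, so putting `${LINENO}` in it (single-quoted, so it expands at trace time rather than at assignment) adds the line number. The sketch below writes a throwaway four-line script, a stand-in for report.sh, to a temp file to show the effect; `traced_demo` is a hypothetical name. Applying this to the real script would mean adding a similar `PS4=` line near its top before running it under `bash -x`.

```bash
#!/usr/bin/env bash
# Sketch: show how a PS4 containing ${LINENO} adds line numbers to xtrace.
# The temporary file is a stand-in for report.sh, not the real script.
traced_demo() {
    local demo_script
    demo_script="$(mktemp)" || return 1
    cat > "$demo_script" <<'EOF'
PS4='+ ${LINENO}: '
set -x
disk=ada0
count=3
EOF
    # The trace goes to stderr; fold it into stdout so it can be captured.
    bash "$demo_script" 2>&1
    rm -f "$demo_script"
}

traced_demo
```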

danb35 commented 2 years ago

Sure thing. Running `script report.log bash -x report.sh -c report.cfg` on the second system produced a 10 MB log file. All four errors occur within about 60 lines of each other, and all appear to involve ada0, a SanDisk SSD. Here's the smartctl output you requested above for that drive.
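
In a 10 MB trace, the error messages are easier to find with grep than by scrolling. A sketch, assuming the log was captured with `script` as above so the shell's error output lands in the same file: the hypothetical `show_errors` prints each of the messages reported in this thread together with the trace lines just before it, prefixed with line numbers from the log.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show each known error message from a `bash -x` log
# along with the preceding trace lines (default 20), so the command that
# produced it is visible. grep marks matches "N:" and context lines "N-".
show_errors() {
    local log="$1" context="${2:-20}"
    grep -n -B "$context" -E \
        'integer expression expected|divide by zero|syntax error|Cannot iterate over null' \
        "$log"
}

# Example: show_errors report.log 30 | less
```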

sandisk-smartlog.txt

danb35 commented 2 years ago

On the first system, the divide-by-zero error comes from its SSD, whose smartctl output I posted earlier (ssd-smartlog.txt above). The surrounding lines in the log file are:

++ bc
++ sed -e 's:^\.:0.:'
+ local totalBW=0.0
++ bc -l
+ ((  0  ))
++ bc -l
+ ((  0  ))
+ local totalBWColor=#ffffff
+ '[' 0.0 = 0.0 ']'
+ totalBW=N/A
++ bc
++ sed -e 's:^\.:0.:'
dc: divide by zero
+ local bwPerDay=0.0
+ '[' 0.0 = 0.0 ']'
+ bwPerDay=N/A
+ '[' 0 -gt 5 ']'
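
In that trace, `bwPerDay` is computed by piping an expression to `bc`, and `dc: divide by zero` reaches stderr before the script turns the `0.0` result into `N/A`. The trace doesn't show the divisor; assuming it was zero, a guard like the hypothetical `safe_divide` below checks the divisor first, so bc is never asked to divide by zero and nothing is printed to the terminal. The trailing sed is the same `s:^\.:0.:` fix-up the trace shows the report applying, since bc prints `.5` rather than `0.5`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from report.sh: divide with bc only when both
# operands are decimals and the divisor is non-zero; otherwise print "N/A",
# as the report already does for zero results.
safe_divide() {
    local numerator="$1" divisor="$2" scale="${3:-1}"
    local re='^[0-9]+(\.[0-9]+)?$' zero='^0+(\.0+)?$'
    # Missing or non-numeric operands: nothing sensible to compute.
    if ! [[ "$numerator" =~ $re && "$divisor" =~ $re ]]; then
        echo "N/A"
        return 0
    fi
    # 0, 0.0, 00.000 and the like would trigger "dc: divide by zero".
    if [[ "$divisor" =~ $zero ]]; then
        echo "N/A"
        return 0
    fi
    echo "scale=$scale; $numerator / $divisor" | bc | sed -e 's:^\.:0.:'
}

safe_divide 0.0 0   # prints N/A without invoking bc
```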

dak180 commented 2 years ago

@danb35 For the SanDisk SSD, can you post the output of `smartctl -AHijl selftest --log="devstat"` and `smartctl -AHil selftest --log="devstat"` as two separate files? In your last attachment, the plain-text version was truncated.

dak180 commented 2 years ago

@danb35 ping?

danb35 commented 2 years ago

Sorry, things have been pretty busy with the holidays. I'll get back to it in a day or two.

dak180 commented 2 years ago

@danb35 Also, please test again with the latest commit.

danb35 commented 2 years ago

With the latest commit, the script runs on both systems without reporting any errors on the CLI. The generated reports don't appear to have changed noticeably since a week ago; they're clean, with no dangling HTML tags and such. The "Seek Error Health" issues noted above haven't changed, but I understand those stem from weird reporting by the drives themselves.

dak180 commented 2 years ago

Fixed in 8bd7c14a28ee3cc6fe9b84a3849ed27260e3fc05.