Spearfoot / disk-burnin-and-testing

Shell script for burn-in and testing of new or re-purposed drives

Use of badblocks "-c" flag #14

Closed: InfernoZeus closed this issue 3 years ago

InfernoZeus commented 3 years ago

I saw the note about the long testing times, and looked up expected times for badblocks on the disks I'm using (4TB). I found this useful answer on superuser, which mentioned that adjusting the value used for the "-c" flag made a big difference to the speed:

badblocks -svn /dev/sdb
    To get to 1%: 1 hour
    To get to 10%: 8 hours 40 minutes

badblocks -svn -b 512 -c 32768 /dev/sda
    To get to 1%: 35 minutes
    To get to 10%: 4 hours 10 minutes

badblocks -svn -b 512 -c 65536 /dev/sda
    To get to 1%: 16 minutes
    To get to 10%: 2 hours 35 minutes
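
To make the difference concrete: the amount of data badblocks moves per I/O batch is roughly the block size (-b) times the blocks-per-batch count (-c). A quick back-of-the-envelope sketch (my own arithmetic, not from the linked answer), assuming the documented badblocks defaults of -b 1024 and -c 64:

# Rough illustration only: bytes moved per badblocks batch = -b value * -c value
echo "defaults (-b 1024 -c 64): $((1024 * 64)) bytes per batch (64 KiB)"
echo "-b 512 -c 32768:          $((512 * 32768)) bytes per batch (16 MiB)"
echo "-b 512 -c 65536:          $((512 * 65536)) bytes per batch (32 MiB)"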

I naturally wondered if there's a downside to setting a higher "-c" value. Another helpful answer mentioned this:

The -c option corresponds to how many blocks should be checked at once. Batch reading/writing, basically. This option does not affect the integrity of your results, but it does affect the speed at which badblocks runs. badblocks will (optionally) write, then read, buffer, check, repeat for every N blocks as specified by -c.

If -c is set too low, this will make your badblocks runs take much longer than ordinary, as queueing and processing a separate IO request incurs overhead, and the disk might also impose additional overhead per-request. If -c is set too high, badblocks might run out of memory. If this happens, badblocks will fail fairly quickly after it starts.

Additional considerations here include parallel badblocks runs: if you're running badblocks against multiple partitions on the same disk (bad idea), or against multiple disks over the same IO channel, you'll probably want to tune -c to something sensibly high given the memory available to badblocks so that the parallel runs don't fight for IO bandwidth and can parallelize in a sane way.
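
A rough way to sanity-check a -c value against available RAM before starting several parallel runs is sketched below. The assumption that write-mode badblocks keeps roughly two such buffers per run (write pattern plus read-back) is mine, not something the quoted answer states, so treat the result as a lower bound:

# Hypothetical sizing sketch: rough buffer memory for N parallel badblocks runs
N_DISKS=6              # disks tested in parallel
BLOCK_SIZE=512         # -b value in bytes
BLOCKS_AT_ONCE=32768   # -c value
BUFFERS_PER_RUN=2      # assumption: one write buffer plus one read-back buffer in -w mode
PER_RUN=$((BLOCK_SIZE * BLOCKS_AT_ONCE * BUFFERS_PER_RUN))
echo "~$((PER_RUN / 1024 / 1024)) MiB per run, ~$((PER_RUN * N_DISKS / 1024 / 1024)) MiB for $N_DISKS runs"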

I'm currently testing 6x 4TB disks and my memory use is under 300M, so that doesn't seem to be much of an issue. Is there another reason this option isn't used by the script?

Spearfoot commented 3 years ago

Hello, and thanks for the interesting information.

No, there is no particular reason I never tweaked the block count with the -c option.

Why don't you try a larger value and report your results? We may be able to modify the script, or at least document suggested settings for users to try, depending on their system RAM constraints and number of disks being tested.
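
A minimal sketch of what that could look like, assuming a hypothetical BB_BLOCKS_AT_ONCE variable (the badblocks flags shown are illustrative, not necessarily the script's actual invocation):

# Hypothetical sketch only -- not the current disk-burnin.sh code.
# Empty BB_BLOCKS_AT_ONCE means "use the badblocks default of 64 blocks per batch".
BB_BLOCKS_AT_ONCE=""   # e.g. 8192; tune against free RAM and the number of parallel runs

run_badblocks() {
  drive="$1"
  if [ -n "$BB_BLOCKS_AT_ONCE" ]; then
    badblocks -wsv -c "$BB_BLOCKS_AT_ONCE" -o "badblocks-${drive##*/}.log" "$drive"
  else
    badblocks -wsv -o "badblocks-${drive##*/}.log" "$drive"
  fi
}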

InfernoZeus commented 3 years ago

Sure, I'll see about running a few tests. I'm currently 27 hours into the testing - almost halfway there! Once that's done, I'll try to run some shorter benchmarks. I'll probably limit it to a single write pattern, instead of the default 4 patterns (0xaa, 0x55, 0xff, 0x00), and make some estimates based on reaching a particular percentage, otherwise it'll take too long.
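
For reference, one way such a shortened run could be done with standard badblocks options (a sketch; the device name is a placeholder) is to pass a single -t pattern and cap the range with the last-block argument:

# Sketch of a shortened single-pattern benchmark (device is a placeholder)
# -w = destructive write test, -s = show progress, -v = verbose
# -t 0xaa = test only one pattern instead of the default four
# trailing 9999999 = last block to test, i.e. only the first 10,000,000 blocks
badblocks -wsv -b 512 -c 64 -t 0xaa /dev/sdX 9999999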

InfernoZeus commented 3 years ago

For reference, my test run against 6x 4TB disks just finished. Interestingly, there's already some variance between the runs, even though all 6 disks are the same model:

burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0HN4S97.log:+ Started burn-in: Wed Dec 30 09:31:00 PM CET 2020
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0HN4S97.log:+ Finished burn-in: Sat Jan  2 09:43:06 PM CET 2021
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0HN4Y0L.log:+ Started burn-in: Wed Dec 30 09:31:04 PM CET 2020
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0HN4Y0L.log:+ Finished burn-in: Sat Jan  2 09:40:07 PM CET 2021
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0PY99JX.log:+ Started burn-in: Wed Dec 30 09:31:19 PM CET 2020
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0PY99JX.log:+ Finished burn-in: Sat Jan  2 09:21:20 PM CET 2021
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0PY9AZY.log:+ Started burn-in: Wed Dec 30 09:31:14 PM CET 2020
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0PY9AZY.log:+ Finished burn-in: Sat Jan  2 08:29:31 PM CET 2021
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0PY9KEJ.log:+ Started burn-in: Wed Dec 30 09:30:52 PM CET 2020
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K0PY9KEJ.log:+ Finished burn-in: Sat Jan  2 09:13:22 PM CET 2021
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K6XDDE4Y.log:+ Started burn-in: Wed Dec 30 09:30:46 PM CET 2020
burnin-WDC_WD40EFRX-68N32N0_WD-WCC7K6XDDE4Y.log:+ Finished burn-in: Sat Jan  2 07:58:45 PM CET 2021

This works out to anywhere from 70h:27m:59s all the way up to 72h:12m:6s. Also of note: the disk I started first, only 6 seconds ahead of the next one, finished well before the rest.
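
(For anyone wanting to reproduce the duration arithmetic from log lines like these, a small loop along these lines works with GNU date; it's an illustrative helper, not part of the burn-in script.)

# Illustrative helper: compute each burn-in duration from the Started/Finished lines (GNU date)
for log in burnin-*.log; do
  start=$(sed -n 's/.*Started burn-in: //p' "$log")
  finish=$(sed -n 's/.*Finished burn-in: //p' "$log")
  secs=$(( $(date -d "$finish" +%s) - $(date -d "$start" +%s) ))
  printf '%s  %dh:%02dm:%02ds\n' "$log" $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
done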

Spearfoot commented 3 years ago

I've burned in over a hundred disks and have noticed the same thing for similar drive types; some are just 'faster' than others!

InfernoZeus commented 3 years ago

Testing with the fastest disk, I tried a few different values for -c, starting at the default (64) and going up to 32768. Each run was limited to the first 10,000,000 blocks on the disk to reduce run time. This initial test suggests that there's not much to be gained:

# Elapsed time (s)   CPU (%)     Real (MB)   Virtual (MB)
==> c-64/psrecord.log <==
     436.373        8.900        3.074        6.402
==> c-1024/psrecord.log <==
     436.368        6.900       10.539       13.902
==> c-4096/psrecord.log <==
     437.444        6.000       34.453       37.902
==> c-8192/psrecord.log <==
     436.394        4.000       66.504       69.902
==> c-16384/psrecord.log <==
     443.404        5.000      130.484      133.902
==> c-32768/psrecord.log <==
     441.380        4.000      258.508      261.902

Memory usage rises with each test, which is to be expected. Interestingly, CPU load seems to be marginally lower with larger -c values. Unfortunately, duration is pretty much constant, and in fact seems slightly worse at the highest values.
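
For context, those columns are the log format produced by psrecord; a sketch of how such a per-run log can be captured (directory name, interval and badblocks arguments here are illustrative) is to wrap the badblocks command directly:

# Sketch: wrap a badblocks run with psrecord to log elapsed time, CPU and memory usage
mkdir -p c-4096
psrecord "badblocks -wsv -b 512 -c 4096 -t 0xaa /dev/sdX 9999999" \
    --log c-4096/psrecord.log --interval 5 --include-children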

InfernoZeus commented 3 years ago

Going to close this issue - none of my testing indicates that there's any improvement from changing the -c value, and I've now added all my 4TB disks to my ZFS pool.

@Spearfoot thanks for the tool - very useful! :smiley: