Can you try to run it with -r?
Also, what do you have in dmesg after the test? Are there any OS-logged errors?
Thanks for the quick response Baruch!
A couple of possibilities here: when I first tried badblocks, it reported
/dev/sdd is apparently in use by the system; it's not safe to run badblocks!
and would not run. I had failed that device out of the raid array and confirmed it wasn't in swap, mounted anywhere, part of any LV, or an open device-mapper mapping... but then I figured out that I had to do more than just fail it in the raid array; I also needed to remove it (mdadm --manage --remove /dev/md4 /dev/sdd1), and then badblocks would run.
So, it's possible that diskscan failed because the device was open? If so, then maybe a check should be added to diskscan to see if the device is open before allowing you to continue with the fix option? The other possibility is that it failed because it was an older version of diskscan!
Since we are talking about this: when you pass the -f option to have diskscan fix the problems, it is effectively writing to that block to force the drive to reallocate it, right? I'm just trying to get an idea of how destructive this option is (compared to a badblocks destructive write test); it seems like it should be isolated to that specific spot and shouldn't really cause any damage that hasn't already been done by the disk failure?
There was such a bug in an older version; it should work with 0.17.
The recovery works for correctable errors by reading and rewriting the data; for uncorrectable errors it will just write zeros over them. Currently it works on a 64k block, but I want to make it more granular for uncorrectable errors so we don't zero a block if it is readable.
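For anyone reading along, the approach described above (reread and rewrite when the data is still readable, overwrite with zeros when it is not) boils down to something like the C sketch below. This is not diskscan's actual code; the device path, offset, and helper name are illustrative, and real code would need more careful error handling.

```c
/* Minimal sketch of the read-then-rewrite recovery idea described above.
 * Not diskscan's implementation; paths and offsets are placeholders. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FIX_BLOCK_SIZE (64 * 1024)   /* 64k block, as mentioned above */

static int fix_block(int fd, off_t offset)
{
    void *buf;
    if (posix_memalign(&buf, 4096, FIX_BLOCK_SIZE))  /* alignment for O_DIRECT */
        return -1;

    ssize_t got = pread(fd, buf, FIX_BLOCK_SIZE, offset);
    if (got == FIX_BLOCK_SIZE) {
        /* Correctable case: data is still readable, so write the same data
         * back; the drive refreshes or reallocates the weak sectors. */
        if (pwrite(fd, buf, FIX_BLOCK_SIZE, offset) != FIX_BLOCK_SIZE)
            goto fail;
    } else {
        /* Uncorrectable case: the data is already lost, so write zeros to
         * force the drive to remap the bad sector(s). */
        memset(buf, 0, FIX_BLOCK_SIZE);
        if (pwrite(fd, buf, FIX_BLOCK_SIZE, offset) != FIX_BLOCK_SIZE)
            goto fail;
    }
    free(buf);
    return 0;
fail:
    free(buf);
    return -1;
}

int main(void)
{
    /* /dev/sdd and the offset are for illustration only. */
    int fd = open("/dev/sdd", O_RDWR | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    int ret = fix_block(fd, (off_t)123456 * 4096);
    close(fd);
    return ret ? 1 : 0;
}
```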
I also need to add a check that verifies the partition is not in use somehow. It's another ticket I have logged for myself but haven't implemented yet.
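One common way to do that check on Linux is to request an exclusive open of the block device with O_EXCL; the kernel refuses it with EBUSY while the device is mounted or claimed by md/dm. A rough sketch of that idea, purely illustrative and not what diskscan currently does:

```c
/* Hedged sketch: detect whether a block device is in use by attempting an
 * exclusive open (Linux-specific behavior of O_EXCL on block devices). */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if busy, 0 if free, -1 on other errors. */
static int device_in_use(const char *path)
{
    int fd = open(path, O_RDONLY | O_EXCL);
    if (fd < 0)
        return errno == EBUSY ? 1 : -1;
    close(fd);
    return 0;
}

int main(void)
{
    /* /dev/sdd is a placeholder. */
    int busy = device_in_use("/dev/sdd");
    if (busy == 1)
        fprintf(stderr, "/dev/sdd is in use; refusing to fix\n");
    else if (busy == 0)
        printf("/dev/sdd appears to be free\n");
    else
        perror("open");
    return 0;
}
```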
It does seem to work fine in 0.17.
I got a failure message after it ran ("Conclusion: failed due to IO errors"), but I think that is because it remapped a bad sector. Running it a second time didn't give me that message.
In that case it's fine now. Note that the issue may sometimes return later; if that happens, I wouldn't wait for it to happen a third time. I'd just back up all the data and replace the drive. If it happens only once, or only after a very long time (more than 6 months), you can consider it random behavior; if it happens faster than that, you should assume (IMNSHO) that the disk is going to die at some point in the future, and you most likely don't want to wait for that.
Hi,
I have two disks, on two different systems, that are reporting a Current_Pending_Sector count of 1 in SMART, so I decided to try your tool to fix it, but it seemed like it failed :(