PartialVolume / shredos.x86_64

ShredOS Disk Eraser, 64-bit, for all Intel 64-bit processors as well as processors from AMD and other vendors that make compatible 64-bit chips. ShredOS - Secure disk erasure/wipe

NVMe Drives Stuck on Syncing in Dell PowerEdge R7525 #218

Open mwilcox857 opened 7 months ago

mwilcox857 commented 7 months ago

I have 6 of these servers, each containing 2 SSDs and 16 NVMe drives. I needed nomodeset to get them to boot, but after that I was able to start the erasure. I had the USB set up to autonuke everything except the USB drive with a single-pass erasure, and all drives started fine. However, 8 of the NVMe drives started to slow down and I noticed their lights were no longer blinking. The status just said syncing and the speed kept dropping; the other drives were fine. I didn't have as much time as I would've liked to troubleshoot, and I couldn't find a solution.

I tried booting into the latest version of Parted Magic, but it would only boot to a black screen, and I think I tried most of the video-related troubleshooting steps. I was able to boot into nwipe from the Parted Magic Extras menu, though, and all of the drives started and ran at a decent speed. That is nwipe version 0.34; I don't think that matters, but maybe it does. I will be back there tomorrow to see if they finished. I started 3 of the servers and need to get another 3 going.

I am also wondering whether a log file was generated. I couldn't specify one, and it's logged to stdout, but would there be a file somewhere after the fact? Maybe that's a question for the Parted Magic forum.

In a nutshell, 3 systems were booted and all showed the same behavior (I'm not sure whether it was 8 drives stuck on the other 2 servers, though).

PartialVolume commented 7 months ago

If the drive's light isn't blinking and the status just says syncing, not writing, while the throughput keeps dropping, then the drive itself is not writing. It could be a couple of things:

  1. The drive is faulty. How old are these drives? Obtain the SMART data and check the health.
  2. The drives are getting too hot due to insufficient cooling. Monitor each drive's temperature from the start and make sure it doesn't reach the limit in the drive's published spec.
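
Both checks can be scripted against the text that smartctl -a prints for an NVMe drive. The sketch below is a minimal example, not part of ShredOS or nwipe: the function names parse_nvme_health and temperature_headroom are hypothetical, the field labels are assumed to match the smartmontools NVMe report layout, and the sample values are made up for illustration.

```python
import re


def parse_nvme_health(smartctl_text):
    """Extract a few health fields from `smartctl -a` NVMe output (assumed layout)."""
    patterns = {
        "temperature_c": r"^Temperature:\s+(\d+) Celsius",
        "warning_temp_c": r"^Warning\s+Comp\. Temp\. Threshold:\s+(\d+) Celsius",
        "critical_temp_c": r"^Critical Comp\. Temp\. Threshold:\s+(\d+) Celsius",
        "percentage_used": r"^Percentage Used:\s+(\d+)%",
        "media_errors": r"^Media and Data Integrity Errors:\s+([\d,]+)",
    }
    health = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, smartctl_text, re.MULTILINE)
        if match:
            # Counters may contain thousands separators, e.g. "1,392".
            health[name] = int(match.group(1).replace(",", ""))
    return health


def temperature_headroom(health):
    """Degrees Celsius left before the drive's own warning threshold, or None."""
    if "temperature_c" in health and "warning_temp_c" in health:
        return health["warning_temp_c"] - health["temperature_c"]
    return None


# Illustrative sample text only; real output would come from smartctl.
sample = """\
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Temperature:                        75 Celsius
Percentage Used:                    47%
Media and Data Integrity Errors:    0
"""
health = parse_nvme_health(sample)
print(health, temperature_headroom(health))
```

A small headroom during a wipe would point towards the cooling explanation; a non-zero media error count would point towards a failing drive.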

If you want a log file when running nwipe from the command line, you have to select that option explicitly, e.g. --logfile=nwipelog.txt. ShredOS handles that for you and creates the logs automatically; I don't know about Parted Magic.

I would also try a secure erase using one of the command line tools.

You don't mention which version of ShredOS you are running.

mwilcox857 commented 7 months ago

Sorry, I was using the latest version of ShredOS. I thought there might be a drive issue as well, but when I booted into nwipe through Parted Magic they all ran at the same speed, so I wasn't sure what that meant.

I don't know why, but the fuse popped on the power strip I was using, and when I returned today all 3 servers were off. I had even less time to work with today, so I booted the first one back into nwipe from the Parted Magic Extras menu, assumed it had been powered on long enough to finish the wipe, and ran verify zeros on all of the drives, which was successful. The other 2 I was able to boot into a Parted Magic ISO from 2020, the last version that supported 32-bit, and that booted without the black screen issue. I have no idea what that means. I could've done a Secure Erase there, but to save time I just ran nwipe to get the logs. I also forgot that the older nwipe version only captures the first drive's serial number, so I won't have a very good log for the wipe, but I'll have to take what I can get.

I feel like I haven't even presented an issue here and just rambled, I apologize. I guess I was just curious about why ShredOS was hanging up on some of the drives but nwipe worked. I will be back there next week and hope to test some more things out. If I can replicate the problem on the next 3 servers, is there anything I can do to help you troubleshoot?

Thank you for the quick response!

Blue-Code252 commented 4 months ago

Hi there, I'm not sure but I think I have a similar problem to you. The difference is that I am erasing the SSD (NVMe drive) from a Latitude 5520-30. I want to do an ATA secure erase as I've been told it's the best you can find that won't completely destroy the lifespan of the SSD. However, I'm having trouble selecting ATA. I thought ATA is similar to DoD Short or something.

Well, I couldn't find the ATA protocol, so I tried it with the normal standard DoD Short, leaving the sync as default (on).

The result is that the clearing only made it to about 30% in the sync stage. After that the screen froze and I couldn't interact with it anymore.

I have tried this on several other laptops with the same problem.

I have attached a picture of the current device I am clearing.

picture

What am I doing wrong? :(

PartialVolume commented 4 months ago

What am I doing wrong? :(

Well, one thing that could be a big issue is how hot your NVMe drive is. According to that snapshot it's running at 75 °C! The drive will either stop responding or go real slow until the temperature is under control, to avoid destroying itself. Take a look at the cooling on the system: are the fans working, are the filters clean? Find a way to cool that drive down and try again.

Screenshot_20240529_124359

PartialVolume commented 4 months ago

I want to do an ATA secure erase as I've been told it's the best you can find that won't completely destroy the lifespan of the SSD. However, I'm having trouble selecting ATA.

To do a secure erase with ShredOS you currently have to resort to the command line:

Press ALT+F2 to switch to the 2nd virtual terminal (ALT+F1 to return when you are done), then:

nvme list
nvme format -s1 <device>

Note: you don't need to prefix with sudo when running on ShredOS.

Google nvme-cli to find more help on those tools.

I thought ATA is similar to DoD Short or something.

No, Nwipe doesn't currently do an ATA secure erase. It does traditional block writes. ATA secure erase is just another name for the drive's firmware erasing the drive rather than the computer's CPU controlling the erasure process.

Nwipe will eventually do an ATA secure erase and show an estimated progress readout, but at the moment you would use nvme format from the command line in ShredOS.

Blue-Code252 commented 4 months ago

I've already thought about that, but it's not the temperature. I had the same problem 30 minutes ago with a Latitude 5530: it showed 30 °C and stopped at 1.5%. I used "Fill with ones" for that one.

Blue-Code252 commented 4 months ago

That is actually what I was looking for. I couldn't find how to get to the system command line.

Thank you for that. I'll let you know if this solves the problem.

PartialVolume commented 4 months ago

I've already thought about that, but it's not the temperature. I had the same problem 30 minutes ago with a Latitude 5530: it showed 30 °C and stopped at 1.5%. I used "Fill with ones" for that one.

It might be useful to run the following command in the second virtual terminal to determine how healthy the drive is and whether or not it is on its last legs. The example below is from a healthy WD Blue SN570 1TB that has no issues.

smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.5.0-35-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WD Blue SN570 1TB
Serial Number:                      XXXXXXXXXXXX
Firmware Version:                   234100WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 4a49c5c186
Local Time is:                      Wed May 29 14:48:52 2024 BST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     4.20W    3.70W       -    0  0  0  0        0       0
 1 +     2.70W    2.30W       -    0  0  0  0        0       0
 2 +     1.90W    1.80W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     3900   11000
 4 -   0.0050W       -        -    4  4  4  4     5000   44000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    47%
Data Units Read:                    1,589,380,936 [813 TB]
Data Units Written:                 877,031,464 [449 TB]
Host Read Commands:                 54,148,135,702
Host Write Commands:                22,780,905,502
Controller Busy Time:               113,104
Power Cycles:                       1,392
Power On Hours:                     5,483
Unsafe Shutdowns:                   92
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

mwilcox857 commented 4 months ago

Interesting, I didn't know you could jump to the command line and issue Secure Erase commands with hdparm and nvme format. That will be a quick way to erase a drive and then verify it. Any plans on a 10% NIST compliant verification solution?

PartialVolume commented 4 months ago

I made an edit to those commands in the comments above:

I had prefixed the nvme & smartctl commands with sudo, however that's not necessary when running on ShredOS. I've updated the comments to reflect that.

PartialVolume commented 4 months ago

Interesting, I didn't know you could jump to the command line and issue Secure Erase commands with hdparm and nvme format. That will be a quick way to erase a drive and then verify it. Any plans on a 10% NIST compliant verification solution?

I've copied below a comment I made in 2019 about NIST compliance. I'm not sure what you mean by 10% compliance?

Certain things could be done to improve compliance, such as bringing NVMe/ATA secure erase into nwipe so that the erasure is reflected in the PDF certificate, as is already the case with standard block erase. Maybe even having a secure login, so that your name goes on the certificate if it really was you that wiped the disc.

However, when I read about NIST compliance, it's far more than just which program you use to erase; it's about your whole organisation's methodology for securing data. nist.sp.800-88r1.pdf

The NIST 800-88 guidelines can be found here, for those interested in reading them: Guidelines for Media Sanitization.

NIST 800-88 is more than just a particular way a program writes patterns over the disc, such as DoD 5220.22. It is really a set of guidelines that helps a company or organisation develop its own methodology. NIST 800-88 documents not just the requirements for the software that performs the wipe but also the responsibilities within an organisation for destroying data on various bits of equipment, not just computer discs. That's not to say there aren't specific requirements for the software that performs the disc erasure as well. For instance, the guidelines' minimum requirement for a standard spinning disc is a single wipe of zeros with verification. Multiple wipes are optional.

Currently we haven't yet implemented some of the requirements, HPA+DCO or secure erase; however, these are in the pipeline. HPA+DCO+secure erase can all be handled with hdparm by those who want to include those checks in their wipe procedure prior to running nwipe. Open source software is OK to use, as stated in the NIST guidelines; however, an organisation should verify that the software does in fact do what it says it does, whether open source or proprietary.

Anybody who wants to wipe to NIST 800-88 standards by creating their own data destruction methodology using nwipe, ShredOS and hdparm: I would recommend you read the entire document above.

Blue-Code252 commented 4 months ago

I'm getting a web error that it couldn't find the pdf...

nist.sp.800-88r1.pdf

Blue-Code252 commented 4 months ago

Oh, it's blocked on my side for some reason...

Well, that's an independent issue.

mwilcox857 commented 4 months ago

I work for a NAID AAA certified company and that usually references the NIST 800-88r1 document you linked to. I was referring to being compliant with the NIST standards, sorry for the confusion.

Page 20 describes how to verify. My understanding is that the 10% verification would read the beginning and end of the drive as well as random spots adding up to 10% of the drive space. We usually do 100% verification, but I've been experimenting with 10%. It's easy to run a verification tool long enough to cover 10%, but they usually aren't reading the beginning and the end along with different parts in the middle. Just curious if that type of verification was anything that was on your radar.
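
The sampling described above could be sketched roughly as follows. This is only an illustration of the idea (first chunk, last chunk, and random chunks in between, adding up to about 10%), not nwipe's implementation and not the wording of the NIST document; the function names, the chunk size, and the in-memory stand-in for a drive are assumptions.

```python
import io
import random


def plan_sample_regions(device_size, fraction=0.10, chunk_size=1024 * 1024, seed=None):
    """Return sorted (offset, length) chunks: first, last, and random ones in between."""
    if device_size <= 0:
        raise ValueError("device_size must be positive")
    total_chunks = -(-device_size // chunk_size)  # ceiling division
    wanted = min(max(2, round(total_chunks * fraction)), total_chunks)
    chosen = {0, total_chunks - 1}  # always include the start and the end
    middle = range(1, total_chunks - 1)
    extra = min(wanted - len(chosen), len(middle))
    if extra > 0:
        chosen.update(random.Random(seed).sample(middle, extra))
    regions = []
    for index in sorted(chosen):
        offset = index * chunk_size
        # The final chunk may be shorter than chunk_size.
        regions.append((offset, min(chunk_size, device_size - offset)))
    return regions


def find_nonzero_region(stream, regions):
    """Return the first region that is short or contains a non-zero byte, else None."""
    for offset, length in regions:
        stream.seek(offset)
        data = stream.read(length)
        if len(data) != length or data.count(0) != length:
            return (offset, length)
    return None


# Stand-in for a wiped drive: 100 chunks of zeros held in memory.
chunk = 4096
image = io.BytesIO(bytes(100 * chunk))
regions = plan_sample_regions(100 * chunk, fraction=0.10, chunk_size=chunk, seed=1)
print(len(regions), find_nonzero_region(image, regions))
```

On a real system the stream would be a block device opened read-only in binary mode, and the chunk size would be a tuning choice between seek overhead and how widely the samples are spread.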

Blue-Code252 commented 4 months ago

@PartialVolume Is it normal that the ext/nwipe/nvme/ folder is empty? Do I have to fill it myself, or is this a problem?

I tried to do an nvme-cli lookup on the command line, but it just told me that it couldn't find the command...

picture2

Do I have to do some custom stuff so that it works?

I used the normal install, the recommended .img for USB, and flashed it with Rufus.

PartialVolume commented 4 months ago

Just curious if that type of verification was anything that was on your radar.

Yes, 10% verification would be a good alternative to add. I'll put it on the project list. https://github.com/users/PartialVolume/projects/1

PartialVolume commented 4 months ago

I tried to do an nvme-cli lookup on the command line, but it just told me that it couldn't find the command...

You don't need to prefix with sudo, just type the command nvme list. sudo isn't required for any commands when running ShredOS. Sorry for the confusion, I've updated the earlier comments.

Blue-Code252 commented 4 months ago

Sorry for the confusion, I've updated the earlier comments.

Aha, I see now. Don't worry about that; I'm a noob and not everything is perfect. Anyway, I'll try that tomorrow. Big thanks for the support.

Nebuli1 commented 4 months ago

Just curious if that type of verification was anything that was on your radar.

Yes, 10% verification would be a good alternative to add. I'll put it on the project list. https://github.com/users/PartialVolume/projects/1

Perhaps the percentage could be chosen by the user? The GPT header is in LBA 1 and its copy is in the last LBA. So if both have been overwritten with zeros, I know that the wipe has, hypothetically, succeeded right to the end, and for my cases 1% is often enough.

What do you guys think about reading 10 regions of the drive, each 0.1% of its contents, plus 0.1% at the end of the drive? That would give enough assurance that the wipe has almost certainly written zeros to the entire drive.
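
To put numbers on that proposal, here is a rough sketch of how such a layout could be planned. It assumes the 10 regions are evenly spaced starting at the beginning of the drive (the proposal does not say where they sit) and a 512-byte sector size; the function name plan_spread_regions and its parameters are hypothetical, not anything nwipe provides.

```python
def plan_spread_regions(device_size, count=10, region_fraction=0.001, sector_size=512):
    """Return (offset, length) byte ranges: `count` evenly spaced regions plus the tail."""
    if device_size <= 0 or device_size % sector_size:
        raise ValueError("device_size must be a positive multiple of sector_size")
    total_sectors = device_size // sector_size
    region_sectors = max(1, int(total_sectors * region_fraction))
    stride = total_sectors // count
    if stride < 2 * region_sectors:
        raise ValueError("device too small for non-overlapping regions")
    length = region_sectors * sector_size
    # Evenly spaced regions; the first one starts at sector 0 and so covers LBA 1.
    regions = [(i * stride * sector_size, length) for i in range(count)]
    # Tail region ends exactly at the last sector, so it covers the last LBA.
    regions.append(((total_sectors - region_sectors) * sector_size, length))
    return regions


# Capacity taken from the 1 TB drive shown earlier in the thread.
device_size = 1_000_204_886_016
regions = plan_spread_regions(device_size)
coverage = sum(length for _, length in regions) / device_size
print(len(regions), f"{coverage:.4%}")
```

With these defaults the plan reads 11 regions, or roughly 1.1% of the drive, and the first and last regions include the primary and backup GPT header locations mentioned above.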