007revad / Synology_M2_volume

Easily create an M.2 volume on Synology NAS
MIT License

Fail to repair RAID1 volume #51

Closed: goobags closed this issue 4 months ago

goobags commented 1 year ago

Hi,

I have used this script to set up a RAID1 array (mirrored). I used some old small M.2 drives as testers and have since ordered two bigger drives. I replaced one today, similar to how I have rebuilt RAID1 arrays in the past: just replace a drive, then rebuild through the UI. The problem is I cannot get it to work.

Trying to repair the Storage Pool results in an error saying there are no drives that meet the requirements. Clicking the new drive under HDD/SSD doesn't let me do anything other than create an SSD cache (for obvious reasons, as I'm on a DS918+).

Re-running the script only lets me select one drive (the new one) and the script fails to finish.

Synology_M2_volume v1.2.14
DS918+ DSM 7.1.1-42962-5 

Using options: 
Type yes to continue. Type anything else to do a dry run test.
yes

NVMe M.2 nvme0n1 is Samsung SSD 970 EVO Plus 1TB
No existing partitions on drive

NVMe M.2 nvme1n1 is Samsung SSD 970 EVO Plus 250GB
Skipping drive as it is being used by DSM

Unused M.2 drives found: 1

1) nvme0n1
Select the M.2 drive: 1
You selected nvme0n1

Ready to create volume group on nvme0n1
Type yes to continue. Type anything else to quit.
yes
You chose to continue. You are brave! :)

Using md5 as it's the next available.

Creating Synology partitions on nvme0n1

        Device   Sectors (Version8: SupportRaid)
 /dev/nvme0n11   4980480 (2431 MB)
 /dev/nvme0n12   4194304 (2048 MB)
Reserved size:    260352 ( 127 MB)
Primary data partition will be created.

WARNING: This action will erase all data on '/dev/nvme0n1' and repart it, are you sure to continue? [y/N] y
Cleaning all partitions...
Creating sys partitions...
Creating primary data partition...
Please remember to mdadm and mkfs new partitions.

Creating single drive RAID.
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md5 started.

Creating a physical volume (PV) on md5 partition
  Physical volume "/dev/md5" successfully created

Creating a volume group (VG) on md5 partition
  /dev/vg5: already exists in filesystem
  Run `vgcreate --help' for more information.

ERROR 5 Failed to create volume group!
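
The failure at the end is LVM refusing to create a volume group whose name is already registered: a stale vg5 left over from the previous array. A minimal cleanup sketch, assuming DSM's lvm build includes the standard vgs and vgremove commands (destructive, so only after the data is backed up):

# Confirm the stale vg5 is still registered with LVM
sudo vgs
# Remove the leftover volume group (destroys anything still in vg5),
# then re-run the script
sudo vgremove vg5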
goobags commented 1 year ago

And I just put the original drive back in; now it's crashed (the one drive, not the entire RAID array) and I cannot repair it, because the DSM UI only sees the drive as usable for SSD cache.

007revad commented 1 year ago

I never considered that someone might try to replace an M.2 drive to rebuild or expand the RAID. What you wanted to do might have been available from Storage Manager if you were running DSM 7.2 and had run https://github.com/007revad/Synology_HDD_db.

Is one of the original small M.2 drives still showing as degraded, or has the whole array crashed?

goobags commented 1 year ago

Just one drive has crashed; the storage pool is still functional. I'm just trying to avoid having to reinstall a few apps and Docker onto an entirely new storage pool/volume.

007revad commented 1 year ago

I think backing up and then reinstalling onto a new storage pool/volume may be the quickest solution.

It's going to take a while for me to work out how DSM does a RAID repair.

dantrauner commented 10 months ago

@007revad Thanks for all of the work you've done on your scripts – they're really useful!

I wanted to check in on this issue since I recently had a RAID1 M.2 volume using the official 10G NIC + M.2 adapter card lose a drive. Have you looked at all into allowing a blank disk to be added to an existing volume? If not, I'd be interested in helping get this working if you have some idea of where to start.

007revad commented 10 months ago

@dantrauner

First, make sure your data from the NVMe volume is backed up.

What does the following command return: sudo synostgpool --auto-repair -h

And this one: sudo synostgpool --misc --get-pool-info | jq

I only need the nvme section like this:

  {
    "device_type": "shr_without_disk_protect",
    "disks": [
      "nvme1n1"
    ],
    "id": "reuse_2",
    "is_writable": true,
    "num_id": 2,
    "pool_path": "reuse_2",
    "raids": [
      {
        "designedDiskCount": 1,
        "devices": [
          {
            "id": "nvme1n1",
            "slot": 0,
            "status": "normal"
          }
        ],
        "hasParity": false,
        "minDevSize": "493964574720",
        "normalDevCount": 1,
        "raidCrashedReason": 0,
        "raidPath": "/dev/md3",
        "raidStatus": 1,
        "spares": []
      }
    ],
    "size": {
      "total": "489118760960",
      "used": "488565112832"
    },
    "space_path": "/dev/vg2",
    "status": "normal",
    "summary_status": "normal"
  },
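If the full --get-pool-info output is long, jq can pull out just the NVMe pool. A hypothetical filter, assuming the pool objects sit in a top-level "pools" array (adjust the path to match the actual output):

# Hypothetical: assumes a top-level "pools" array in the JSON
sudo synostgpool --misc --get-pool-info | jq '.pools[] | select(any(.disks[]; startswith("nvme")))'
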
007revad commented 10 months ago

@dantrauner

Just now I was able to repair an NVMe RAID 1 storage pool from Storage Manager. For the steps I used to work for you, I need to know a few things about your setup.

  1. What model Synology do you have?
  2. What DSM version is it running?
  3. Are the NVMe drives in internal M.2 slots or in a PCIe M.2 adaptor card? (like an M2D20, M2D18 or E10M20-T1)
dantrauner commented 10 months ago

Probably 60 seconds before your last reply, I decided to just use this opportunity to practice my DR procedure 😄 I'm bookmarking this and will try to repair next time, but:

  1. RS1221+
  2. DSM 7.2-64570 Update 3
  3. Using an E10M20-T1 card
007revad commented 10 months ago

For future reference I've created a few wiki pages documenting how I repaired my NVMe RAID 1 after replacing a drive.

Repair M.2 RAID 1 in internal M.2 slots

Repair M.2 RAID 1 in adaptor card - Requires that the NAS has internal M.2 slots.

Repair RAID via SSH - I have not tested this method yet...

kidhasmoxy commented 10 months ago

The following comes from a blog post on how to create the volume manually (which even cites your script, @007revad). It's a snippet to get the new NVMe drive to show up as an option to repair the failed array. In this case, md3 is the md array backing your existing storage pool, and the NVMe drive is referenced by its /dev path.

https://academy.pointtosource.com/synology/synology-ds920-nvme-m2-ssd-volume/

# Write the Synology partition layout (index 12, per the blog post) to the new drive
synopartition --part /dev/nvme1n1 12
# Add the new data partition (p3) to the existing md3 array so it can rebuild
mdadm --manage /dev/md3 -a /dev/nvme1n1p3
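
If the add succeeds, the rebuild can be watched with the standard Linux md status interfaces:

# Shows resync/rebuild progress for all md arrays
cat /proc/mdstat
# Per-array detail, including the newly added member
sudo mdadm --detail /dev/md3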