LINBIT / linstor-server

High Performance Software-Defined Block Storage for containers, cloud and virtualisation. Fully integrated with Docker, Kubernetes, OpenStack, Proxmox, etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

One of the resources has an incorrect size #259

Closed: Mezar303 closed this issue 2 years ago

Mezar303 commented 2 years ago

Hi, I have a Proxmox cluster with LINSTOR DRBD storage. Recently, on one of the 3 nodes (pve01), I changed the disks from HDD to SSD.

After the SyncTarget finished, I noticed that one of my resources is significantly smaller on pve01 than on the other nodes. How can I resync this resource from the primary to pve01?

linstor resource lv | grep vm-101-disk-3 

| pve01 | vm-101-disk-3 | drbdpool    |     0 |    1016 | /dev/drbd1016 | 1016.03 MiB | Unused |           UpToDate |
| pve02 | vm-101-disk-3 | drbdpool    |     0 |    1016 | /dev/drbd1016 |  320.07 GiB | InUse  |           UpToDate |
| pve03 | vm-101-disk-3 | drbdpool    |     0 |    1016 | /dev/drbd1016 |  320.07 GiB | Unused |           UpToDate |
rp- commented 2 years ago

You can verify the data, and if there are out-of-sync blocks, invalidate them afterwards. On node pve01:

drbdadm verify vm-101-disk-3

Wait for the verify to finish (check dmesg), then:

drbdadm invalidate --reset-bitmap=no vm-101-disk-3
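A sketch of how one might watch the verify run (the exact log lines and status fields vary by DRBD version; drbdsetup status --statistics also appears later in this thread):

# follow DRBD kernel log messages while the verify runs
dmesg -w | grep -i drbd
# or poll the replication state and statistics for the resource
watch -n 5 drbdsetup status --statistics vm-101-disk-3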

Mezar303 commented 2 years ago

Thank you for your reply. I decided to invalidate that resource on node pve01 with drbdadm invalidate vm-101-disk-3. The sync from the primary started right away, which was great :).

But I have the same situation with resource vm-105-disk-1:

| pve01 | vm-105-disk-1 | drbdpool    |     0 |    1010 | /dev/drbd1010 |  30.57 GiB | Unused |           UpToDate |
| pve02 | vm-105-disk-1 | drbdpool    |     0 |    1010 | /dev/drbd1010 |  32.01 GiB | Unused |           UpToDate |
| pve03 | vm-105-disk-1 | drbdpool    |     0 |    1010 | /dev/drbd1010 |  32.01 GiB | InUse  |           UpToDate |

On node pve01, drbdadm verify vm-105-disk-1 returned an error:

vm-105-disk-1: State change failed: (-14) Need a verify algorithm to start online verify
Command 'drbdsetup verify vm-105-disk-1 1 0' terminated with exit code 11

I wonder whether it would be better to invalidate the resource without verifying and just sync from the primary. What do you think?
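One way to confirm that no verify algorithm is configured for the resource is to inspect the options DRBD actually resolved (a sketch; drbdsetup show prints the running configuration):

# show the resolved configuration for the resource; if the grep prints
# nothing, no verify-alg is set in the net section
drbdsetup show vm-105-disk-1 | grep verify-alg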

rp- commented 2 years ago

Then you have a LINSTOR version < 1.12.0, because starting with 1.12.0 LINSTOR picks a matching verify algorithm for you. But you can also set one via the drbd-options (see the sketch below). Also, what DRBD version do you use (cat /proc/drbd)? Maybe this is all the result of an old bug...

And yes, you can also just invalidate and resync the whole resource.
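For reference, setting a verify algorithm through LINSTOR's drbd-options would look roughly like this (a sketch; the exact client syntax may differ between LINSTOR versions, and crc32c is just one algorithm the kernel commonly provides):

# set the DRBD verify algorithm on the resource definition
linstor resource-definition drbd-options --verify-alg crc32c vm-105-disk-1
# then re-run the verify on pve01
drbdadm verify vm-105-disk-1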

Mezar303 commented 2 years ago

I have linstor 1.11.1.

cat /proc/drbd 
version: 9.0.28-1 (api:2/proto:86-119)
GIT-hash: 8db03a6344e74e5c160294d80188dc31b785db61 build by root@pve01, 2021-03-12 15:11:05
Transports (api:16): tcp (9.0.28-1)

Another interesting case: on node pve02, drbdtop shows:

Name          | Role      | Disks | Peer Disks | Connections | Overall | Quorum
vm-999-disk-1 | Secondary | ✓     | ✗ (4)      | ✓           | ✗ (4)   | ✓

And

drbdsetup status --verbose --statistics vm-999-disk-1
vm-999-disk-1 node-id:1 role:Secondary suspended:no
    write-ordering:flush
  volume:0 minor:1000 disk:UpToDate quorum:yes
      size:10487576 read:1950308832 written:1745940 al-writes:56 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  pve01 node-id:0 connection:Connected role:Primary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:13720 sent:10503360 out-of-sync:128 pending:0 unacked:0
  pve03 node-id:2 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:1317856 sent:2572 out-of-sync:0 pending:0 unacked:0

What can I do about out-of-sync:128 besides invalidating the entire resource and resyncing?

rp- commented 2 years ago

What can I do about out-of-sync:128 besides invalidating the entire resource and resyncing?

Sorry for the late reply. Well, exactly that. I think newer DRBD versions learned to resync just the OOS blocks; you may need to check the docs/release notes there.
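Following the --reset-bitmap=no suggestion from earlier in the thread, a sketch of resyncing only the blocks already flagged out-of-sync, run on the node that reports them (pve02 here); whether this covers every case depends on the DRBD version:

# on pve02: resync only the blocks currently marked out-of-sync,
# without triggering a full resync of the device
drbdadm invalidate --reset-bitmap=no vm-999-disk-1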