Open fabbione opened 3 weeks ago
What do you mean by “leaking storage”?
Simple: the LV is resized, but not the DRBD device. That means the VM doesn't see the storage, but it is allocated in the LV/LVM. That storage is unavailable for anyone to use.
Ah, that is expected. There's a period of time where it's unavoidable that one LV is grown before the peer node is grown, and DRBD can't be grown until both are grown. If I've started a grow operation, I don't want that space to be available to others to use. The scan-lvm scan agent should see the reduced free space in the VG and drop the available space in the associated storage group.
That is NOT the issue. The issue is that the LV is grown (correctly), the second DRBD resize fails, and nothing is going to trigger another DRBD resize to match the new LV size. Hence the space is lost.
Aaaah, ok, sorry I misunderstood.
ToDo:
Don't allow a resize job to start until all nodes are online (there is no other way to ensure UpToDate on all DRBD nodes)
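A rough sketch of what such a pre-flight check could look like, written in Python for illustration rather than taken from the Anvil! code. It assumes the resource status has already been parsed into a dict shaped like one entry of `drbdsetup status --json` output; the field names used (`devices`, `disk-state`, `connections`, `connection-state`, `peer_devices`, `peer-disk-state`) are assumptions about that shape and should be checked against the installed DRBD version. The function name and the `expected_peers` parameter are hypothetical.

```python
def all_nodes_uptodate(resource: dict, expected_peers: int) -> bool:
    """Return True only if every expected peer is connected and every
    local and peer disk reports UpToDate.

    `resource` is assumed to look like one entry of the JSON status
    output; the key names below are assumptions, not a verified schema.
    """
    devices = resource.get("devices", [])
    connections = resource.get("connections", [])

    # No local volumes or a missing peer entry: refuse to resize.
    if not devices or len(connections) < expected_peers:
        return False

    # Every local volume must be UpToDate.
    for device in devices:
        if device.get("disk-state") != "UpToDate":
            return False

    # Every peer must be connected, with every peer volume UpToDate.
    for conn in connections:
        if conn.get("connection-state") != "Connected":
            return False
        for peer_dev in conn.get("peer_devices", []):
            if peer_dev.get("peer-disk-state") != "UpToDate":
                return False
    return True


# Hand-written sample status for a two-node resource (illustrative only).
sample = {
    "name": "an-test-deploy1",
    "devices": [{"volume": 0, "disk-state": "UpToDate"}],
    "connections": [
        {
            "connection-state": "Connected",
            "peer_devices": [{"volume": 0, "peer-disk-state": "UpToDate"}],
        }
    ],
}
print(all_nodes_uptodate(sample, expected_peers=1))
```

The check is deliberately conservative: anything it cannot positively confirm (a missing peer, a missing key, an unknown state) counts as "not ready", so the resize job would simply not start.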
This is not a super common situation, but regardless it needs to be handled properly or storage is leaked during grow processes.
Create a server, then stop the server to resize the root disk (this can happen on any disk; in my test I only had one disk).
Run for the first time: anvil-manage-server-storage --server an-test-deploy1 --grow 5G --disk vda --confirm .... Done!
Wait for the DRBD resync to complete <-- IMPORTANT. All good, you can issue again:
anvil-manage-server-storage --server an-test-deploy1 --grow 5G --disk vda --confirm .... Done!
and it will work as expected.
Wait for the DRBD resync to complete <-- IMPORTANT. All good, you can issue:
anvil-manage-server-storage --server an-test-deploy1 --grow 30G --disk vda --confirm ... Done!
and issue the same command IMMEDIATELY after:
This issue is caused by the DRBD resource refusing a resize while one is already in flight. At this point we are leaking storage.
The LV has been resized, but DRBD will not see it or recognize it.
Storage is leaked any time a DRBD resize request fails; this is just one possible trigger.
For the grow operation specifically, either check the DRBD status BEFORE resizing the LV and exit 1 if a resync is in progress (to avoid leaking), or loop until the first sync completes before issuing the next resize.
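Both options can be sketched with the same small check. The following is an illustration in Python, not the project's implementation. It assumes the caller can supply the resource status as a dict shaped like one entry of `drbdsetup status --json` output, and that an idle peer volume reports a `replication-state` of `Established`; those key names and values are assumptions to verify against the DRBD version in use. `safe_to_resize`, `wait_until_safe` and the `fetch_status` callback are hypothetical names.

```python
import time
from typing import Callable


def safe_to_resize(resource: dict) -> bool:
    """True only if at least one peer volume is listed and every peer
    volume reports the idle replication state (assumed: "Established")."""
    states = [
        peer_dev.get("replication-state")
        for conn in resource.get("connections", [])
        for peer_dev in conn.get("peer_devices", [])
    ]
    # Anything else (a sync in flight, a disconnected peer, a missing
    # key) is treated as "not safe".
    return bool(states) and all(s == "Established" for s in states)


def wait_until_safe(
    fetch_status: Callable[[], dict],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status() until safe_to_resize() holds.

    Returns True once the resource is idle, or False if `timeout`
    seconds pass first, so the caller can give up without touching
    the LV.
    """
    deadline = time.monotonic() + timeout
    while True:
        if safe_to_resize(fetch_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For the first option, the grow job would call `safe_to_resize()` once and exit 1 on `False` before the LV is touched. For the second, it would call `wait_until_safe()` and only proceed on `True`. Either way the check has to run before the LV is grown, since the leak happens once the LV has been resized and the DRBD resize is then refused.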