LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Implement some overprovisioning protections at a storage-pool level #377

Closed dvance closed 1 week ago

dvance commented 1 year ago

The MaxOversubscriptionRatio works on a per-volume basis. This means that with, for example, a 1TiB storage pool and a ratio of 2x, Linstor would only allow us to create a volume smaller than 2TiB. However, you could still create 20 resources, each of which is 1TiB in size.

Would it be possible to have some feature that prevents creating an additional resource if the aggregate size of all resources using that storage pool would exceed 2x the storage pool size (2TiB in this example)?

ghernadi commented 1 year ago

TL;DR: It's complicated :slightly_smiling_face:

Slightly longer overview of how it currently works, highlighting some issues: Sure, having 1TiB of storage and setting an oversubscription ratio of 2.0 to allow 2TiB of volume reservations sounds easy enough. In theory at least. In practice things are a bit different, but let's investigate this scenario a bit further.

The previous statement is easy enough for an empty storage pool. However, things get complicated when considering partially (or almost completely) filled storage pools. For example, let's stick with our 1TiB storage pool, but assume that 500GiB are already reserved. Should LINSTOR still allow an additional 1.5TiB of volumes? Sure, as long as we are only talking about thin reservations, why not; that is actually the point of overprovisioning, right?

The problem starts when we also consider the actually used space, besides the reservations. With a 1TiB storage pool, if we already have, for example, 1.5TiB reserved but 950GiB actually allocated (i.e. in use), one can quite easily argue that it would be a mistake for LINSTOR to allow another 500GiB volume, since LINSTOR knows that there are only 50GiB free, right?

That's the reason why LINSTOR takes the remaining free space of the storage pool (so in the above example the 50GiB) and applies the oversubscription ratio to only this free space, resulting in allowing another 100GiB volume instead of a 500GiB volume.

This might be better for the example described above, but I do agree that it does not really help with the initial scenario you described. Precisely because LINSTOR is now considering the remaining free space, it allows the 20 (and more) 1TiB resources as long as those resources do not allocate actual data.

I guess we could improve LINSTOR by implementing both approaches, the free-space-based as well as the reserved-space-based one, and simply take the lower of those two values as the capacity. That means that even with empty resources, LINSTOR would only allow up to 2TiB of reservations in your example (i.e. only 2x 1TiB resources), and it would still behave the same way in the example I gave with 1.5TiB reserved and 950GiB already allocated, allowing only 100GiB for a new resource.
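To make this concrete, here is a minimal Python sketch (illustrative only, with made-up numbers; not LINSTOR code) of combining the two approaches and taking the lower value:

def allowed_new_volume_size(total, free, reserved, ratio):
    # All sizes in GiB; 'ratio' is the oversubscription ratio.
    # Free-space based: apply the ratio to the physically remaining free space.
    by_free_space = free * ratio
    # Reservation based: apply the ratio to the total capacity and subtract
    # what has already been promised (reserved) to existing volumes.
    by_reservation = total * ratio - reserved
    # Take the lower of the two values, never less than zero.
    return max(0.0, min(by_free_space, by_reservation))

# 1 TiB pool, 1.5 TiB reserved, 950 GiB actually allocated -> 50 GiB free:
print(allowed_new_volume_size(total=1024, free=50, reserved=1536, ratio=2.0))        # 100.0
# Empty 1 TiB pool that already holds twenty 1 TiB thin resources:
print(allowed_new_volume_size(total=1024, free=1024, reserved=20 * 1024, ratio=2.0)) # 0.0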

Discussions / suggestions are welcome :slightly_smiling_face:

luissimas commented 2 weeks ago

Hello folks. It'd be really nice to have some sort of oversubscription behavior that takes into account the total reserved space in the storage pool. From what I've seen this is still not possible, right? Are there any plans to implement such a feature?

I think the behavior described by @ghernadi makes a lot of sense, but I'm not sure what the best way is to present this to the user as a set of config options.

As an example of the desired behavior, I have a cluster with 3 satellites, each with an LVM-Thin storage pool of 148.71 GiB. Let's say we have a MaxReservedSpaceOversubscriptionRatio property that allows us to limit the size of new volumes by taking the total reserved space of the storage pool into account. I'll set the value of this property to 2, which means that Linstor should not allow the total reserved size of the storage pool to exceed two times its capacity.

I'm using a placement count of 3 to simplify the example. When I spawn a new resource with 100 GiB on the default resource group, I'd expect the MaxVolumeSize of the resource group to decrease by the size of the newly created volume. In other words, I'd expect the MaxVolumeSize of the resource group to be given by: storagePoolTotalCapacity * oversubscriptionRatio - storagePoolTotalReservedSpace

ubuntu@host:~$ linstor controller set-property MaxReservedSpaceOversubscriptionRatio 2
SUCCESS:
    Successfully set property 'MaxReservedSpaceOversubscriptionRatio' to value '2'
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    297.42 GiB ┊    148.71 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ linstor rg spawn-resources DfltRscGrp test-resource 100G
...
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    197.42 GiB ┊    148.70 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
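Just to double-check the arithmetic behind the table above (the property name and the formula are only my suggestion, not an existing LINSTOR option):

capacity = 148.71   # GiB, per-node capacity of the storage pool
ratio = 2.0         # hypothetical MaxReservedSpaceOversubscriptionRatio
reserved = 100.0    # GiB reserved by the newly spawned volume

max_volume_size = capacity * ratio - reserved
print(f"{max_volume_size:.2f} GiB")   # 197.42 GiB, matching the second query above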

At least this is a rough idea of what I had in mind. Is it feasible to implement such behavior?

ghernadi commented 2 weeks ago

Hello,

My apologies, I had forgotten to update this issue. Your suggestion is not just feasible, it was actually (more or less) implemented in v1.26.0.

The idea was that the property MaxOversubscriptionRatio now only acts as a default for two new properties, MaxFreeCapacityOversubscriptionRatio and MaxTotalCapacityOversubscriptionRatio. That means if the two new properties are not set, they both inherit the value of MaxOversubscriptionRatio. If that is not set either, its default value is 20.0.

Here is a brief demonstration of the behavior and usage:

Simple setup (sorry for my tiny test-cluster storage pools :slightly_smiling_face: )

$ linstor sp l
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node    ┊ Driver   ┊ PoolName             ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName                   ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ bravo   ┊ DISKLESS ┊                      ┊              ┊               ┊ False        ┊ Ok    ┊ bravo;DfltDisklessStorPool   ┊
┊ DfltDisklessStorPool ┊ charlie ┊ DISKLESS ┊                      ┊              ┊               ┊ False        ┊ Ok    ┊ charlie;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ delta   ┊ DISKLESS ┊                      ┊              ┊               ┊ False        ┊ Ok    ┊ delta;DfltDisklessStorPool   ┊
┊ lvmthinpool          ┊ bravo   ┊ LVM_THIN ┊ scratch/linstor-thin ┊        1 GiB ┊         1 GiB ┊ True         ┊ Ok    ┊ bravo;lvmthinpool            ┊
┊ lvmthinpool          ┊ charlie ┊ LVM_THIN ┊ scratch/linstor-thin ┊        1 GiB ┊         1 GiB ┊ True         ┊ Ok    ┊ charlie;lvmthinpool          ┊
┊ lvmthinpool          ┊ delta   ┊ LVM_THIN ┊ scratch/linstor-thin ┊        1 GiB ┊         1 GiB ┊ True         ┊ Ok    ┊ delta;lvmthinpool            ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
$ linstor rg l
╭──────────────────────────────────────────────────────╮
┊ ResourceGroup ┊ SelectFilter  ┊ VlmNrs ┊ Description ┊
╞══════════════════════════════════════════════════════╡
┊ DfltRscGrp    ┊ PlaceCount: 3 ┊        ┊             ┊
╰──────────────────────────────────────────────────────╯
$ linstor rg qsi dfltrscgrp
╭───────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊ Capacity ┊ Next Spawn Result      ┊
╞═══════════════════════════════════════════════════════════════════╡
┊        20 GiB ┊         1 GiB ┊    1 GiB ┊ lvmthinpool on bravo   ┊
┊               ┊               ┊          ┊ lvmthinpool on charlie ┊
┊               ┊               ┊          ┊ lvmthinpool on delta   ┊
╰───────────────────────────────────────────────────────────────────╯
$ linstor sp sp -h | grep -i Oversub
  key         'MaxOversubscriptionRatio': Default value for MaxFreeCapacityOversubscriptionRatio and MaxTotalCapacityOversubscriptionRatio (default 20)
              'MaxFreeCapacityOversubscriptionRatio': Maximum allowed ratio the remaining free space can be overprovisioned (default 20)
              'MaxTotalCapacityOversubscriptionRatio': Maximum allowed ratio the capacity can be overprovisioned (default 20)
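A rough sketch of how the effective ratios resolve according to the help text above (illustrative Python, not the actual implementation):

DEFAULT_RATIO = 20.0

def effective_ratios(props):
    # MaxOversubscriptionRatio only serves as the fallback for the two specific ratios.
    general = props.get("MaxOversubscriptionRatio", DEFAULT_RATIO)
    free_ratio = props.get("MaxFreeCapacityOversubscriptionRatio", general)
    total_ratio = props.get("MaxTotalCapacityOversubscriptionRatio", general)
    return free_ratio, total_ratio

print(effective_ratios({}))                                 # (20.0, 20.0)
print(effective_ratios({"MaxOversubscriptionRatio": 2.0}))  # (2.0, 2.0)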

As before, setting the general MaxOversubscriptionRatio property at first has a similar effect as previously:

$ linstor c sp MaxOversubscriptionRatio 2.0
SUCCESS:
    Successfully set property 'MaxOversubscriptionRatio' to value '2.0'
$ linstor rg qsi dfltrscgrp
╭───────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊ Capacity ┊ Next Spawn Result      ┊
╞═══════════════════════════════════════════════════════════════════╡
┊         2 GiB ┊         1 GiB ┊    1 GiB ┊ lvmthinpool on bravo   ┊
┊               ┊               ┊          ┊ lvmthinpool on charlie ┊
┊               ┊               ┊          ┊ lvmthinpool on delta   ┊
╰───────────────────────────────────────────────────────────────────╯

But things are different now as soon as you actually spawn a (thin-) resource:

$ linstor rg spawn dfltrscgrp rsc1 100m
...
$ linstor rg qsi dfltrscgrp
╭───────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊ Capacity ┊ Next Spawn Result      ┊
╞═══════════════════════════════════════════════════════════════════╡
┊      1.90 GiB ┊   1023.90 MiB ┊    1 GiB ┊ lvmthinpool on bravo   ┊
┊               ┊               ┊          ┊ lvmthinpool on charlie ┊
┊               ┊               ┊          ┊ lvmthinpool on delta   ┊
╰───────────────────────────────────────────────────────────────────╯

In short:

MaxFreeCapacityOversubscriptionRatio: 1023.90 MiB free * 2.0 ≈ 2.00 GiB
MaxTotalCapacityOversubscriptionRatio: 1 GiB total * 2.0 - 100 MiB reserved ≈ 1.90 GiB

That means LINSTOR simply takes the lower of the two values. To demonstrate this further, without adding a new resource but actually using more space in the existing one, the result of qsi changes:

$ ssh bravo dd if=/dev/urandom of=/dev/drbd1000 bs=1M count=90 oflag=direct
...
$ linstor rg qsi dfltrscgrp
╭───────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊ Capacity ┊ Next Spawn Result      ┊
╞═══════════════════════════════════════════════════════════════════╡
┊      1.82 GiB ┊    933.89 MiB ┊    1 GiB ┊ lvmthinpool on bravo   ┊
┊               ┊               ┊          ┊ lvmthinpool on charlie ┊
┊               ┊               ┊          ┊ lvmthinpool on delta   ┊
╰───────────────────────────────────────────────────────────────────╯

Using the same calculations as before, we can figure out the numbers:

MaxFreeCapacityOversubscriptionRatio: 933.89 MiB free * 2.0 ≈ 1.82 GiB
MaxTotalCapacityOversubscriptionRatio: 1 GiB total * 2.0 - 100 MiB reserved ≈ 1.90 GiB

Again, since the smaller value is taken, 1.82 GiB is shown.

Feel free to play with these properties and let us know if this helps you.

luissimas commented 2 weeks ago

@ghernadi first of all thanks for the detailed response!

I'm aware of the three existing parameters for configuring the oversubscription ratio. For context, I'm experimenting with the possibilities presented in the "Over Provisioning Storage in LINSTOR" section of the Linstor guide.

From what I understood, the key point here is that Linstor uses the used space of the volumes instead of their size as the basis for the oversubscription calculations.

In the following example, I have both MaxFreeCapacityOversubscriptionRatio and MaxTotalCapacityOversubscriptionRatio unset, and MaxOversubscriptionRatio has a value of 2.

When I spawn a resource, I'd expect (with the feature proposed in this issue) the new MaxVolumeSize to be 197.42 GiB (Capacity * MaxOversubscriptionRatio - sizeOfResource(test)). Instead, today it seems that Linstor actually calculates the MaxVolumeSize as Capacity * MaxOversubscriptionRatio - thinlyUsedSpace(test).

This is consistent with the documentation and seems to be the correct behavior. Since both MaxFreeCapacityOversubscriptionRatio and MaxTotalCapacityOversubscriptionRatio inherit the value 2 from MaxOversubscriptionRatio, and the thin volume uses only a few MiB while it's empty, the MaxVolumeSize is reduced by that amount. If I set the MaxFreeCapacityOversubscriptionRatio value back to 20, the reported MaxVolumeSize goes back to Capacity * MaxOversubscriptionRatio.

ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    297.42 GiB ┊    148.71 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ linstor rg spawn DfltRscGrp test 100G
...
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    297.39 GiB ┊    148.70 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ linstor controller set-property MaxFreeCapacityOversubscriptionRatio 20
SUCCESS:
    Successfully set property 'MaxFreeCapacityOversubscriptionRatio' to value '20'
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    297.40 GiB ┊    148.70 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯

The main problem with the current behavior is that we can create, for example, 10 volumes of 100GiB in a storage pool with 148.71GiB capacity and a MaxOversubscriptionRatio of 2, as demonstrated by the following example:

ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    297.42 GiB ┊    148.71 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ for n in {1..10}; do linstor rg spawn DfltRscGrp test-$n 100G >/dev/null; done
ubuntu@host:~$ linstor vd l
╭─────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ VolumeNr ┊ VolumeMinor ┊ Size    ┊ Gross ┊ State ┊
╞═════════════════════════════════════════════════════════════════╡
┊ test-1       ┊ 0        ┊ 1000        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-2       ┊ 0        ┊ 1001        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-3       ┊ 0        ┊ 1002        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-4       ┊ 0        ┊ 1003        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-5       ┊ 0        ┊ 1004        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-6       ┊ 0        ┊ 1005        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-7       ┊ 0        ┊ 1006        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-8       ┊ 0        ┊ 1007        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-9       ┊ 0        ┊ 1008        ┊ 100 GiB ┊       ┊ ok    ┊
┊ test-10      ┊ 0        ┊ 1009        ┊ 100 GiB ┊       ┊ ok    ┊
╰─────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    297.01 GiB ┊    148.50 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯

I'd expect that Linstor could provide an oversubscription behavior that takes the total size of volumes into consideration (instead of the thinly used space). This would allow users to prevent scenarios such as the one described above.

kermat commented 1 week ago

Hello @luissimas :wave:

Just to summarize and clarify, you are suggesting a third over-subscription property, possibly named MaxReservedCapacityOversubscriptionRatio, that considers reserved space rather than used space when calculating MaxVolumeSize, correct?

luissimas commented 1 week ago

Hello @kermat!

Yes, that's exactly the suggestion. This would give users better control over oversubscription in scenarios where they expect a lot of volumes to be created but not necessarily consumed at the same rate.

Borrowing the MaxReservedCapacityOversubscriptionRatio name for the suggested property, here's an example of the desired behavior:

ubuntu@host:~$ linstor controller set-property MaxReservedCapacityOversubscriptionRatio 2
SUCCESS:
    Successfully set property 'MaxReservedCapacityOversubscriptionRatio' to value '2'
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    297.42 GiB ┊    148.71 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ linstor rg spawn-resources DfltRscGrp test-1 100G
SUCCESS:
...
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊    197.42 GiB ┊    148.70 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ linstor rg spawn-resources DfltRscGrp test-2 100G
SUCCESS:
...
ubuntu@host:~$ linstor rg query-size-info DfltRscGrp
╭────────────────────────────────────────────────────────────────╮
┊ MaxVolumeSize ┊ AvailableSize ┊   Capacity ┊ Next Spawn Result ┊
╞════════════════════════════════════════════════════════════════╡
┊     97.42 GiB ┊    148.70 GiB ┊ 148.71 GiB ┊ vpool on node-1   ┊
┊               ┊               ┊            ┊ vpool on node-2   ┊
┊               ┊               ┊            ┊ vpool on node-3   ┊
╰────────────────────────────────────────────────────────────────╯
ubuntu@host:~$ linstor rg spawn-resources DfltRscGrp test-3 100G
ERROR:
Description:
    Not enough available nodes
...
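A minimal sketch of the admission check I have in mind for this example (reserved-capacity based; names and values are illustrative, not LINSTOR internals):

capacity = 148.71   # GiB, per-node capacity of the storage pool
ratio = 2.0         # suggested MaxReservedCapacityOversubscriptionRatio
reserved = 0.0      # GiB already reserved by existing volumes

for name, size in [("test-1", 100.0), ("test-2", 100.0), ("test-3", 100.0)]:
    max_volume_size = capacity * ratio - reserved
    if size > max_volume_size:
        print(f"{name}: rejected ({size:.0f} GiB > {max_volume_size:.2f} GiB allowed)")
        break
    reserved += size
    print(f"{name}: spawned, new MaxVolumeSize = {capacity * ratio - reserved:.2f} GiB")

This reproduces the numbers above: test-1 leaves 197.42 GiB, test-2 leaves 97.42 GiB, and test-3 (100 GiB) is rejected.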
ghernadi commented 1 week ago

What version of LINSTOR are you using?

luissimas commented 1 week ago

Sorry folks, I was using version 1.27.1. I updated all nodes to version 1.29.2 and the behavior of the MaxTotalCapacityOversubscriptionRatio seems to be exactly what I described in the comment above :sweat_smile:. @ghernadi I believe this was the result of your fix in 909e1af006355a3b12fac0d3c10808b0f46df19d, right?

Was this always the intended behavior of the MaxTotalCapacityOversubscriptionRatio? While reading its description in the docs I got the impression that it would only consider the total capacity of the storage pool, and nothing else.

In any case, I think this is the behavior I was looking for as well as the one described in the original issue. Thanks a lot for the patience and support, I'm quite happy with this oversubscription behavior.

ghernadi commented 1 week ago

Sorry folks, I was using version 1.27.1. I updated all nodes to version 1.29.2 and the behavior of the MaxTotalCapacityOversubscriptionRatio seems to be exactly what I described in the comment above 😅. @ghernadi I believe this was the result of your fix in 909e1af, right?

Yes, I had this very comment in mind when I asked you for the version :)

Was this always the intended behavior of the MaxTotalCapacityOversubscriptionRatio?

For me it was :sweat_smile:

While reading its description in the docs I got the impression that it would only consider the total capacity of the storage pool, and nothing else.

Noted, we will recheck the docs and see if we can clarify this, thanks for letting us know!

In any case, I think this is the behavior I was looking for as well as the one described in the original issue. Thanks a lot for the patience and support, I'm quite happy with this oversubscription behavior.

:tada: