genesis-community / blacksmith-genesis-kit

A Genesis Kit for the Blacksmith On-Demand Service Broker
MIT License

reinstate persistent_disk #68

Closed itsouvalas closed 1 year ago

itsouvalas commented 1 year ago

Using disk_type instead of persistent_disk allows selecting the disk size from the cloud config, but it doesn't declare the disk as persistent. As a result, /var/vcap/store, which is meant to be mounted on the persistent disk, ends up on the root volume, eventually causing it to run out of space:

Filesystem      Size  Used Avail Use% Mounted on
udev            967M     0  967M   0% /dev
tmpfs           199M   27M  172M  14% /run
/dev/sda1       2.9G  2.8G     0 100% /
tmpfs           992M     0  992M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           992M     0  992M   0% /sys/fs/cgroup
/dev/sdb2       7.9G  649M  6.8G   9% /var/vcap/data
tmpfs            16M  268K   16M   2% /var/vcap/data/sys/run
tmpfs           199M     0  199M   0% /run/user/1004

Reinstating persistent_disk adds an additional volume which is correctly mounted under /var/vcap/store. This time around the root / volume stays at a reasonable usage level. This is easiest to notice on VMware's CPI, where the root volume is 3GB:

blacksmith/472f82be-26a3-4d47-b855-3a2bf62d56b9:/# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            967M     0  967M   0% /dev
tmpfs           199M  6.3M  193M   4% /run
/dev/sda1       2.9G  1.7G  1.2G  60% /
tmpfs           992M     0  992M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           992M     0  992M   0% /sys/fs/cgroup
/dev/sdb2       7.9G  639M  6.8G   9% /var/vcap/data
tmpfs            16M  268K   16M   2% /var/vcap/data/sys/run
/dev/sdc1        20G  1.3G   18G   7% /var/vcap/store
tmpfs           199M     0  199M   0% /run/user/1002
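The fix described above can be sketched as a manifest change along these lines (a sketch based on the standard BOSH manifest schema; the instance group name and disk size here are assumptions, not taken from this kit):

```yaml
instance_groups:
  - name: blacksmith
    # Declaring persistent_disk (size in MB) makes BOSH attach a dedicated
    # persistent volume, mounted at /var/vcap/store, instead of leaving
    # that path on the root volume.
    persistent_disk: 20480
```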
dennisjbell commented 1 year ago

Do we know why it was removed in the first place? Is this a CPI issue where vsphere CPI behaves differently? Should this only be done on vsphere deployment targets?

itsouvalas commented 1 year ago

The persistent disks link provided doesn't differentiate between CPIs; the persistent_disk key applies to BOSH deployments regardless of the CPI in use, and it is what identifies a disk as persistent.

A note on that same link reiterates that:

If you terminate or delete a VM from your IaaS console, the fate of the persistent disk depends on the IaaS provider. For example, in AWS, the default behavior is to keep the persistent disk when you delete a VM.

That said, although I haven't tested it on AWS, the documentation suggests that, in the absence of persistent_disk, BOSH treats the disk as ephemeral, so subsequent deployments or a VM replacement by BOSH's health monitor should result in the volume being scrapped and repopulated from the stemcell alone.

As for why it was replaced in the first place, I believe the requester at the time conflated disk_type, and its ability to select a disk size from the cloud config, with the actual nature of the disk, which in this case is meant to be persistent, hence the aptly named persistent_disk.
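To illustrate the distinction being drawn here (a sketch of the standard BOSH schema; the names `default` and `blacksmith` are placeholders, not taken from this kit):

```yaml
# Cloud config: disk_types only define named disk offerings (size, cloud properties).
disk_types:
  - name: default
    disk_size: 20480

# Deployment manifest: the instance group must still declare a persistent disk,
# either directly by size or by referencing a cloud-config disk type.
instance_groups:
  - name: blacksmith
    persistent_disk_type: default   # or: persistent_disk: 20480
```

Either manifest key marks the disk as persistent; a cloud-config disk type on its own does not attach anything to the VM.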

wayneeseguin commented 1 year ago

The changes look good; this brings the disk configuration in line with the rest.