**Closed**: anjannath closed this issue 3 months ago.
I was able to deploy the LVMS operator following the instructions from https://docs.openshift.com/container-platform/4.15/storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.html

To test this:
Currently, in the OCP preset, the root partition has 12G of free space:

```
/dev/vda4   31G   20G   12G  64%  /sysroot
```

We can shrink the root partition by a few GBs and create another partition out of the freed space, e.g. `/dev/vda4` at 26G and a new `/dev/vda5` at 6G, which we can set as the device/partition to be used by the LVMS operator.
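Before repartitioning, it helps to confirm the current layout from inside the VM. A minimal sketch (device name and mount point are taken from the `df` output above; adjust if your layout differs):

```shell
# Inspect the partition table and free space on the CRC disk.
# /dev/vda and /sysroot match the layout shown above.
lsblk /dev/vda
sudo parted /dev/vda unit GB print free
df -h /sysroot
```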
With the partition created, we can apply the following manifests to deploy the LVMS operator:

```yaml
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
    - name: vg1
      fstype: xfs
      default: true
      deviceSelector:
        paths:
        - /dev/vda5
        forceWipeDevicesAndDestroyAllData: true
      thinPoolConfig:
        name: thin-pool-1
        sizePercent: 90
        overprovisionRatio: 10
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
  name: openshift-storage
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage-operatorgroup
  namespace: openshift-storage
spec:
  targetNamespaces:
  - openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: lvms
  namespace: openshift-storage
spec:
  installPlanApproval: Automatic
  name: lvms-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```
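The manifests above can be applied in one go and then watched until the operator settles. A hedged sketch: the file name is a placeholder, and the CSV label follows the usual OLM convention `operators.coreos.com/<name>.<namespace>` (an assumption; verify with `oc get csv -n openshift-storage`):

```shell
# Save the manifests above as lvms-install.yaml, then apply them.
oc apply -f lvms-install.yaml

# Wait for the operator's CSV to report Succeeded before creating PVCs.
oc wait csv -n openshift-storage \
  -l operators.coreos.com/lvms-operator.openshift-storage= \
  --for=jsonpath='{.status.phase}'=Succeeded --timeout=300s

# The LVMCluster reconciliation eventually creates the storage class.
oc get storageclass lvms-vg1
```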
And to verify that it's working, we can apply:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  containers:
  - image: httpd
    name: testpod
    securityContext:
      capabilities:
        drop:
        - ALL
      runAsUser: 1001
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: testtopo
      mountPath: /data
  volumes:
  - name: testtopo
    persistentVolumeClaim:
      claimName: lvm-file-1
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: "RuntimeDefault"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvm-file-1
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: lvms-vg1
```
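Once both objects are applied, a few quick checks confirm the PVC actually binds and the volume is mounted (object names match the manifests above; the PVC may stay Pending until the pod consumes it if the storage class uses WaitForFirstConsumer binding, which is an assumption here):

```shell
# Check the PVC moves to Bound once the pod consumes it.
oc get pvc lvm-file-1 -n default

# Wait for the test pod, then confirm the volume is mounted at /data.
oc wait pod/testpod -n default --for=condition=Ready --timeout=120s
oc exec -n default testpod -- df -h /data
```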
Resource-consumption wise, deploying the LVMS operator takes ~450MB more RAM (some of which will be recovered after removing the hostpath-provisioner).

Before deploying LVMS:

```
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2351m (40%)   0 (0%)
  memory             7857Mi (51%)  0 (0%)
```

After deploying LVMS:

```
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2390m (41%)   0 (0%)
  memory             8337Mi (55%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
```
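The ~450MB figure can be read straight off the two node snapshots above; a back-of-the-envelope check, not an exact accounting:

```shell
# Memory requests before and after deploying the LVMS operator,
# taken from the `oc describe node` output above.
before_mib=7857
after_mib=8337
delta=$((after_mib - before_mib))
echo "extra memory requested: ${delta}Mi"   # 480Mi, i.e. roughly the quoted ~450MB
```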
Another thing we need to keep in mind: as of now, everything (images, PVs, etc.) is part of the root partition for the OCP bundles, so when users use a bigger disk size they don't have to care whether the extra space is used for images or for PVs. But this is not the case with the microshift bundle, where we are using topolvm and users have to think, before expanding the disk, whether the extra space is going to be used by PVs or by images. The current hostpath-provisioner is not going to be deprecated, since it is extensively used on the kubevirt side. What do we want to achieve by changing it?
> Other thing we need to keep in mind as of now everything (images/PVs ..etc) part of root partition for OCP bundles so when user use bigger disk size they don't care about if that extra size is used for images or for PV. But this is not the case with microshift bundle where we are using the topolvm and user have to think before expanding the disk if that extra space is going to used by PVs or images

If we add topolvm for OpenShift also, it'll be the same for both presets, which should be good, I think.
> Current hostpath-provisioner is not going to deprecated since it is extensively used in kubevirt side. What we want to achieve by changing it?

The hostpath-provisioner doesn't support resize and limits; this is the main reason for the switch. I think it'd be good to give users the ability to experiment with these features.
> the hostpath-provisioner doesn't support resize and limits, this is the main reason for the switch i think it'd be good to give users the ability to experiment with these features.

@anjannath do we have some issue where users asked for those features, or do we think they will ask in the future?
> the hostpath-provisioner doesn't support resize and limits, this is the main reason for the switch i think it'd be good to give users the ability to experiment with these features.
>
> @anjannath do we have some issue where user asked for those features or we think they will ask in future?

There were questions about the size of the PVs and how to have smaller PVs on our Slack channel, but I haven't seen GitHub issues for it, no.
> there were questions about the size of the PV and how to have smaller PVs

With hostpath-provisioner do we fix the size of the PVs? I thought the user can define the required size and it would be created automatically?
> With hostpath-provisioner do we fix the size of the PV's? I thought user can define the required size and it would be created automatic?

No, it's a limitation of the hostpath-provisioner: since it's just creating directories on the host, it has no mechanism to enforce the size; a PV simply takes as much free space as is available. See: https://github.com/kubevirt/hostpath-provisioner/issues/164#issuecomment-1413830124
Thanks for sharing. So now we have to make a decision around the resource-limit side, since you mentioned it takes around ~450MB more. As part of hostpath-provisioner we don't set a Mem/CPU request, so even if we remove it, it will not reduce the resource requirements. Since for 4.15 we are already increasing resources by around 1.5G, I am not sure whether we should increase by ~0.5G more.

```
Namespace             Name                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
---------             ----                      ------------  ----------  ---------------  -------------  ---
hostpath-provisioner  csi-hostpathplugin-fs4s6  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d3h
```
I've been trying to modify the partition table on the OCP bundle through ignition to create a separate partition for use by topolvm, but it seems the disk gets re-partitioned on the second boot. Another idea that came up while talking to @gbraad was to use a second disk and not change the partitioning of the existing disk.

Using the following butane config, the disk gets partitioned during first boot (during install), but after reboot it gets overwritten:
```yaml
disks:
- device: /dev/vda
  wipe_table: true
  partitions:
  - number: 1
    label: BIOS-BOOT
    size_mib: 1
    start_mib: 0
    type_guid: 21686148-6449-6E6F-744E-656564454649
  - number: 2
    size_mib: 127
    start_mib: 0
    label: EFI-SYSTEM
    type_guid: C12A7328-F81F-11D2-BA4B-00A0C93EC93B
  - number: 3
    label: boot
    size_mib: 384
    start_mib: 0
  - number: 4
    label: root
    size_mib: 24000
    start_mib: 0
  - number: 5
    label: pv-storage
    start_mib: 0
```
I also tried to add a `filesystems` block, and then the VM doesn't even boot:
```yaml
filesystems:
- device: /dev/disk/by-partlabel/BIOS-BOOT
  wipe_filesystem: true
  format: none
- device: /dev/disk/by-partlabel/EFI-SYSTEM
  wipe_filesystem: true
  format: vfat
- path: /boot
  device: /dev/disk/by-partlabel/boot
  format: ext4
  wipe_filesystem: true
  with_mount_unit: true
- path: /root
  device: /dev/disk/by-partlabel/root
  format: xfs
  wipe_filesystem: true
  with_mount_unit: true
- device: /dev/disk/by-partlabel/pv-storage
  format: ext4
  wipe_filesystem: true
```
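For reference, the snippets above sit under the `storage:` section of a full butane config and have to be transpiled to ignition before being fed to the VM; the file names below are placeholders:

```shell
# Transpile the butane config (containing the storage/disks section
# above) into an ignition file; --strict makes unknown fields an error.
butane --pretty --strict config.bu --output config.ign
```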
Working on doing this in crc itself; I've made some progress that can be tested from this branch: https://github.com/anjannath/crc/tree/extradisk (only for macOS currently)

It is currently doing the following things:

1. `crc start` creates a second disk image in the machine instance dir (named: crc-second-disk.img)
2. Once kube-apiserver is up, the topolvm operator group, subscription and the `openshift-storage` namespace are created
3. An `LVMCluster` resource is created, which creates the LVM-based storage class that can be used in PVC definitions

After step 2 we have to wait ~2 mins for the installation of the operator to succeed, and only after that does the `LVMCluster` custom resource become available for use. Therefore I think if we install the operator during the snc phase and only create the `LVMCluster` resource during `crc start`, that'd not increase the start time.
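To try the branch, the rough sequence would be something like the following; the build output path and the final check are assumptions, not instructions from the branch itself:

```shell
# Build crc from the extradisk branch (macOS only, per the note above).
git clone -b extradisk https://github.com/anjannath/crc.git
cd crc && make   # build output path varies by platform; see the repo's Makefile

# Start the cluster with the freshly built binary (path is an assumption);
# a crc-second-disk.img should appear in the machine instance directory.
./out/macos-universal/crc start

# After the operator settles (~2 mins), the storage class should exist.
oc get storageclass lvms-vg1
```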
2 minutes is very long, I agree that moving this logic to snc should help. Not clear why all of it can't be done in snc? If it doesn't work from ignition, this could still be done after the install is done as part of all the tweaks we are doing to the cluster?
Yes, we could do all of it in snc. What we need is a separate partition on the disk for the lvms/topolvm operator to use; when the re-partitioning attempt with ignition failed and the idea of using a second disk came up, I focused on doing it in crc, because using a second disk would require changes to the libmachine drivers code.

But since you mention it: if we don't use a second disk, and since we can modify the one disk image using guestfs tools, we can do this entirely in snc.
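If we go the snc route, the guestfs tools can at least inspect the bundle disk offline before any modification is attempted; a hedged sketch, with the image path as a placeholder:

```shell
# List partitions and filesystems in the bundle disk image without
# booting it (read-only, so safe to run against the original image).
virt-filesystems -a crc.qcow2 --all --long
guestfish --ro -a crc.qcow2 run : list-filesystems
```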
For what it's worth, there is this enhancement open against microshift: https://github.com/openshift/enhancements/pull/1601 « MicroShift: Replacing upstream TopoLVM with a minified version of LVMS »

From what I understand going through that enhancement doc, microshift is moving to the LVMS operator instead of the modified topolvm deployment that is there now. What is not clear to me is whether the minified version of LVMS (called microLVMS in the doc) is going to be a separate thing, or a new feature in the LVMS operator itself, with microLVMS then used in both openshift and microshift.
To shrink the root partition by some amount (from 31 GB to 26 GB, so a decrease of 5 GB): the root partition filesystem is `xfs`, and `guestfish` needs the filesystem to be resized before the partition can be resized. I think doing everything in snc will not be possible, as I couldn't find an equivalent of `resize2fs` for the `xfs` filesystem.
You can grow xfs filesystems, but you cannot shrink them.
To summarize: we are going to use the LVMS operator for dynamic PV provisioning in CRC. For this we need to:

- create the `LVMCluster` resource during `crc start` (https://github.com/crc-org/crc/issues/4097)