bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

Metal: Support iSCSI initiator and NFS client for EKS-A Bare Metal persistent storage #2570

Open Cajga opened 1 year ago

Cajga commented 1 year ago

What I'd like: We would like to use Bottlerocket for EKS-A Bare Metal in our DC. For persistent storage, we are going to use NetApp and their Astra Trident (formerly known as Trident) CSI driver solution. When a PVC is created in EKS-A, Astra Trident creates iSCSI or NFSv4 volumes on our NetApp clusters (and the respective PVs in EKS-A) and binds them to the PVC. When a pod uses such a volume, the Astra Trident CSI pod (coming from a daemonset) mounts the NFS export into the pod or, in the case of iSCSI, creates a filesystem on the iSCSI volume and mounts it into the pod.

In order to be able to do this, Astra Trident has some requirements for the Kubernetes worker node.

For NFS:

For iSCSI:

zmrow commented 1 year ago

Hi @Cajga - thanks for the request!

We'll have to take a deeper look at this CSI driver and its requirements. It's interesting that the packages are required host side vs. in the CSI daemonset. Outside of the mentioned packages, we'll also need to figure out what other assumptions are made about the host that wouldn't be true for Bottlerocket, e.g. the existence of a shell.

We don't currently include the nfs-utils or any of the iSCSI packages in Bottlerocket. Which are you targeting?

Cajga commented 1 year ago

Hi @zmrow,

Thanks for looking into this.

Well, we have use cases for both iSCSI and NFSv4 (we have been using NetApp + Trident in EKS on Outposts). We have iSCSI as the default storage class in our clusters (with xfs on top), which is used in about 95% of the cases. There are a few cases where we need ReadWriteMany (RWX) volumes, and for those we use the NFSv4 storage class. We would need to check whether we could get rid of the RWX volumes and rely only on iSCSI.

For iSCSI, the host must have an initiator ID that does not change. As I mentioned above, this is normally generated at install time of the iSCSI packages on standard Linux. Trident looks for this ID on the host to register each node in the NetApp cluster. Without this information, the NetApp cluster would not give iSCSI volumes to the host. I wonder whether the daemonset would be the right place for such an ID. Or maybe the daemonset could generate it on first run and persist it somewhere in Bottlerocket. Is there a different CSI driver that manages iSCSI volumes and works with Bottlerocket? I could open a support case with NetApp to take a look at their implementation...
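
The "generate once, then persist" idea can be sketched roughly as follows. This is only an illustration, not how Trident or Bottlerocket does it: the function name `ensure_initiator_name`, the `iqn.2023-01.example.invalid` prefix, and the storage path are all hypothetical, and a real host would more likely rely on the tooling shipped with its iSCSI packages.

```python
import secrets
import tempfile
from pathlib import Path


def ensure_initiator_name(path: Path, prefix: str = "iqn.2023-01.example.invalid") -> str:
    """Return the persisted initiator name, generating it only if none exists.

    `prefix` is a placeholder IQN prefix chosen for illustration.
    """
    if path.exists():
        for line in path.read_text().splitlines():
            if line.startswith("InitiatorName="):
                # Reuse the existing ID so the storage backend keeps recognizing the node.
                return line.split("=", 1)[1].strip()
    # First run (or no usable entry): create a random suffix and persist it.
    name = f"{prefix}:{secrets.token_hex(6)}"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"InitiatorName={name}\n")
    return name


if __name__ == "__main__":
    # Demo against a temporary directory instead of a real host path.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "iscsi" / "initiatorname.iscsi"
        first = ensure_initiator_name(target)
        second = ensure_initiator_name(target)
        print(first == second)
```

The point of the sketch is that the second call returns the same value as the first, which is the property the storage side depends on; where the file would live on a Bottlerocket host is left open here.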

Cajga commented 1 year ago

This may be relevant and could help a bit with planning (the ID is not generated at install time): in RHEL 9, the /etc/iscsi/initiatorname.iscsi file (which contains the ID, e.g. InitiatorName=iqn.1994-05.com.redhat:a5a8dd56f673) is created at the first start of iscsid.service, not at install time.
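
For illustration, reading the ID out of a file in that format could look like the sketch below. The helper name `parse_initiator_name` is made up for this example, and skipping blank lines and `#` comment lines is an assumption about what such a file may contain beyond the single `InitiatorName=` line quoted above.

```python
from typing import Optional


def parse_initiator_name(text: str) -> Optional[str]:
    """Extract the value of the InitiatorName= entry, or None if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments may appear and carry no value.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "InitiatorName":
            return value.strip()
    return None


if __name__ == "__main__":
    sample = "InitiatorName=iqn.1994-05.com.redhat:a5a8dd56f673\n"
    print(parse_initiator_name(sample))
```

Returning `None` for a missing entry lets a caller distinguish "iscsid has not been started yet" from a populated file.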

zmrow commented 1 year ago

@Cajga do you know if there are other host requirements besides the nfs/iscsi packages mentioned?

Cajga commented 1 year ago

@zmrow well, let me collect what I am aware of:

I am happy to help to test in case you decide to give it a go.

wonderland commented 1 year ago

It might be worth noting that the Trident CSI driver is also used to integrate AWS FSx Ontap with EKS clusters: https://docs.aws.amazon.com/eks/latest/userguide/fsx-ontap.html

Same requirements apply.

Solving this will therefore not just be relevant for EKS-A bare metal, but also for EKS (with bottlerocket).

> It's interesting that the packages are required host side vs. in the CSI daemonset.

Storage access is always from the host. The CSI driver orchestrates it but does not sit in the datapath.

datamattsson commented 1 year ago

Most CSI drivers that use iSCSI also rely on multipathd to be functional. The HPE CSI Driver for Kubernetes expects these commands to exist on the host (btrfs being optional):

blkid
blockdev
btrfs
dmidecode
dnsdomainname
find
fsck
ip
iscsiadm
lsblk
lsscsi
mkfs.btrfs
mkfs.ext3
mkfs.ext4
mkfs.xfs
mount
multipath
multipathd
resize2fs
sg_inq
umount
xfs_growfs
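
A quick way to see which of these commands a given host is missing is sketched below. It only checks whether each name resolves on the current `PATH` via `shutil.which`; it is not how the HPE driver itself performs its checks, and the `missing_commands` helper is a name invented for this example.

```python
import shutil
from typing import Iterable, List

# Command names copied from the list above.
HOST_COMMANDS = [
    "blkid", "blockdev", "btrfs", "dmidecode", "dnsdomainname", "find",
    "fsck", "ip", "iscsiadm", "lsblk", "lsscsi", "mkfs.btrfs", "mkfs.ext3",
    "mkfs.ext4", "mkfs.xfs", "mount", "multipath", "multipathd",
    "resize2fs", "sg_inq", "umount", "xfs_growfs",
]


def missing_commands(commands: Iterable[str]) -> List[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    for cmd in missing_commands(HOST_COMMANDS):
        print(f"missing: {cmd}")
```

Several of these tools usually live in `/sbin` or `/usr/sbin`, so running the check as an unprivileged user with a reduced `PATH` may report false negatives.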

Cajga commented 1 year ago

The inability to use persistent storage is one of the major limitations (and the only blocker in our case with EKS-A BM) of using Bottlerocket on bare metal deployments.

Could you please update us on your plans regarding this issue?

etungsten commented 1 year ago

Hi @Cajga,

Supporting iSCSI has some follow-on effects which are not easy to support today. Effectively, it either has to be supported through a new variant or through adding additional software packages to the base OS image (which may only be required by a subset of users), both of which are not ideal and likely not sustainable in the long run. The larger problem here is meeting needs of users with specific requirements without causing follow-on effects. It's not an easy problem to solve.

Earlier this year, the team started working on an out-of-tree build system for Bottlerocket (https://github.com/bottlerocket-os/bottlerocket/issues/2669), which addresses the larger problem without causing others: you can build Bottlerocket and add in the specific supporting software/drivers you need without having to fully roll your own variant or maintain everything yourself. So the team is currently focusing on that effort instead of creating new variants or adding very specific supporting software to existing variants.

So there are currently no short-term plans to add iSCSI support directly, but the problem is likely to be resolved by the flexibility Bottlerocket will gain through out-of-tree builds.

Cajga commented 1 year ago

Hi @etungsten,

Thank you for the update, it seems to be a very interesting approach. Looking forward to testing it when it is ready.

I must say, I don't really see at the moment how an OOTB (out-of-tree build) variant would be shipped and supported with EKS Anywhere, for example. For an EKS-A Bare Metal installation, a variant that contains the necessary packages and daemons to use iSCSI and NFS based persistent storage would make sense, but it may not be needed for other use cases...

Cajga commented 1 year ago

There seems to be an official variant called metal-k8s-VERSION.

From the Readme:

The following variants are designed to be Kubernetes worker nodes on bare metal

I wonder how this goal would be achieved without the possibility of using persistent storage. Or are you planning to deprecate the metal variant when the OOTB solution is ready?

d3btech commented 9 months ago

Can we expect iscsi support in Bottlerocket anytime soon?

stmcginnis commented 9 months ago

> Can we expect iscsi support in Bottlerocket anytime soon?

It is still in the queue of things we would like to see, but so far no one has been able to work on it yet. Contributions welcome of course, but otherwise this will be in the backlog until someone can devote some time to it.

It is good to see these comments on the issue to help gauge interest. That may help when trying to decide how to prioritize some of these backlog items. So please do feel free to chime in if anyone else would like to see this supported!

cparik commented 9 months ago

+1 - Adding a comment on behalf of one of our customers. This limitation affects customers who need PVs with the RWX access mode in Bottlerocket-based EKS-A clusters.

jda258 commented 9 months ago

We would like to see a resolution to this as a paying EKS-A customer. We have NetApp ONTAP storage we'd like to use with Astra Trident. We also want the advantages of using Bottlerocket.

trc-ikeskin commented 1 week ago

> It might be worth noting that the Trident CSI driver is also used to integrate AWS FSx Ontap with EKS clusters: docs.aws.amazon.com/eks/latest/userguide/fsx-ontap.html
>
> Same requirements apply.
>
> Solving this will therefore not just be relevant for EKS-A bare metal, but also for EKS (with bottlerocket).
>
> > It's interesting that the packages are required host side vs. in the CSI daemonset.
>
> Storage access is always from the host. The CSI driver orchestrates it but does not sit in the datapath.

Since this seems to be the official tracking issue for iSCSI integration, could we please emphasize that this is not a bare-metal-only issue, but is also blocking the use of FSx Ontap with EKS (and probably others), for example by adjusting the title?