jonathan-innis opened 1 year ago
I am interested in picking up this piece of work. Following up on the potential approaches suggested by jonathan-innis@, I have the following comments.
TL;DR: I am torn between approaches 2 and 3. I am leaning towards 3, though I anticipate 2 might be preferable for Karpenter maintainers.
1. A WindowsCustom AMIFamily that assumes that you are using the Windows OS when getting the instance type requirements
- Having both Custom and WindowsCustom (or something else) adds a slight burden, because the API then requires users to pick a specific value. Since the OS can be retrieved from the AMI details, I would personally prefer a single Custom type for both Linux and Windows.
2. Pass the AMI requirements down into the instance type requirements early on, so that we know the restrictions our AMIs place on our instance types before the scheduling loop
- I assume you are referring to fetching the AMI details from EC2, which give us the platform and architecture of the custom AMI, and then using that information to determine subsequent workflows.
- This is probably the preferred approach. Among other benefits, it keeps the existing Karpenter API as is while extending it to support Windows. Its drawback is that platform-specific logic has to be added to the code workflow, but that's acceptable.
3. Support a separate field in the AWSNodeTemplate API, something like useDefaultBootstrap: false, that would allow AMIFamilies that already exist to disable the script and allow you to specify a custom script
- This is less preferred than 2, but is worthy of consideration. Possible flag names are useDefaultBootstrap: false (as already suggested by jonathan-innis@), mergeUserData: false, or something else.
- Why is it worthy of consideration?
- Because in the realm of Windows on AWS and Kubernetes, only Windows Server 2019 and Windows Server 2022 are supported. See https://kubernetes.io/docs/concepts/windows/intro/, which specifies that the supported Windows OSs for Kubernetes are "Windows Server 2019 or Windows Server 2022". The variety of flavours that exists on Linux, such as AL2, Bottlerocket, RedHat, etc., therefore does not exist for Windows, so the Custom AMIFamily is less practical there. As of today, anyone can launch custom-built Windows AMIs using the existing AMI families Windows2019 and Windows2022 with a specific AMI ID set via spec.amiSelectorTerms.id, so custom AMIs are implicitly supported for Windows.
- While custom AMIs already partially work on Windows in the way described above, the EKS Optimized AMI user data still runs. If the custom AMI's entry point differs from the EKS Windows workflow, that user data fails when it tries to invoke the EKS Optimized AMI PowerShell script that bootstraps the node. So a useDefaultBootstrap: false option would be valuable and would solve the problem of running custom Windows AMIs in a much simpler way.
- Are there any drawbacks to this? Yes. It's not straightforward to decide whether this is the correct thing to do.
- First, while it's practical to reuse the existing Windows2019 and Windows2022 AMI families for custom AMIs, this seems to somewhat violate the original intent of the Karpenter API, which implies that custom AMIs should use Custom. If that's acceptable, documentation can be added on how to correctly launch custom Windows AMIs and custom Linux AMIs respectively. But having different behaviour for Windows and Linux makes the Karpenter API less intuitive to follow and understand.
- Second, requiring someone to specify either Windows2019 or Windows2022 is unnecessary for a custom AMI. All Karpenter needs to know at launch time is whether the AMI's platform is windows. This can be retrieved dynamically from EC2, or alternatively a new field could be added so that it can be specified declaratively.
- Third, while only Windows Server 2019 and Windows Server 2022 are supported today, if someone wants to launch Windows 10, Windows 11, Windows Server 2016, or the newer Windows Server 2025 (WS2025, currently in preview), none of the existing AMI family options can be leveraged. These other Windows flavours can theoretically work on Kubernetes, albeit without support for some features or with instability. Is there a use case for launching these officially unsupported flavours? That is unknown, though unlikely. Providing the flexibility might still be preferable.
cc @engedaam @jonathan-innis can you please share your thoughts? Before I begin implementing this, I'd like to reach alignment on the best approach.
On the three approaches:
1.) I agree with you here. I would personally prefer that customers not need to think about Windows/Linux when they want to use Windows AMIs. In the case of Custom, we should build it out so that any AMI with additional requirements expands the requirement set on instance types, rather than adding additional API surface.
2.) I'm leaning towards this option. Adding automatic support based on the DescribeImages API seems like the right approach to me.
I assume you are referring to being able to fetch AMI details from EC2 whereby we can get the platform and architecture for the custom AMI and then use that information to determine subsequent workflows, no?
Do we only need to consider the platform requirement? Does it matter if a user tries to launch windows2019, windows2022, windows10, etc. with the Custom AMIFamily? We are only able to get the platform information from that API call, for example:
"Images": [
  {
    "Architecture": "x86_64",
    "CreationDate": "2024-09-12T16:09:55.000Z",
    "ImageId": "ami-0fc07a79f5df585c4",
    "ImageLocation": "amazon/Windows_Server-2022-English-Core-EKS_Optimized-1.30-2024.09.10",
    "ImageType": "machine",
    "Public": true,
    "OwnerId": "137057727718",
    "Platform": "windows",
    "PlatformDetails": "Windows",
    "UsageOperation": "RunInstances:0002",
    "State": "available",
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/sda1",
        "Ebs": {
          "DeleteOnTermination": true,
          "SnapshotId": "snap-0526d89c95eee7767",
          "VolumeSize": 50,
          "VolumeType": "gp2",
          "Encrypted": false
        }
      },
      ...
3.) My opinion on this is the same as option 1. I'm not convinced yet that additional API surface is needed.
On the third drawback above (launching Windows 10, Windows 11, Windows Server 2016, or Windows Server 2025, for which no existing AMI family option can be leveraged): what is preventing us from getting this for free with option 2?
I had a chat with @engedaam, and we agreed on approach 2, leveraging the Custom option. I will capture the points we discussed in a brief document in the design folder.
Description
What problem are you trying to solve?
Currently, if a user wants to enable custom bootstrapping logic with a Windows image, this isn't possible, since the Custom AMIFamily automatically assumes that you are using a linux OS and infers some other things about the image that you are launching.
There are some potential options here that we might want to explore:
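1. A WindowsCustom AMIFamily that assumes that you are using the Windows OS when getting the instance type requirements
2. Pass the AMI requirements down into the instance type requirements early on so that we get info on the restrictions that our AMIs place on our instance types prior to the scheduling loop
3. Support a separate field in the AWSNodeTemplate API, something like useDefaultBootstrap: false, that would allow AMIFamilies that already exist to disable the script and allow you to specify a custom script
How important is this feature to you?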
This isn't really blocking as of now, unless we get more issues from users who want to stop using the existing EKS bootstrap script that Windows AMIFamilies use by default and use their own custom scripts entirely.