aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Windows Support with `Custom` AMIFamily #4608

Open jonathan-innis opened 1 year ago

jonathan-innis commented 1 year ago

Description

What problem are you trying to solve?

Currently, if a user wants to enable custom bootstrapping logic with a Windows image, this isn't possible, since the Custom AMIFamily automatically assumes that you are using a Linux OS and infers some other things about the image that you are launching.

There are some potential options here that we might want to explore:

  1. WindowsCustom AMIFamily that assumes that you are using the Windows OS when getting the instance type requirements
  2. Pass the AMI requirements down into the instance type requirements early on so that we get info on the restrictions that our AMIs place on our instance types prior to the scheduling loop
  3. Support a separate field in the AWSNodeTemplate API, something like useDefaultBootstrap: false that would allow AMIFamilies that already exist to disable the script and allow you to specify a custom script

How important is this feature to you?

This isn't really blocking as of now, unless we get more issues from users who want to skip the existing EKS bootstrap script that Windows AMIFamilies use by default and rely entirely on their own custom scripts.

tzifudzi commented 4 weeks ago

I am interested in picking up this piece of work. Following up on potential approach suggestions from jonathan-innis@, I have the following comments.

TL;DR: I am torn between approaches 2 and 3. I am leaning towards 3, though I anticipate 2 might be preferable for Karpenter maintainers.

  1. WindowsCustom AMIFamily that assumes that you are using the Windows OS when getting the instance type requirements
  2. Pass the AMI requirements down into the instance type requirements early on so that we get info on the restrictions that our AMIs place on our instance types prior to the scheduling loop

    • I assume you are referring to being able to fetch AMI details from EC2 whereby we can get the platform and architecture for the custom AMI and then use that information to determine subsequent workflows.
    • This is probably the preferred approach to take. Among other benefits, it maintains the existing Karpenter API as is while we extend it to add Windows support. It comes with the drawback of having to add platform-specific logic in the code workflow, but that's acceptable.
  3. Support a separate field in the AWSNodeTemplate API, something like useDefaultBootstrap: false that would allow AMIFamilies that already exist to disable the script and allow you to specify a custom script

    • This is less preferred than 2. but is worthy of consideration. Ideas for the flag name could be useDefaultBootstrap: false, as already suggested by jonathan-innis@, mergeUserData: false, or something else.
    • Why is it worthy of consideration?
    • Because in the realm of Windows on AWS and Kubernetes, only Windows Server 2019 and Windows Server 2022 are supported. See https://kubernetes.io/docs/concepts/windows/intro/ which specifies that the supported Windows OSs for Kubernetes are "Windows Server 2019 or Windows Server 2022". Therefore, the variety of flavours that exists on Linux (AL2, Bottlerocket, RedHat, etc.) doesn't exist for Windows, so the Custom field is less practical. As of today, anyone can launch custom-built Windows AMIs using the existing AMI families of Windows2019 and Windows2022 with a specific AMI ID using spec.amiSelectorTerms.id, so for Windows, custom AMIs are implicitly supported.
    • While custom AMIs already partially work on Windows in the way described above, the EKS Optimized AMI user data will still run. If the AMI's entry point differs from the standard Windows workflow, the user data will encounter an error while trying to invoke the EKS Optimized AMI PowerShell script that bootstraps the node. So an option such as useDefaultBootstrap: false would be valuable and would solve the problem of running custom Windows AMIs in a much simpler way.
    • Are there any drawbacks to this? Yes. It's not straightforward to decide whether this is the right change to make.
      1. First, while it's practical to reuse the existing AMI families of Windows2019 and Windows2022 for custom AMIs, this seems to somewhat violate the original intent of the Karpenter API, which implies that custom AMIs should use Custom. If it's acceptable, documentation can be added to explain how to correctly launch custom AMIs on Windows and Linux respectively. But it makes the Karpenter API less intuitive to follow and understand by having different behaviour for Windows and Linux.

      2. Second, requiring someone to specify either Windows2019 or Windows2022 is unnecessary for a custom AMI. All Karpenter needs to know at launch time is whether the AMI is a Windows platform image. This can be retrieved dynamically from EC2, or a new field can be added so that it is specified declaratively.

      3. Third, while as of today only Windows Server 2019 and Windows Server 2022 are supported, if someone has a use case for launching Windows 10, Windows 11, Windows Server 2016 or the newer Windows Server 2025 (WS2025) (currently in preview), then no existing AMI family option can be leveraged. These other Windows flavours can theoretically work on Kubernetes, albeit without support for some features or with instability. Is there a use case for launching these officially unsupported flavours? That is unknown, though unlikely. Providing the flexibility might still be preferable.
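
To make the useDefaultBootstrap idea from approach 3 concrete, here is a minimal Go sketch of how such a flag might gate the user data. Everything in it is an assumption for illustration: `NodeTemplate` is a trimmed-down stand-in and not the real AWSNodeTemplate type, `UseDefaultBootstrap` is the proposed field and does not exist today, `defaultBootstrap` is a placeholder string rather than the actual EKS bootstrap script, and the merge order is invented.

```go
package main

import "fmt"

// NodeTemplate is a hypothetical, trimmed-down stand-in for the node template spec.
type NodeTemplate struct {
	AMIFamily           string
	UserData            string
	UseDefaultBootstrap *bool // hypothetical field from this discussion; nil is treated as true
}

// defaultBootstrap is a placeholder, not the real EKS bootstrap script.
const defaultBootstrap = "# <default EKS bootstrap invocation would go here>"

// ResolveUserData returns the user data to launch with. When the hypothetical
// flag is explicitly false, only the user's own script is returned. Otherwise
// the placeholder default is appended (the merge order here is an assumption).
func ResolveUserData(t NodeTemplate) string {
	if t.UseDefaultBootstrap != nil && !*t.UseDefaultBootstrap {
		return t.UserData
	}
	if t.UserData == "" {
		return defaultBootstrap
	}
	return t.UserData + "\n" + defaultBootstrap
}

func main() {
	off := false
	custom := NodeTemplate{
		AMIFamily:           "Windows2022",
		UserData:            "Write-Host 'custom bootstrap'",
		UseDefaultBootstrap: &off,
	}
	fmt.Println(ResolveUserData(custom))
}
```

A pointer is used for the flag so that an unset value can keep today's behaviour, which would make the field backwards compatible if it were ever added.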

cc @engedaam @jonathan-innis can you please share your thoughts? Before I begin implementing this, it would be ideal to reach alignment on the best approach.

engedaam commented 4 weeks ago

I personally would prefer if customers did not need to think about Windows/Linux when they would like to use Windows AMIs.

On the three approaches:
1.) I agree with you here. I personally would prefer if customers did not need to think about Windows/Linux when they would like to use Windows AMIs. In the case of Custom, we should build it out such that any AMI that needs additional requirements expands the requirement set on instance types, rather than adding additional API surface.
2.) I'm leaning towards this option. Adding automatic support based on the describeImages API seems like the right approach to me.

I assume you are referring to being able to fetch AMI details from EC2 whereby we can get the platform and architecture for the custom AMI and then use that information to determine subsequent workflows, no?

Do we only need to consider the platform requirement? Does it matter if a user tries to launch Windows 2019, Windows 2022, Windows 10, etc. with the Custom AMIFamily? We are only able to get the platform information from that API call.

"Images": [
        {
            "Architecture": "x86_64",
            "CreationDate": "2024-09-12T16:09:55.000Z",
            "ImageId": "ami-0fc07a79f5df585c4",
            "ImageLocation": "amazon/Windows_Server-2022-English-Core-EKS_Optimized-1.30-2024.09.10",
            "ImageType": "machine",
            "Public": true,
            "OwnerId": "137057727718",
            "Platform": "windows",
            "PlatformDetails": "Windows",
            "UsageOperation": "RunInstances:0002",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-0526d89c95eee7767",
                        "VolumeSize": 50,
                        "VolumeType": "gp2",
                        "Encrypted": false
                    }
                },
               ...
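
For illustration, the platform check under discussion could look roughly like this in Go. This is a minimal sketch and not Karpenter's implementation: `Image`, `Requirements`, `ParseImages` and `InferRequirements` are hypothetical names, and the code decodes a saved DescribeImages response instead of calling EC2. It assumes that EC2 sets `Platform` to `windows` for Windows AMIs and omits it otherwise, and that `x86_64` corresponds to the `amd64` value of the Kubernetes architecture label.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Image mirrors only the DescribeImages fields used below.
type Image struct {
	ImageID      string `json:"ImageId"`
	Architecture string `json:"Architecture"`
	Platform     string `json:"Platform"` // assumed "windows" for Windows AMIs, absent otherwise
}

// Requirements is a hypothetical holder for the inferred scheduling values.
type Requirements struct {
	OS   string // value for the kubernetes.io/os label
	Arch string // value for the kubernetes.io/arch label
}

// ParseImages decodes a DescribeImages-style JSON document.
func ParseImages(raw []byte) ([]Image, error) {
	var out struct {
		Images []Image `json:"Images"`
	}
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return out.Images, nil
}

// InferRequirements derives the OS and architecture from the image metadata.
// Any image without a "windows" platform is treated as Linux.
func InferRequirements(img Image) (Requirements, error) {
	req := Requirements{OS: "linux"}
	if strings.EqualFold(img.Platform, "windows") {
		req.OS = "windows"
	}
	switch img.Architecture {
	case "x86_64":
		req.Arch = "amd64"
	case "arm64":
		req.Arch = "arm64"
	default:
		return Requirements{}, fmt.Errorf("unsupported architecture %q", img.Architecture)
	}
	return req, nil
}

func main() {
	raw := []byte(`{"Images":[{"Architecture":"x86_64","ImageId":"ami-0fc07a79f5df585c4","Platform":"windows"}]}`)
	images, err := ParseImages(raw)
	if err != nil {
		panic(err)
	}
	for _, img := range images {
		req, err := InferRequirements(img)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: os=%s arch=%s\n", img.ImageID, req.OS, req.Arch)
	}
}
```

Since only the platform is read, a check like this says nothing about which Windows version the AMI contains, which is the limitation raised in the question above.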

3.) My opinion on this is the same as option 1. I'm not convinced yet that additional API surface is needed.

Third, while as of today only Windows Server 2019 and Windows Server 2022 are supported, if someone has a use case of wanting to launch Windows 10, Windows 11, Windows Server 2016 or the newer Windows Server 2025 (WS2025) (currently in preview mode), then no existing AMI family options can be leveraged. These other Windows flavours can theoretically work on Kubernetes, albeit without support for some features or with instability. Is there a use case for needing to launch with these other officially unsupported flavours? - That is unknown though unlikely. Providing the flexibility might be preferable maybe?

What is preventing us from getting this for free with option 2?

tzifudzi commented 3 weeks ago

Had a chat with @engedaam, and the agreed approach is option 2: leverage the Custom option. I will capture our discussion points in a brief document in the design folder.