Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[BUG] Impossible to create AKS cluster with etcd encryption without creating a new private endpoint to KV #3542

Open sleepy-manul opened 1 year ago

sleepy-manul commented 1 year ago

Describe the bug According to the documentation at https://learn.microsoft.com/en-us/azure/aks/use-kms-etcd-encryption, creating an AKS cluster with etcd encryption and keyVaultNetworkAccess=private always forces the creation of a new private endpoint, even if there is already a working PL for that KV in the AKS subnet.

This means:

To Reproduce This is a design bug; for details, see above.

Expected behaviour AKS should automatically determine that there is already a working private endpoint in that subnet and use it.

Screenshots not applicable

Environment (please complete the following information): all environments

Additional context N/A

ghost commented 1 year ago

Action required from @Azure/aks-pm

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

tspearconquest commented 4 weeks ago

I was just about to start working on etcd encryption for my cluster after having implemented the private endpoint for my keyvault last week. Well this is a bummer...

tspearconquest commented 2 weeks ago

Yeah, looking at the documentation for KMS, MS clearly states that you have to grant the Key Vault Contributor role to the MID accessing the key, and that's required because it automatically creates the private endpoint connection for you.

I wonder if this means that, because I don't have Key Vault Contributor, I will still see a failure even if I have the private endpoint already available, which seems absurd to me. The PE is there; Azure doesn't need a second PE. Just make the connection.

tspearconquest commented 2 weeks ago

I tried to enable KMS with my private Keyvault but without passing --azure-keyvault-kms-key-vault-network-access "Private" and got an error that access was not allowed. It actually is allowed: I can get onto a node in my cluster, run az keyvault list, and see my key. There's no reason this shouldn't work.

tspearconquest commented 2 weeks ago

I went ahead and enabled this the right way (after getting Key Vault Contributor added to my cluster's User-Assigned Managed Identity), with a private keyvault and with --azure-keyvault-kms-key-vault-network-access "Private", and as expected, a second private endpoint was created. We have a separate subnet where we put our private endpoints, and we created the PE for the keyvault in that subnet, but it seems Azure puts the one this feature creates into the system nodes subnet.

Please consider adding support for specifying the subnet where the endpoint should be placed, and consider having the AKS API check for and use any pre-existing private endpoints by default.
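The requested behaviour can be sketched as a small decision function. This is only an illustration of the request, not how the AKS API works today: the `PrivateEndpoint` record, its fields, the function name `find_reusable_endpoint`, and the resource IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PrivateEndpoint:
    """Hypothetical, simplified record; not an Azure SDK type."""
    name: str
    target_resource_id: str  # resource the endpoint connects to
    subnet_id: str           # subnet holding the endpoint's network interface
    connection_state: str    # "Approved", "Pending", "Rejected", ...


def find_reusable_endpoint(
    endpoints: Iterable[PrivateEndpoint],
    vault_id: str,
    vnet_id: str,
    preferred_subnet_id: Optional[str] = None,
) -> Optional[PrivateEndpoint]:
    """Return an approved endpoint for the vault inside the VNet, or None."""
    # IDs are compared case-insensitively (an assumption of this sketch).
    subnet_prefix = vnet_id.lower() + "/subnets/"
    candidates = [
        pe for pe in endpoints
        if pe.target_resource_id.lower() == vault_id.lower()
        and pe.connection_state == "Approved"
        and pe.subnet_id.lower().startswith(subnet_prefix)
    ]
    # Prefer an endpoint in the caller's chosen subnet when one is given.
    if preferred_subnet_id is not None:
        for pe in candidates:
            if pe.subnet_id.lower() == preferred_subnet_id.lower():
                return pe
    return candidates[0] if candidates else None


# Made-up resource IDs for illustration only.
VNET = "/subscriptions/0000/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/spoke"
VAULT = "/subscriptions/0000/resourceGroups/rg/providers/Microsoft.KeyVault/vaults/kv-aks"

existing = [
    PrivateEndpoint("pe-kv", VAULT, VNET + "/subnets/private-endpoints", "Approved"),
]
reusable = find_reusable_endpoint(existing, VAULT, VNET)
if reusable is None:
    print("no usable endpoint found: create one")
else:
    print("reuse existing endpoint:", reusable.name)
```

Only when the function returns `None` would a new endpoint need to be created, and `preferred_subnet_id` stands in for the requested option to choose where that endpoint is placed.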

tspearconquest commented 2 weeks ago

It also appears that this PE took over the DNS entry for our pre-created PE in the desired subnet. This broke Terraform for us because the DNS entry now points to the IP in the system nodes subnet instead of our private endpoints subnet, and we have firewall rules allowing access from Jenkins to the keyvault only in the correct (private endpoints) subnet, not allowing access via the system nodes subnet.
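One way to detect this kind of takeover is to check which subnet the vault's hostname resolves into from the client that needs access. The following is a minimal sketch using only the Python standard library; the CIDR ranges and IP addresses are made-up placeholders, not values from this thread.

```python
import ipaddress
import socket


def ip_in_subnet(ip: str, cidr: str) -> bool:
    """True if the IP address belongs to the given CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def vault_resolves_into(hostname: str, cidr: str) -> bool:
    """Resolve the hostname with the system resolver and check the IPv4 result."""
    return ip_in_subnet(socket.gethostbyname(hostname), cidr)


# Placeholder ranges: a private-endpoints subnet and a system-nodes subnet.
PRIVATE_ENDPOINTS_SUBNET = "10.0.31.0/24"
SYSTEM_NODES_SUBNET = "10.0.0.0/22"

print(ip_in_subnet("10.0.31.4", PRIVATE_ENDPOINTS_SUBNET))  # True
print(ip_in_subnet("10.0.0.7", PRIVATE_ENDPOINTS_SUBNET))   # False
print(ip_in_subnet("10.0.0.7", SYSTEM_NODES_SUBNET))        # True
```

Calling `vault_resolves_into` with the vault's hostname and the private-endpoints range from a build agent would show whether the DNS entry still points at the endpoint that the firewall rules allow.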

lzhecheng commented 2 weeks ago

@sleepy-manul @tspearconquest thank you for the feedback! We will discuss the design of BYO private endpoint support for KMS. Currently, yes, a new private endpoint will be created by AKS.

tspearconquest commented 1 week ago

Hi @lzhecheng, we would like to understand the estimated timeline. When you mention discussing the design, is this being considered as a new feature? We are looking to promote this setup to production in December, but currently this does not seem possible without workarounds on our end that would be wholly unnecessary if the Key Vault KMS integration had been designed with a proper understanding of user expectations.

Here I will lay out the expectations.

In this document, https://learn.microsoft.com/en-us/azure/aks/use-kms-etcd-encryption#turn-on-kms-for-a-private-key-vault, we find the following statement:

Creating or updating keys in a private key vault that doesn't have a private endpoint isn't supported. To learn how to manage private key vaults, see Integrate a key vault by using Azure Private Link.

Upon clicking the above link, we are guided to manually create a private endpoint for the keyvault. This facilitates following the instructions from the first link to create the Key resource inside the keyvault.

This Key resource must be created before the cluster so that its resource ID can be passed to the Azure KMS integration in the master nodes.

When we created the vault, we were already forced to create a private endpoint in order to create the Key resource inside the vault before the AKS cluster could be created/updated with the key vault Key resource ID.

We should therefore consider it a design bug in the KMS feature that the AKS API creates a keyvault private endpoint: we were already forced to create one PE specifically in order to create the Key resource that we hand to the AKS API, so the AKS API should not create another one by default when an endpoint is already available in the subnet.

lzhecheng commented 1 week ago

Hello @tspearconquest Thank you for providing more details! I think I now understand your situation better, but please correct me if I'm wrong.

Our design is that Azure resources used for keyvault and those for AKS shouldn't be reused. All those resources you create for keyvault (vnet, private endpoint, private dns-zone...) shouldn't be used when creating your AKS cluster. Their purpose is only to let you (or a dev VM) access the keyvault. The private endpoint created by AKS is likewise for AKS only. BTW, if you create a public keyvault, create a key, then turn it to private, you can achieve the same goal.

Speeddymon commented 1 week ago

Hi @lzhecheng - I'm responding from my personal GitHub acct.

Our design is that Azure resources used for keyvault and those for AKS shouldn't be reused.

Yes, this is our goal as well. We create this keyvault in the same network as the AKS cluster, and it would not be reused anywhere. It is a private vault exclusively for the AKS cluster. It only contains a single Key resource for one AKS cluster, and we have a different vault for each cluster.

All those resources you create for keyvault (vnet, private endpoint, private dns-zone...) shouldn't be used when creating your AKS cluster.

I forgot to mention that the virtual network is not for Keyvault; it is the AKS virtual network!

How can I create the private key in a non-public vault with no private endpoint?

The answer is that we are given no choice in the matter here. Because we have a hub-and-spoke network and very restrictive permissions set in Azure, our network team has a subscription with the central hub network and firewall and private DNS zones, and we have a spoke network in a different subscription. In order to create the private key resource, we must create the vault so it is only accessible within our spoke network; then someone must make a private endpoint so that the private Key resource can be created and its ID retrieved. We are able to make this private endpoint ourselves, and must do so in order to create the Key resource!

Then the Key resource ID is provided to the AKS API. So we must BYO private key AND must NOT BYO private endpoint. This is the current expectation you have laid out. But we cannot just "not BYO private endpoint" because of the requirement to create the Key resource!

BTW, if you create a public keyvault, create a key, then turn it to private, you can achieve the same goal.

Because we are an enterprise with compliance requirements, we cannot follow these steps: company policy does not allow us to create a public vault, even temporarily, in order to store the key, switch the vault to private, and then have the AKS cluster create the private endpoint. It simply wouldn't work because our Azure Policy disallows public Keyvaults.

To summarize:

The issue stems from a combination of:

  1. Needing to provide the private Key resource to the AKS API
  2. Our enterprise hub and spoke network, and
  3. Azure Policy requirements.

If AKS will provide the private endpoint, then it should also create the private Key resource itself. In this way, we would no longer be required to make the private connection ourselves in order to create the Key resource. But this comes with its own set of problems around Key rotation, which we can easily do ourselves from Terraform today; personally, I would not want to change that.

tspearconquest commented 1 week ago

Hi @lzhecheng just to add - the current design requires BYO private key but doesn't allow for BYO private endpoint to create that key.

lzhecheng commented 1 week ago

Hello @tspearconquest

I forgot to mention that the virtual network is not for Keyvault; it is the AKS virtual network!

Is it possible that keyvault and AKS are in 2 different VNETs?

If AKS will provide the private endpoint, then it should also create the private Key resource itself. In this way, we would no longer be required to make the private connection ourselves in order to create the Key resource.

We have fully managed KMS on our roadmap, which means you would just enable/disable KMS and the rest is left to AKS.

Speeddymon commented 1 week ago

Hi @lzhecheng, currently no. Each team has its own spoke vnet, and we have a limited number of IPs for the cluster (a /19), so unfortunately we can't split this up to give the vault its own /28 or slightly bigger virtual network. I also have doubts about this because of the centrally managed private DNS. When we create the PE to the keyvault, DNS records are created in the zones managed in the network team subscription.

In a dedicated keyvault vnet setup, we would have a PE in the spoke dedicated keyvault vnet which would still be peered to our hub and would still set DNS entries in the zone for privatelink.vaultcore.azure.net.

Then AKS would create a new PE in its subnet and still take over the existing DNS entry for the vault in our central private zone.

lzhecheng commented 1 week ago

Hello @Speeddymon Thank you for the response. I think we can add BYO support (PE, VNET, subnet, DNS) to our roadmap, but it may not be the next priority because we already have plans for other improvements. As for a workaround today, we came up with two solutions that we hope fit your needs: