Open sleepy-manul opened 1 year ago
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
I was just about to start working on etcd encryption for my cluster after having implemented the private endpoint for my keyvault last week. Well this is a bummer...
Yeah, looking at the documentation for KMS, MS clearly states that you have to grant the Key Vault Contributor role to the MID accessing the key, and that's required because it automatically creates the private endpoint connection for you.
I wonder if this means that because I don't have Key Vault contributor, I will still see a failure even if I have the private endpoint already available. Which seems absurd to me. The PE is there, Azure doesn't need a second PE. Just make the connection.
I tried to enable KMS with my private Keyvault but without passing --azure-keyvault-kms-key-vault-network-access "Private" and got an error that access was not allowed. Access actually is allowed: I can get on a node in my cluster, run az keyvault list, and see my key. There's no reason this shouldn't work.
I went ahead and enabled this the right way (after getting Key Vault Contributor added to my cluster's User-Assigned Managed Identity), with a private keyvault and with --azure-keyvault-kms-key-vault-network-access "Private", and as expected, a second private endpoint was created. We have a separate subnet where we put our private endpoints, and we created the PE for the keyvault there, but it seems Azure puts the one this feature creates into the system nodes subnet.
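For anyone following along, here is a rough sketch of the enablement step described above. It only assembles the az aks update command line; the flags are the ones shown in the KMS documentation linked later in this thread, while the cluster name, resource group, key ID, and vault resource ID are placeholders I made up for illustration.

```python
import shlex


def build_kms_update_command(cluster, resource_group, key_id, vault_resource_id):
    """Return the argv list for enabling KMS against a private key vault."""
    return [
        "az", "aks", "update",
        "--name", cluster,
        "--resource-group", resource_group,
        "--enable-azure-keyvault-kms",
        "--azure-keyvault-kms-key-id", key_id,
        # "Private" is what triggers AKS to create its own private endpoint.
        "--azure-keyvault-kms-key-vault-network-access", "Private",
        "--azure-keyvault-kms-key-vault-resource-id", vault_resource_id,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = build_kms_update_command(
        cluster="my-aks-cluster",
        resource_group="my-resource-group",
        key_id="https://my-vault.vault.azure.net/keys/my-key/<key-version>",
        vault_resource_id="<key-vault-resource-id>",
    )
    # Print a copy-pasteable command instead of running it.
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the managed identity still needs the Key Vault Contributor role on the vault before the real command will succeed.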
Please consider adding support for specifying the subnet where the endpoint should be placed, and consider having the AKS API check for and use any pre-existing private endpoints by default.
It also appears that this PE took over the DNS entry for our pre-created PE in the desired subnet. This broke Terraform for us because the DNS entry now points to the IP in the system nodes subnet instead of our private endpoints subnet, and we have firewall rules allowing access from Jenkins to the keyvault only in the correct (private endpoints) subnet, not allowing access via the system nodes subnet.
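If you want to check whether you are affected by the DNS takeover, a minimal sketch like the following may help. It resolves the vault hostname and tests whether the resulting IP falls inside the subnet you expect. The function names, the hostname, and the CIDR in the usage note are my own placeholders, and the check needs to run from a host that uses the private DNS zone.

```python
import ipaddress
import socket


def ip_in_subnet(ip, cidr):
    """Return True if the IP address belongs to the given CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def check_vault_dns(hostname, expected_cidr):
    """Resolve the vault hostname and report whether it points into expected_cidr.

    Returns a tuple of (resolved_ip, is_in_expected_subnet).
    """
    ip = socket.gethostbyname(hostname)
    return ip, ip_in_subnet(ip, expected_cidr)
```

For example, check_vault_dns("<vault-name>.vault.azure.net", "10.0.32.0/24") would return the resolved IP together with False if the record now points at an address in the system nodes subnet rather than the private endpoints subnet (the CIDR here is hypothetical).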
@sleepy-manul @tspearconquest thank you for the feedback! We will discuss the design of the BYO private endpoint for KMS. Currently, yes a new private endpoint will be created by AKS.
Hi @lzhecheng we would like to understand the estimated timeline. When you say you will discuss the design, is this being considered as a new feature? We are looking to promote this setup to production in December, but currently this seems not to be possible without some workarounds on our end that would be wholly unnecessary if the Key Vault KMS integration had been designed with a proper understanding of user expectations.
Here I will lay out the expectations.
In this document, https://learn.microsoft.com/en-us/azure/aks/use-kms-etcd-encryption#turn-on-kms-for-a-private-key-vault, we find the following statement:
"Creating or updating keys in a private key vault that doesn't have a private endpoint isn't supported. To learn how to manage private key vaults, see Integrate a key vault by using Azure Private Link."
Upon clicking the above link, we are guided to manually create a private endpoint for the keyvault. This facilitates following the instructions from the first link to create the Key resource inside the keyvault.
This Key resource must be created before the cluster so that its resource ID can be passed to the Azure KMS integration in the master nodes.
When we created the vault, we were forced to already create a private endpoint in order to create the Key resource inside the vault before the AKS cluster can be created/updated with the key vault key resource ID.
Therefore, creating a keyvault private endpoint from the AKS API should be considered a bug in the design of the KMS feature. We were already forced to create one PE specifically so that we could create the Key resource to give to the AKS API, so the AKS API should not create another one by default when an endpoint is already available in the subnet.
Hello @tspearconquest Thank you for providing more details! Now I think I'm more clear about your situation. But still please correct me if I'm wrong.
Our design is that Azure resources used for keyvault and those for AKS shouldn't be reused. All those resources you create for keyvault (vnet, private endpoint, private dns-zone...) shouldn't be used when creating your AKS cluster. The purpose is only for you (or dev VM) to access the keyvault. Private endpoint created by AKS is also for AKS only. BTW, if you create a public keyvault, create a key, then turn it to private, you can achieve the same goal.
Hi @lzhecheng - I'm responding from my personal GitHub acct.
"Our design is that Azure resources used for keyvault and those for AKS shouldn't be reused."
Yes, this is our goal as well. We create this keyvault in the same network with the AKS cluster, it would not be reused anywhere. It is a private vault exclusively for the AKS cluster. It only contains a single Key resource for one AKS cluster and we have a different vault for each cluster.
"All those resources you create for keyvault (vnet, private endpoint, private dns-zone...) shouldn't be used when creating your AKS cluster."
I forgot to mention that the virtual network is not for Keyvault; it is the AKS virtual network!
How can I create the private key in a non-public vault with no private endpoint?
The answer is that we are given no choice in the matter here. Because we have a hub-and-spoke network and very restrictive permissions set in Azure, our network team has a subscription with the central hub network and firewall and private DNS zones, and we have a spoke network in a different subscription. In order to create the private key resource, we must create the vault so it is only accessible within our spoke network; then someone must make a private endpoint so that the private Key resource can be created and its ID retrieved. We are able to make this private endpoint ourselves, and must do so in order to create the Key resource!
Then the Key resource ID is provided to AKS API. So we must BYO private key AND must NOT BYO private endpoint. This is the current expectation you have laid out. But we cannot just "not BYO private endpoint" because of the requirement to create the Key resource!
"BTW, if you create a public keyvault, create a key, then turn it to private, you can achieve the same goal."
Because we are an enterprise with compliance requirements, we cannot follow these steps: company policy does not allow us to create a public vault where we could temporarily store the key, then switch the vault to private and have the AKS cluster create the private endpoint. It just wouldn't work because our Azure Policy disallows public Keyvaults.
To summarize:
The issue stems from a combination of:
If AKS will provide the private endpoint, then it should also create the private Key resource itself. In this way, we would no longer be required to make the private connection ourselves in order to create the Key resource. But this comes with its own set of problems around rotation of the Key, which we can do ourselves easily from Terraform now and I would not want to change it personally.
Hi @lzhecheng just to add - the current design requires BYO private key but doesn't allow for BYO private endpoint to create that key.
Hello @tspearconquest
"I forgot to mention that the virtual network is not for Keyvault; it is the AKS virtual network!"
Is it possible that keyvault and AKS are in 2 different VNETs?
"If AKS will provide the private endpoint, then it should also create the private Key resource itself. In this way, we would no longer be required to make the private connection ourselves in order to create the Key resource."
We have fully managed KMS on our roadmap, which means you just enable or disable KMS and the rest is left to AKS.
Hi @lzhecheng currently no. Each team has their own spoke vnet, and we have a limited amount of IPs for the cluster (a /19) so we can't split this up to give the vault its own /28 or slightly bigger virtual network unfortunately. I also have doubts about this because of the centrally managed private DNS. When we create the PE to the keyvault, DNS is getting created in the zones managed in the network team subscription.
In a dedicated keyvault vnet setup, we would have a PE in the spoke dedicated keyvault vnet which would still be peered to our hub and would still set DNS entries in the zone for privatelink.vaultcore.azure.net.
Then AKS would create a new PE to its subnet and still take over the existing DNS entry in our central private zone for vault.
Hello @Speeddymon Thank you for the response. I think we can add BYO support (PE, VNET, subnet, DNS) to our roadmap. But it may not be the next priority because we already have plans for other improvements. As a workaround for today, we have come up with two solutions that we hope fit your request:
Describe the bug According to the documentation at https://learn.microsoft.com/en-us/azure/aks/use-kms-etcd-encryption, creating an AKS cluster with etcd encryption and keyVaultNetworkAccess=private will always force-create a new private endpoint, even if there is already a working PE for that KV in the AKS subnet.
This means:
To Reproduce This is a design bug; for details, see above.
Expected behaviour AKS should automatically determine that there is already a working private endpoint in that subnet and use it.
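To make the expected behaviour concrete, here is a minimal sketch of the kind of lookup being requested. It is not how AKS works today. It assumes private endpoint records shaped roughly like the JSON that az network private-endpoint list prints (the field names used here are an assumption about that shape), and the function name and sample data in the usage note are hypothetical.

```python
def find_existing_endpoint(endpoints, vault_id, subnet_id):
    """Return the first approved private endpoint for vault_id in subnet_id, or None.

    Azure resource IDs are case-insensitive, so comparisons are lowercased.
    """
    vault_id = vault_id.lower()
    subnet_id = subnet_id.lower()
    for ep in endpoints:
        # Skip endpoints that live in a different subnet.
        if ep.get("subnet", {}).get("id", "").lower() != subnet_id:
            continue
        for conn in ep.get("privateLinkServiceConnections", []):
            state = conn.get("privateLinkServiceConnectionState", {})
            targets_vault = conn.get("privateLinkServiceId", "").lower() == vault_id
            if targets_vault and state.get("status") == "Approved":
                return ep
    return None
```

With a list of such records, find_existing_endpoint(endpoints, vault_id, subnet_id) returns the matching endpoint if one exists; only when it returns None would a new private endpoint need to be created.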
Screenshots not applicable
Environment (please complete the following information): all environments
Additional context N/A