Closed TomGeske closed 3 years ago
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
Hi, we tried to solve this with a policy that auto-links the hub vnet on creation of the private DNS zone. It does not work, as the delay from creation to actual linking seems to be too long.
For now we reset the vnet DNS settings prior to the deployment to work around this issue.
Can the timeout be bumped up to a few minutes, so that the policy has a chance to deploy the link?
Cheers Oliver
@LoboHacks I'm adding a sample here: https://github.com/Azure/terraform/pull/53. Could you check whether it works for you?
Here's what we've done: we have a resolver (aks-resolver) running in one of our aks clusters, and we forward azmk8s.io to our aks-resolver. The relevant part of the module is like so:
variable "private_cluster_dns_linked_vnets" {
  description = "Link these virtual network name:id to the cluster's private dns zone"
  type        = map(string)
  default     = {}
}

locals {
  # Split the cluster's private FQDN (e.g. "<name>-<hash>.<guid>.privatelink.<region>.azmk8s.io")
  # into the first host label and the remaining private DNS zone name.
  private_link_dns_regex = regex("(?P<host>[a-z-\\d]+)\\.(?P<zone>.+)", azurerm_kubernetes_cluster.aks.private_fqdn)
}

resource "azurerm_kubernetes_cluster" "aks" {
  dns_prefix              = var.dns_prefix
  location                = var.location
  name                    = var.name
  resource_group_name     = var.resource_group_name
  kubernetes_version      = var.control_plane_version
  private_cluster_enabled = true
  ...
}

resource "azurerm_private_dns_zone_virtual_network_link" "aks" {
  for_each = var.private_cluster_dns_linked_vnets

  name                  = lower(each.key)
  private_dns_zone_name = local.private_link_dns_regex.zone
  resource_group_name   = azurerm_kubernetes_cluster.aks.node_resource_group
  virtual_network_id    = each.value
}
and used like this:
data "azurerm_virtual_network" "ci" {
  for_each = var.ci.virtual_networks
  provider = azurerm.ci

  name                = each.value.name
  resource_group_name = each.value.resource_group
}

module "aks" {
  for_each = var.aks_clusters

  source  = "redacted"
  version = "2.1.6"

  dns_prefix          = var.resource_groups[each.value.resource_group].prefix
  location            = var.resource_groups[each.value.resource_group].location
  resource_group_name = module.resource_groups[each.value.resource_group].name
  prefix              = var.resource_groups[each.value.resource_group].prefix
  name                = each.key
  ...
  private_cluster_dns_linked_vnets = {
    format("%s-%s", var.ci.virtual_networks.dns_forwarder.resource_group, var.ci.virtual_networks.dns_forwarder.name) = data.azurerm_virtual_network.ci["dns_forwarder"].id
  }
}
I'd be curious why @jon-walton's solution works, since the azurerm_private_dns_zone_virtual_network_link would be created after the azurerm_kubernetes_cluster resource - so the azurerm_kubernetes_cluster resource should still fail to create, in my opinion. @feiskyer's solution would work, since it polls for the private DNS zone in the background and instantly creates the link.
Both solutions won't work in our context, since we don't have enough privileges to link a private DNS zone to the hub vNet - it's added via a policy.
@phbergsmann that's a good point actually. I'm only doing the linking to allow us to resolve the apiserver's hostname from on-prem
Looking at the private DNS zone's activity log, there was an event during cluster creation, "Create or Update Private DNS zone link to virtual network", initiated by AzureContainerService, which linked the cluster's private DNS zone with the vnet I provided.
Would like to know about the current state of this. We need to resolve not one but multiple AKS clusters from Azure but also from our on-prem DNS. That's why we have each privatelink (for AKS, Storage Accounts, MSSQL etc.) exactly once and they're all linked to the vnet where the DNS forwarder is connected to. But until we're able to specify an existing private DNS zone this won't help us.
Hello, I am facing the same issue: I am not able to link the private DNS zone created during AKS cluster deployment with the DNS resolver vnet. Is there any workaround that can be used to link these with Terraform? I am using Azure release pipelines for the private cluster creation.
@megharaikwar You can get this working in Terraform with a custom plugin that will read the DNS zones in the AKS node resource group. I have built a POC of that, but it's a bit of a hack because the plugin really should be a data source, but must be implemented as a resource (because data sources are read at plan time). I could clean it up and publish it but I'm wondering if it is worth the effort as this issue has moved to Public Preview on the roadmap. @palma21, how do we find docs that explain how to use the preview?
there is an example on 301-aks-private-cluster.
Hello, I see this feature has been moved to public preview. How can I implement this?
@dejiaja please refer AKS docs https://docs.microsoft.com/en-us/azure/aks/private-clusters#create-a-private-aks-cluster-with-private-dns-zone.
Hello, could you provide more information about it please?
@dejiaja please refer AKS docs https://docs.microsoft.com/en-us/azure/aks/private-clusters#create-a-private-aks-cluster-with-private-dns-zone.
Thanks for your response. I thought the feature being implemented was supposed to stop the cluster creation from failing before the private DNS zone is linked to the hub vnet. Is that not the case? I looked at the link you provided, but it still recommends a workaround (azure function etc.) as opposed to fixing the issue of private clusters failing on creation when they should not.
@dejiaja it would fail when using PrivateDNSZone (the default is --private-dns-zone=system), but it won't if the cluster is created with --private-dns-zone=none.
By the way, BYO PrivateDNSZone will be supported next month, so the DNS setup can be done earlier than cluster creation.
@feiskyer is there an ETA? Or just "by end of January"?
End of January.
For some early details on using it now (while in preview), check out this comment: https://github.com/Azure/AKS/issues/1508#issuecomment-754805314
none = no cluster failure at deployment time, but you need to add DNS configuration after the fact.
resourceId = bring your own Private DNS zone (all vnet-hooked-up and ready to go).
system = today's default experience.
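For Terraform users, the same three choices map, in recent azurerm provider versions, onto the private_dns_zone_id argument of azurerm_kubernetes_cluster. A minimal sketch with hypothetical names, showing only the arguments relevant here (check your provider version's docs):

```hcl
# Sketch only: possible values for private_dns_zone_id are "System" (AKS
# manages the zone, today's default), "None" (no zone; you handle DNS
# yourself), or the resource ID of a zone you pre-created (BYO).
resource "azurerm_kubernetes_cluster" "example" {
  name                    = "example-aks" # hypothetical
  location                = "westeurope"
  resource_group_name     = "example-rg"  # hypothetical
  dns_prefix              = "example"
  private_cluster_enabled = true

  private_dns_zone_id = "None" # or "System", or a private DNS zone resource ID

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2s_v3"
  }

  identity {
    type = "SystemAssigned"
  }
}
```

Per the AKS docs, when a zone resource ID is passed, the cluster identity needs write permissions on that zone, so a user-assigned identity is typically required in that case.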
I think the bring-your-own private DNS zone feature is available now. Has anybody already used it and can share their experience?
@megharaikwar yes, it is available. As far as using it goes, here is the flow you want to follow.
there is an example on 301-aks-private-cluster.
Has anyone gotten this to work reliably from inside of a module? I have a module which creates the AKS cluster and invokes this script over a null resource as intended. However, the processing order seems to be inconsistent and sometimes the script fires up first and starts waiting for the node resource group which never gets created. Is it just me? Whatever the reason, this doesn't really work from an automation standpoint.
Like yourself, I could not get the null_resource working. It just keeps looping waiting for the node resource group.
@anttipo @viralpat218 I created a separate stage in my Azure DevOps pipeline and ran the AKS creation and the DNS zone creation script in parallel, and it worked well. The null resource wasn't working for me either. Please let me know if you need further information.
@anttipo @viralpat218 @megharaikwar
What's the error output if that example doesn't work? I assume one possible reason is that the default kubernetes version 1.17.9 in that example is not supported any more. Could you add one more option to terraform plan: -var 'kubernetes_version=1.19.6'.
Or could you share your subscription ID and resource group name so I can look into your error details.
@levimm The null_resource runs in parallel with the AKS cluster provisioning, but the shell script the null_resource executes can't get past the first stage, where it looks for the resource group. By that stage, the resource group has been created, and the MC_XXXXX resource group has been created. The shell script runs fine outside null_resource; it's when running within dns-zone.tf that the script cannot get past the first function.
I'm going to have to take my original comment back and declare that, after some adjustments, it seems to be working just fine. I'm running this on CI, which was unable to create service principals due to missing permissions, but I never got the failure message because the local-exec has no timeout and was stuck in the 'waiting for node resource group' loop. After I commented out the local-exec part entirely, I was finally able to see the error.
TL;DR seems to be working just fine for me. Thanks for the support!
@viralpat218 At that stage, has the private DNS zone been created in the MC_XXX resource group? I think there might be something wrong with a parameter value that causes the cluster creation to fail, so the private DNS zone is not created. Could you share your parameters?
Will the BYO Private DNS Zone implementation at any point support formats other than privatelink.<region>.azmk8s.io? I find it convenient to slice private zones by environment or application rather than service type.
No, that's not supported.
Thanks @feiskyer
@feiskyer, since the format won't change, would it be possible to add a prefix? For example: [user-defined prefix].privatelink.[region].azmk8s.io, i.e. qa.privatelink.[region].azmk8s.io, dev.privatelink.[region].azmk8s.io, and so on.
@arobass a custom-defined DNS subdomain will be supported soon, e.g. [custom-subdomain].privatelink.[region].azmk8s.io. Do you think this works for you?
@feiskyer Yes this is exactly what we need, we need to be able to set the prefix of the privatelink dns zone.
Do you have an ETA of GA availability or public/private preview for this feature in Canada regions?
It would be public preview next month.
When utilizing the "None" option for the private DNS zone, the cluster creates without failing using on-prem DNS in the VNET. However, after creation it seems like the portal no longer works to interact with the cluster.
Looking at network traffic, it looks like it's trying to find a DNS record with the format
Is that expected behavior? If so, is that GUID available anywhere to lookup programmatically so that we can create a record on our internal DNS server?
@justin-sobanski Thanks for reporting the issue. It's indeed a bug that the portal DNS record is not created. I'll fix that; the fix should be available in the next AKS release.
Hi, I have faced this as well. Could you let me know: I have provisioned AKS, and post-provision I am facing this issue: [{"code":"CreateVMSSAgentPoolFailed","message":"Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns for more information."}]
I am not including the "privateDNSZone" option in my ARM templates, and the vnet has custom DNS servers configured. Post-deployment, I have checked that the DNS zone created by AKS itself is linked to the vnet.
Still, I am facing this issue.
Should I create a separate DNS zone before AKS creation, like what @megharaikwar has done?
@levimm @feiskyer @justin-sobanski @arobass
@Musham-Aj have custom DNS servers enabled on the VNet? if so, please follow https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns to setup DNS forwarding (and here for example terraform manifests).
Thanks for the response @feiskyer
In the doc above, it says the DNS zone should be linked to the vnet which has custom DNS servers. Yes, when I deployed AKS, the private DNS zone that AKS itself created was linked to the vnet where the DNS servers are enabled.
But it still shows this error: "Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured".
Could you clarify: should we explicitly create a private DNS zone and forward it to 168.63.129.16? My AKS cluster is deployed into an existing vnet, and this vnet has a connection to on-premises.
@Musham-Aj yes. I suggest using a BYO private DNS zone for such a scenario:
az aks create -n <private-cluster-name> -g <private-cluster-resource-group> --enable-private-cluster --private-dns-zone <custom private dns zone ResourceId> --fqdn-subdomain <subdomain-name>
Thanks for the response you have provided.
Could you please tell me the difference between using a BYO private DNS zone and the private DNS zone AKS creates itself in the node resource group?
Both DNS zones are linked to the vnet which has DNS servers enabled, right?
@feiskyer
The BYO Private DNS Zone allows customers to pre-create a private DNS zone. This addresses a bug in the AKS-created private DNS zone flow, where private cluster creation fails for BYO vnet configurations when the DNS zone is not properly attached to the BYO vnet or your hub network.
Both DNS zones are linked to the vnet which has DNS servers enabled, right? The end state is the same for both, correct?
It still creates the private DNS zone as part of the AKS deployment, so the issue remains: if you have custom DNS, you still need to link the zone across the peering, regardless of the pre-created private DNS zone?
As far as I've understood, the requirement for the vnet link is not a bug; it's part of how name resolution works in Azure. Pre-creating the DNS zone allows you to also pre-create the vnet link, instead of having to rely on running bash scripts with local-exec or similar.
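To illustrate, here is a hedged Terraform sketch of that flow (the zone name, variables, and identity are hypothetical; argument names follow recent azurerm provider versions): the zone and its vnet link are created first, then handed to the cluster.

```hcl
# Pre-create the private DNS zone and its vnet link, then bring your own
# zone into the cluster; no local-exec script needed. Names are illustrative.
resource "azurerm_private_dns_zone" "aks" {
  name                = "privatelink.westeurope.azmk8s.io"
  resource_group_name = "dns-rg"
}

resource "azurerm_private_dns_zone_virtual_network_link" "hub" {
  name                  = "hub-link"
  resource_group_name   = "dns-rg"
  private_dns_zone_name = azurerm_private_dns_zone.aks.name
  virtual_network_id    = var.hub_vnet_id # hypothetical: the policy-managed hub vnet
}

resource "azurerm_kubernetes_cluster" "example" {
  name                    = "example-aks"
  location                = "westeurope"
  resource_group_name     = "aks-rg"
  private_cluster_enabled = true
  private_dns_zone_id     = azurerm_private_dns_zone.aks.id

  # With a BYO zone, dns_prefix_private_cluster (the CLI's --fqdn-subdomain)
  # is used instead of dns_prefix.
  dns_prefix_private_cluster = "example"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2s_v3"
  }

  # Per the AKS docs, a BYO zone requires a user-assigned identity with
  # permissions on the zone; var.aks_identity_id is a hypothetical
  # pre-created identity.
  identity {
    type         = "UserAssigned"
    identity_ids = [var.aks_identity_id]
  }

  depends_on = [azurerm_private_dns_zone_virtual_network_link.hub]
}
```

The explicit depends_on ensures the vnet link exists before cluster creation starts, which is the whole point of pre-creating the zone.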
@viralpat218 @anttipo you are both correct.
So bash scripts with local-exec or similar would still be a requirement, as something would need to create the vnet link to the DNS zone that is created as part of the AKS cluster?
Correct.
Hey @megharaikwar, could you please tell me: in your AKS script you kept "privateDNSZone" as None and in parallel executed a script that creates a DNS zone and links it to the vnet? How does that work? Post-deployment, AKS will search for the API server address to resolve, but in your case it can't.
Is the --private-dns-zone option available in ARM templates? We are deploying the infrastructure using ARM templates, so we can't go with the CLI. I see that ARM templates have the "privateDNSZone" option, but can we put a resource ID there rather than "none" or "system"?
The AKS team uncovered issues with BYO VNet + DNS setup in AKS private clusters: https://docs.microsoft.com/en-us/azure/aks/private-clusters
In some cases, AKS private cluster creation fails for BYO VNet configuration when the DNS zone is not properly attached to BYO VNet or your Hub network. You may see messages like:
Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns for more information
Today worker node to control plane communication requires the private DNS zone for API server name resolution.
Going forward we are working on removing private DNS zone dependency for node to control plane communication.
Current ETA for public preview is in summer 2020.
In the meantime, you may want to consider one of the workarounds:
Related issues: #1508