Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[AKS][Storage] Mount addr frequently changed to public IP #4157

Open jhayay66 opened 6 months ago

jhayay66 commented 6 months ago

Describe scenario: AKS pods fail to access an Azure file share with the error "Host is down". Restarting the pods mitigates the issue.

The message logs showed the following errors:
Status code returned 0xc000006d STATUS_LOGON_FAILURE
cifs_setup_session: 28 callbacks suppressed
CIFS VFS:[\...) Send error in SessSetup = -13
Status code returned 0xc000006d STATUS_LOGON_FAILURE

Reference doc: https://learn.microsoft.com/zh-cn/troubleshoot/azure/azure-storage/files-troubleshoot-linux-smb#dns-account-migration
This issue seems to be related to a Linux SMB kernel bug. Kernel versions 5.15+ and keyutils-1.6.2+ include the fixes.

Current AKS version: 1.20.15
Current node kernel version: 5.4
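The kernel threshold quoted above can be checked on a node with a short script. This is only a sketch: the `(5, 15)` threshold is taken from the issue text, the function names are made up for illustration, and the keyutils version is not checked at all.

```python
import platform
import re
from typing import Tuple

# Threshold quoted in the issue text; not verified independently here.
FIXED_KERNEL = (5, 15)


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-1057-azure'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_has_fix(release: str) -> bool:
    """True when the kernel is at or above the quoted 5.15 threshold."""
    return kernel_major_minor(release) >= FIXED_KERNEL


if __name__ == "__main__":
    release = platform.release()
    print(release, "at or above 5.15:", kernel_has_fix(release))
```

A 5.4 node kernel, as reported here, falls below the threshold, while the 5.15.0-1057-azure kernel mentioned later in the thread passes it.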

Until the AKS upgrade, we are using a private endpoint to mitigate this symptom. After enabling the private endpoint, the mount addr showed the private IP as expected at first, but after several hours it changed to the public IP and the host was down again.
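One way to watch for the change described above is to read the `addr=` option that CIFS mounts expose in `/proc/mounts` and flag any address outside the private ranges. The sketch below assumes the usual whitespace-separated `/proc/mounts` layout; the function names and the sample mount line are illustrative and not taken from the reporter's cluster.

```python
import ipaddress
from typing import Dict, List


def cifs_mount_addrs(mounts_text: str) -> Dict[str, str]:
    """Map each CIFS mount point to the addr= value found in its mount options."""
    result: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "cifs":
            continue
        for option in fields[3].split(","):
            if option.startswith("addr="):
                result[fields[1]] = option[len("addr="):]
    return result


def public_mounts(mounts_text: str) -> List[str]:
    """Return the CIFS mount points whose addr= is not a private address."""
    flagged = []
    for mount_point, addr in cifs_mount_addrs(mounts_text).items():
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # skip values that do not parse as an IP address
        if not ip.is_private:
            flagged.append(mount_point)
    return flagged
```

On a node, the input would be the contents of `/proc/mounts`, for example `public_mounts(open("/proc/mounts").read())`.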

However, nslookup run on the node still returned the private IP.

Question: Currently we follow https://docs.azure.cn/zh-cn/aks/azure-csi-files-storage-provision#static-provisioning-parameters and set server to the private URL to restrict access to the private endpoint. We would still like to know the possible cause of the issue: why does the mount addr frequently change to the public IP? Thanks in advance.

andyzhangx commented 6 months ago

Hi @jhayay66, AKS v1.20 is already deprecated. Are you able to upgrade to AKS 1.26 or a higher version? The Linux kernel is already 5.15.0-1057-azure on supported AKS versions.

jhayay66 commented 6 months ago

Hi @andyzhangx, thanks for the kind reply! The AKS upgrade is being planned. Currently we are using a private endpoint as a workaround. May I know whether we can set volumeAttributes.server to a private IP address?

The doc https://docs.azure.cn/zh-cn/aks/azure-csi-files-storage-provision#static-provisioning-parameters only mentions a server URL as an acceptable value. This matches the azurefile-csi-driver GitHub doc: https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/docs/driver-parameters.md

However, the csi-driver-nfs GitHub doc https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/docs/driver-parameters.md#pvpvc-usage-static-provisioning mentions that an IP address is also acceptable.

In our tests, an IP address could also be configured with azurefile-csi-driver. Could you confirm that this is supported and simply not mentioned in the doc? Thanks very much!
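For readers following along, a statically provisioned volume with the server attribute set might look roughly like the manifest built below. It is a sketch only: the account, share, secret, and volume names are hypothetical placeholders, and the set of attributes should be checked against the driver-parameters doc linked above. The manifest is emitted as JSON, which kubectl accepts.

```python
import json


def azurefile_pv(name: str, share_name: str, server: str,
                 secret_name: str, secret_namespace: str = "default",
                 capacity: str = "100Gi") -> dict:
    """Build a PersistentVolume manifest for the Azure File CSI driver."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": capacity},
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "csi": {
                "driver": "file.csi.azure.com",
                # volumeHandle must be unique per volume in the cluster.
                "volumeHandle": name,
                "volumeAttributes": {
                    "shareName": share_name,
                    # Private-link host name, or a private IP as tested in this thread.
                    "server": server,
                },
                "nodeStageSecretRef": {
                    "name": secret_name,
                    "namespace": secret_namespace,
                },
            },
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    pv = azurefile_pv(
        name="pv-azurefile-private",
        share_name="myshare",
        server="myaccount.privatelink.file.core.windows.net",
        secret_name="azure-storage-secret",
    )
    print(json.dumps(pv, indent=2))
```

The output can be saved to a file and applied with `kubectl apply -f`.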

andyzhangx commented 6 months ago

The point is that the private IP address may also change, though setting the server parameter is supported in the Azure File CSI driver. And yes, using a private endpoint is another solution.

Icybiubiubiu commented 6 months ago

Even if a static private IP can be set, we always recommend that customers upgrade their cluster to a supported version. We can offer little help on unsupported versions.

microsoft-github-policy-service[bot] commented 5 months ago

Action required from @Azure/aks-pm

microsoft-github-policy-service[bot] commented 4 months ago

Issue needing attention of @Azure/aks-leads

psteinds commented 4 months ago

We had a similar problem. It turned out that our upstream resolver (bring-your-own DNS), which CoreDNS points at, would try to forward to Azure's internal resolvers. When this failed (UDP packet loss), it would go out to public resolvers instead and return a public address instead of the privatelink address. There was an option in our upstream resolver's forwarding config to not fall back to public resolvers.
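An intermittent fallback like this is easy to miss with a single nslookup, so polling the name and recording every round that returns a public address can help confirm it. The sketch below uses the system resolver of whatever host it runs on, which may differ from the resolver path a pod or the CSI driver uses; the function names, host name, and intervals are assumptions made for illustration.

```python
import ipaddress
import socket
import time
from typing import Callable, Iterable, List, Tuple


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a host name to its IPv4 addresses via the system resolver."""
    infos = socket.getaddrinfo(hostname, 445, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def public_answers(addresses: Iterable[str]) -> List[str]:
    """Keep only the addresses that are not in a private range."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_private]


def watch(hostname: str, rounds: int, interval_s: float,
          resolver: Callable[[str], List[str]] = resolve_ipv4,
          sleep: Callable[[float], None] = time.sleep) -> List[Tuple[int, List[str]]]:
    """Poll the name and return (round index, public addresses) for each leak."""
    hits = []
    for i in range(rounds):
        leaked = public_answers(resolver(hostname))
        if leaked:
            hits.append((i, leaked))
        if i + 1 < rounds:
            sleep(interval_s)
    return hits
```

For example, `watch("myaccount.file.core.windows.net", rounds=720, interval_s=5)` would poll a hypothetical account name for about an hour and return only the rounds in which a public address came back.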
