Azure Kubernetes Service
https://azure.github.io/AKS/

Azure Files PV AuthorizationFailure when using advanced networking #804

Closed: wiebren closed this issue 4 years ago

wiebren commented 5 years ago

Cluster created using these guides

Azure Files persistent volume claims fail with the message: persistentvolume-controller (combined from similar events): Failed to provision volume with StorageClass "azurefile": failed to create share kubernetes-dynamic-pvc-xxx in account xxx: failed to create file share, err: storage: service returned error: StatusCode=403, ErrorCode=AuthorizationFailure, ErrorMessage=This request is not authorized to perform this operation.

The service principal "AzureContainerService" is listed as owner on the storage account, so it should have access.

I first did a setup without advanced networking; in that setup the Azure Files PV worked without issues.

andyzhangx commented 5 years ago

@wiebren it's not related to the service principal. In k8s, the azure-file provisioner uses the storage account name and key to create an Azure Files share in that storage account. Could you confirm whether the original storage account key has changed, or whether there is any restriction on the storage account? You can see where your error happens here: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure_storage.go#L47

wiebren commented 5 years ago

@andyzhangx It looks like that error, but I'm not sure how to confirm it originates at that exact location. What I did find is that the activity log of the storage account shows successful "List Storage Account Keys" operations by my AKS service principal. That looks like the action on line 41 of the file you mentioned.

andyzhangx commented 5 years ago

@wiebren could you check the activity logs of the storage account? I don't think it's related to using advanced networking. cc @VybavaRamadoss @RenaShahMSFT do you know where the err: storage: service returned error: StatusCode=403, ErrorCode=AuthorizationFailure, ErrorMessage=This request is not authorized to perform this operation. error message comes from in Azure Storage?

Apparently the customer could retrieve the storage account keys, yet creating an Azure Files share with the account key failed. Any possible reason?

wiebren commented 5 years ago

(screenshot: storage account activity log in the Azure portal, 2019-02-02)

andyzhangx commented 5 years ago

@wiebren could you also paste the output of kubectl describe pvc PVC-NAME?

andyzhangx commented 5 years ago

@wiebren could you also try creating a file share on that storage account manually? To narrow down the issue, you could use the Azure CLI to create a file share on that storage account with the account key.
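For example, that Azure CLI check might look like this (a sketch; the resource group, storage account, and share name are placeholder values):

```shell
# Placeholder names; substitute your own resource group / storage account.
ACCOUNT_NAME=mystorageaccount
ACCOUNT_KEY=$(az storage account keys list \
  --resource-group my-rg \
  --account-name "$ACCOUNT_NAME" \
  --query "[0].value" -o tsv)

# Try creating a share directly with the account key; a 403
# AuthorizationFailure here reproduces the provisioner's error.
az storage share create \
  --name test-share \
  --account-name "$ACCOUNT_NAME" \
  --account-key "$ACCOUNT_KEY"
```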

wiebren commented 5 years ago

@andyzhangx Hmm, I now also get "Access denied" when I go to the Files section of the storage account. I thought I might have created the account with different settings than before, so I just tried StorageV2_ZRS, StorageV2_LRS & Storage_ZRS; all have the same issue.

Name:          production-wiki
Namespace:     wiki
StorageClass:  azurefile
Status:        Pending
Volume:
Labels:        app=production-wiki
               chart=wiki-0.1.0
               heritage=Tiller
               release=production
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-file
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
  Type     Reason              Age                        From                         Message
  ----     ------              ----                       ----                         -------
  Warning  ProvisioningFailed  3m38s (x15965 over 2d18h)  persistentvolume-controller  (combined from similar events): Failed to provision volume with StorageClass "azurefile": failed to create share kubernetes-dynamic-pvc-xxx in account xxx: failed to create file share, err: storage: service returned error: StatusCode=403, ErrorCode=AuthorizationFailure, ErrorMessage=This request is not authorized to perform this operation.
RequestId:4313d741-b01a-002d-0cdf-ba01ac000000
Time:2019-02-02T10:10:17.4087251Z, RequestInitiated=Sat, 02 Feb 2019 10:10:17 GMT, RequestId=4313d741-b01a-002d-0cdf-ba01ac000000, API Version=, QueryParameterName=, QueryParameterValue=
Mounted By:  production-54cf9f78c-22tpg

andyzhangx commented 5 years ago

@wiebren pls try with StorageV1_LRS, it should work

wiebren commented 5 years ago

@andyzhangx Storage and StorageV1 are the same thing, right? It is "Storage (general purpose v1)" now.

wiebren commented 5 years ago

I am beginning to think the issue might be that the storage account was created with Terraform. In manually created accounts I can access the Files section; in Terraform-created accounts I cannot. So I wanted to compare the settings by looking at the generated automation scripts, however the exportTemplate call for the Terraform-created storage account returns a "500 Internal Server Error".

andyzhangx commented 5 years ago

@wiebren it's Storage (general purpose v1), and you don't need to specify a storage account name; k8s will find a suitable storage account that matches skuName in the same resource group. You could use this storage class: https://github.com/andyzhangx/demo/blob/master/pv/storageclass-azurefile.yaml
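The linked storage class is along these lines (a sketch of an in-tree azure-file StorageClass; the exact parameters in the linked file may differ):

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
parameters:
  # No storageAccount specified: k8s picks or creates an account
  # matching this SKU in the same resource group.
  skuName: Standard_LRS
EOF
```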

andyzhangx commented 5 years ago

@wiebren StorageV2 (general purpose v2) should also work; the following combination works: (screenshot of storage account settings)

wiebren commented 5 years ago

@andyzhangx I tried some different things; it seems to be network related. I tried setting "Firewalls and virtual networks" on the storage account to allow all access instead of only my AKS subnet. Now the volume has been created, but my storage account is globally accessible. A new issue also pops up: mounting the volume now fails:

mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/xxx/volumes/kubernetes.io~azure-file/pvc-xxx --scope -- mount -t cifs -o username=xxx,password=xxx,file_mode=0777,dir_mode=0777,vers=3.0 //xxx.file.core.windows.net/kubernetes-dynamic-pvc-xxx /var/lib/kubelet/pods/xxx/volumes/kubernetes.io~azure-file/pvc-xxx
Output: Running scope as unit run-r0b092d9a65314e828042dcc3c611f0dd.scope.
mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)

andyzhangx commented 5 years ago

@wiebren could you try running mount -t cifs ... on one agent VM?
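On an agent VM, that manual check could look like this (a sketch; the account name, key, and share name are placeholders standing in for the redacted mount arguments above):

```shell
ACCOUNT_NAME=mystorageaccount
ACCOUNT_KEY='<storage-account-key>'

# Mount the dynamically created share the same way kubelet does.
sudo mkdir -p /mnt/azurefile-test
sudo mount -t cifs \
  "//${ACCOUNT_NAME}.file.core.windows.net/kubernetes-dynamic-pvc-xxx" \
  /mnt/azurefile-test \
  -o "username=${ACCOUNT_NAME},password=${ACCOUNT_KEY},vers=3.0,file_mode=0777,dir_mode=0777"
```

A "mount error(13): Permission denied" here usually means a wrong key or a firewall block, while a successful manual mount points at stale credentials on the k8s side.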

wiebren commented 5 years ago

@andyzhangx That helped, it was using a wrong access key; I suspect stuff got mixed up by deleting/creating the storage account a few times. I deleted the PVs/PVCs and now it mounts.

So any clue on how to get this working with restricted network access to my storage account?

andyzhangx commented 5 years ago

I think using Standard_LRS in the azure file storage class would work. Could you try this azure file storage class directly: https://github.com/andyzhangx/demo/blob/master/pv/storageclass-azurefile.yaml? It will search for a matching storage account for you.

andyzhangx commented 5 years ago

And if you want to use a specific storage account, you could create a storage account as below: (screenshot of storage account creation settings)

wiebren commented 5 years ago

@andyzhangx Same thing with Standard_LRS: ErrorCode=AuthorizationFailure when I only allow access from my AKS subnet, and when I allow all traffic it works.

andyzhangx commented 5 years ago

@wiebren Suddenly I know why "only allow access from selected networks" for the storage account does not work on AKS: the k8s persistentvolume-controller runs on the AKS master node, which is not in the selected network, and that's why it could not create the file share on that storage account.

And I don't think we will support this feature (only allowing access from selected networks for the storage account) in the near future.

cc @sriramkomma

andyzhangx commented 5 years ago

One workaround is Azure Files static provisioning: the user creates the Azure Files share, then provides the storage account and file share to k8s. Here is an example: https://docs.microsoft.com/en-us/azure/aks/azure-files-volume . Static provisioning should work in this case, while dynamic provisioning won't.
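A static-provisioning sketch along the lines of the linked doc (the secret name, share name, and key here are hypothetical, and the share must already exist):

```shell
# The user creates the share out of band, then hands k8s the credentials.
kubectl create secret generic azure-file-secret \
  --from-literal=azurestorageaccountname=mystorageaccount \
  --from-literal=azurestorageaccountkey='<storage-account-key>'

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: static-azurefile-demo
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: share
      mountPath: /mnt/share
  volumes:
  - name: share
    azureFile:
      secretName: azure-file-secret
      shareName: my-preexisting-share
      readOnly: false
EOF
```

The mount itself happens on the agent node, which is inside the selected network, so only the share creation needs to happen outside k8s.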

andyzhangx commented 5 years ago

After talking with the Azure Files team: currently the "create file share" action is also considered a data-path operation, which is why it failed, since that action happens on the master node, which is not in the selected network. In the future, this action will no longer be handled as a data-path operation. I will let you know when this feature is available.

krottiers commented 5 years ago

I'm experiencing exactly the same issue.

I find it strange that the master node doesn't have access to the storage account, since you can check the box that allows Microsoft services to still access the storage account.

To me this seems like a bug. The functionality to create a dynamic PVC on a vNet-integrated cluster is something we really need. Otherwise we would need to script the creation of a file share to be able to assign a PVC when a new application is deployed.

hansverbrugge commented 5 years ago

+1, I'm facing the same issue. While the workaround works (as does allowing all traffic in the storage account firewall on creation and re-enabling it when done), this is something we really need, as we automate the creation of the storage account, including setting up access control and the firewall.
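For reference, automating the firewall side could look like this (a sketch with placeholder resource names; note that, per the explanation above, dynamic provisioning still fails with this setup because the master node stays outside the subnet):

```shell
# Enable the Microsoft.Storage service endpoint on the AKS subnet...
az network vnet subnet update \
  --resource-group my-rg --vnet-name my-vnet --name aks-subnet \
  --service-endpoints Microsoft.Storage

# ...allow that subnet through the storage account firewall...
az storage account network-rule add \
  --resource-group my-rg --account-name mystorageaccount \
  --vnet-name my-vnet --subnet aks-subnet

# ...and deny everything else.
az storage account update \
  --resource-group my-rg --name mystorageaccount \
  --default-action Deny
```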

andyzhangx commented 5 years ago

cc @VybavaRamadoss @RenaShahMSFT

cmendible commented 5 years ago

Hi @andyzhangx, I'm using static provisioning and having the same issue. If I allow all networks on the storage account, the mount works as expected. Once I restrict traffic to the storage account, mounting the volumes fails with: Permission denied.

I have configured the service endpoints and even have a static public IP (from a working k8s egress service) whitelisted in the storage account's firewall, but without luck.

andyzhangx commented 5 years ago

Hi @andyzhangx, I'm using static provisioning and having the same issue. If I allow all networks on the storage account, the mount works as expected. Once I restrict traffic to the storage account, mounting the volumes fails with: Permission denied.

I have configured the service endpoints and even have a static public IP (from a working k8s egress service) whitelisted in the storage account's firewall, but without luck.

@cmendible have you tried mounting manually with sudo mount -t cifs ... on that agent VM? I suppose it should work since your agent VM is already in the selected restricted network.

cmendible commented 5 years ago

sudo mount -t cifs

Somehow I missed this message. I'll have to try that and come back to you with the results...

cmendible commented 5 years ago

@andyzhangx I can confirm it's working with static provisioning. Seems that something was wrong with my SA firewall config. I also tested with a new cluster and everything is running smoothly!

andyzhangx commented 5 years ago

@RenaShahMSFT do you have timeline about when create file share action would not be considered as data-path operation?

RenaShahMSFT commented 5 years ago

@RenaShahMSFT do you have timeline about when create file share action would not be considered as data-path operation?

We are targeting this in ~Q3 CY 2019

equilibri0 commented 5 years ago

@RenaShahMSFT any updates?

samisq commented 5 years ago

Any update on this issue? We really need this too

mturzynski commented 4 years ago

I've just run into this issue. I agree that static provisioning may be a workaround, but it's not really convenient.

scivm commented 4 years ago

@RenaShahMSFT Initial target was ~Q3 CY 2019. Do you know when this will get into AKS?

scivm commented 4 years ago

@andyzhangx Do you know if we can restrict access to some microsoft network or we have to leave it open to all networks until this is fixed?

andyzhangx commented 4 years ago

@RenaShahMSFT Initial target was ~Q3 CY 2019. Do you know when this will get into AKS?

@scivm Here is the PR in progress: https://github.com/kubernetes/kubernetes/pull/90350; it will be fixed in k8s v1.19. You could also try https://github.com/kubernetes-sigs/azurefile-csi-driver: since its controller component runs in the AKS node resource group, it doesn't have this issue.
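With the CSI driver, the storage class uses the out-of-tree provisioner instead of the in-tree one (a sketch; assumes the azurefile CSI driver is installed in the cluster):

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS
EOF
```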

andyzhangx commented 4 years ago

This issue is already fixed in k8s v1.19.0 and latest azure file CSI driver.

andyzhangx commented 4 years ago

Closing this issue since I have verified that it is fixed in AKS 1.19.0, and also with the azure file CSI driver when using the management API, while the data-plane API (using the account key directly) is still blocked due to the limitation.