Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Copying file into a directory in pod linked to Azure File Share more than two times using kubectl cp throws error #1292

Closed tblazina closed 4 years ago

tblazina commented 4 years ago

What happened: I copied a local file to a directory in a pod. The directory is a volumeMount of a PVC linked to an Azure File Share. I used kubectl cp some_file <some-namespace>/<some-pod>:/volume_mounted/directory/in/pod, then changed the file and copied it again, then changed and copied it a third time, and received the following error:

tar: k8s_example2.py: Cannot open: No such file or directory
tar: Exiting with failure status due to previous errors
command terminated with exit code 2

After this, if I exec into the container and go to where the file should be, the file shows up when I run ls, but if I try to do anything with the file (e.g. rm k8s_example2.py, cat k8s_example2.py, head k8s_example2.py, etc.), I get an error telling me the file does not exist (e.g. rm: cannot remove 'k8s_example2.py': No such file or directory).

When I run ls -la the file has the following info:

-rwxrwxrwx 1 airflow airflow 7135 Oct 30 16:13 k8s_example2.py

What you expected to happen: I would expect that the file could be copied repeatedly.
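
The copy, edit, copy sequence described above can be scripted so that the failing attempt is easy to pin down. The following is a minimal sketch, not the reporter's actual procedure: the function name copy_repeatedly, the attempts parameter, and the injectable run argument are all hypothetical, and it assumes kubectl is on the PATH and already pointed at the right cluster.

```python
import subprocess


def copy_repeatedly(src, dest, attempts=3, run=subprocess.run):
    """Copy `src` to `dest` with `kubectl cp` up to `attempts` times.

    The local file is changed between copies (a comment line is appended),
    mirroring the change-and-copy-again sequence from the report.
    Returns the 1-based number of the first failing attempt, or None if
    every copy succeeded.
    """
    for attempt in range(1, attempts + 1):
        if attempt > 1:
            # Change the file so each copy carries different content.
            with open(src, "a", encoding="utf-8") as handle:
                handle.write(f"# edit before copy {attempt}\n")
        result = run(["kubectl", "cp", src, dest], capture_output=True, text=True)
        if result.returncode != 0:
            print(f"attempt {attempt} failed: {result.stderr.strip()}")
            return attempt
    return None
```

For example, copy_repeatedly("k8s_example2.py", "<some-namespace>/<some-pod>:/volume_mounted/directory/in/pod/k8s_example2.py") would be expected to return 3 if the behaviour matches this report. Note that it appends lines to the local file, so run it against a scratch copy.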

How to reproduce it (as minimally and precisely as possible):

  1. Create a storage class with
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: azurefile
    provisioner: kubernetes.io/azure-file
    mountOptions:
      - dir_mode=0777
      - file_mode=0777
    parameters:
      storageAccount: <some-storage-account>
      location: west-europe
      resourceGroup: <some-resource-group>
  2. Create a PV with
    kind: PersistentVolume
    metadata:
      name: <some-volume-name>
      namespace: <some-namespace>
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteOnce
      storageClassName: azurefile
      azureFile:
        secretName: <a-secret-name>
        shareName: <some-share-name>
        readOnly: false
      mountOptions:
        - dir_mode=0777
        - file_mode=0777
        - uid=1000
        - gid=1000
  3. Create a PVC with:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: <some-pvc-name>
      namespace: <some-namespace>
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
      storageClassName: azurefile
      volumeName: <some-volume-name>
  4. Create a pod with a Volume from the created PVC and a volumeMount using this Volume
  5. Copy a file multiple times to the mount directory of the volumeMount using kubectl cp

Anything else we need to know?:

Also, if I exec into the container and run dmesg | tail -n100, I see [446995.438278] CIFS VFS: ioctl error in smb2_get_dfs_refer rc=-5 repeated many times. Maybe this is related to #1030?
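
To put a number on "repeated many times", the dmesg output can be tallied by kernel function and return code. This is only a sketch: the helper name summarize_cifs_ioctl_errors is hypothetical, and the pattern assumes other CIFS ioctl errors use the same "CIFS VFS: ioctl error in <function> rc=<code>" shape as the line quoted above.

```python
import re
from collections import Counter

# Assumed message shape, taken from the single dmesg line quoted in this issue.
_CIFS_IOCTL_RE = re.compile(r"CIFS VFS: ioctl error in (\w+) rc=(-?\d+)")


def summarize_cifs_ioctl_errors(dmesg_text):
    """Count CIFS ioctl error lines, keyed by (function name, return code)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = _CIFS_IOCTL_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Feeding it the text captured from dmesg inside the container gives one counter entry per distinct error, which is easier to attach to a support ticket than a raw log tail.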

Environment:

carbolymer commented 4 years ago

I'm having the same issue with one pod running git-sync on azure-files mounted volume. @tblazina have you found any workaround for this problem?

andyzhangx commented 4 years ago

One workaround could be to delete that pod; the Azure file volume would then be remounted.

tblazina commented 4 years ago

@carbolymer, unfortunately I have not found a workaround. What appears to be happening is that after 2 times of executing the copy, AFS marks it as having a file conflict.

andyzhangx commented 4 years ago

> @carbolymer, unfortunately I have not found a workaround. What appears to be happening is that after 2 times of executing the copy, AFS marks it as having a file conflict.

Please file a support ticket with the Azure Files team, thanks.

andyzhangx commented 4 years ago

Please follow this guide to delete the problematic file from the Azure file share: https://docs.microsoft.com/en-us/azure/storage/files/storage-troubleshoot-windows-file-connection-problems#unable-to-delete-a-file-or-directory-in-an-azure-file-share Let me know if you have any questions.

tblazina commented 4 years ago

Great, I'll check out the link. thanks for the help @andyzhangx!

marrobi commented 4 years ago

@tblazina I had a similar issue, and found out:

"Version 4.15.0-1063 of the Ubuntu kernel doesn’t have the fix for the handle leak problem related to renaming files. The fix went into version 4.15.0-1064. Please upgrade your kernel to any version starting from that one to see if it resolves the issue in your scenario. "

Once I had an AKS cluster with the new kernel, the problem disappeared. It would be good to know if it works for you too.
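
A quick way to check a node against the version quoted above is to compare its kernel release string with 4.15.0-1064. The sketch below is an illustration under assumptions: the names parse_release and has_rename_handle_fix are hypothetical, the release string is assumed to look like "4.15.0-1064-azure", and any release that sorts higher (including later kernel series) is assumed to carry the fix, which the quote does not itself confirm.

```python
import re

# First Ubuntu 4.15.0 build said (in the quote above) to contain the fix.
FIXED_RELEASE = (4, 15, 0, 1064)

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release):
    """Turn a string such as '4.15.0-1063-azure' into (4, 15, 0, 1063)."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_rename_handle_fix(release):
    """Return True if `release` is at or above the quoted fixed build.

    Assumption: numerically newer releases also include the fix.
    """
    return parse_release(release) >= FIXED_RELEASE
```

Containers share the node's kernel, so the string can come from platform.release() inside a pod or from the KERNEL-VERSION column of kubectl get nodes -o wide.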