Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

Permission and Ownership Issues with Blobfuse Mounts in Non-Root AKS Pods #1441

Open ioan02 opened 1 week ago

ioan02 commented 1 week ago

We use Blobfuse to mount blob containers as volumes in our AKS pods. For our main use case, the mounted filesystem behaves as expected when it is owned by root and the allow_other option is set: any user the pod runs as can access the filesystem. This becomes problematic when the filesystem needs to be owned by another user. We intend to expand our use cases, for example by using a blob container, rather than a disk, as the storage backend for a PostgreSQL database. This is aimed at scenarios where the PostgreSQL instance is lightly used, such as a metastore for applications like Superset.

Issue Description

We are encountering issues with permissions and ownership on the mounted filesystem. Blobfuse allows setting gid, uid, and umask, and we observe the intended behavior when the container runs as root (at least the ownership appears correct). However, this does not hold when the container runs as a non-root user, which defeats the purpose of setting these options. Notably, even when the container runs as root, switching to another user results in permission errors and ????? entries in directory listings, indicating access control issues.

Our current use case involves PostgreSQL, but we are interested in the more general approach for managing permissions and ownership on the mounted filesystem, and the correct usage of gid, uid, and umask options.

Which version of blobfuse was used?

blob-csi-driver 1.23.4 (server-side Blobfuse version unknown)

Which OS distribution and version are you using?

AKS 1.29.2 Ubuntu

If relevant, please share your mount command.

  mountOptions:
  - --use-attr-cache=false
  - --disable-writeback-cache=true
  - -o umask=0077
  - -o uid=1001
  - -o gid=1001
  - --cancel-list-on-mount-seconds=10
  - --log-level=LOG_DEBUG
  - --max-blocks-per-file=2
  - -o negative_timeout=120
  - --streaming=true
  - --stream-cache-mb=100
  - --block-size-mb=50
  - -o entry_timeout=120

What was the issue encountered?

When the pod is running as root:

root@superset-db-0:/# ls /bitnami/postgresql/data -l
total 0
drwxrwxr-- 2 1001 1001 4096 Jun 28 14:56 base
drwxrwxr-- 2 1001 1001 4096 Jun 28 14:56 global
drwxrwxr-- 2 1001 1001 4096 Jun 28 14:56 pg_commit_ts
drwxrwxr-- 2 1001 1001 4096 Jun 28 14:56 pg_dynshmem
-rwxrwxr-- 1 1001 1001 2640 Jun 28 14:56 pg_ident.conf
drwxrwxr-- 2 1001 1001 4096 Jun 28 14:56 pg_logical

When running as a non-root user, including the user 1001:

I have no name!@superset-db-0:/$ ls /bitnami/ -l
ls: cannot access '/bitnami/postgresql': Permission denied
total 0
d????????? ? ? ? ?            ? postgresql

Have you found a mitigation/solution?

We have not found a mitigation or solution.

Please share logs if available.

Blobfuse logs indicate that the mount itself is successful.

vibhansa-msft commented 1 week ago

The uid/gid that you set determines the ownership reported for files when you list or access them. However, this works only when allow_other is also provided, which means other users can potentially still access the filesystem. If you mount using 'sudo', the base ownership of the mount remains with the sudo user, while uid/gid only affects the file ownership you see. This behaviour comes from libfuse itself: blobfuse, as a filesystem, can only return uid/gid/permissions to the kernel for each file or directory.

ioan02 commented 1 week ago

Thank you for the explanation; the allow_other option was indeed the missing piece.
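
For reference, here is a sketch of the relevant mount options from the original report with allow_other added. The final configuration is not shown in this thread, so the `-o allow_other` line, written in the same style as the other `-o` options, is an assumption about how the option was passed. The uid, gid, and umask values are unchanged from the report.

```yaml
mountOptions:
  # Assumed addition: permit users other than the mounting user to access the mount
  - -o allow_other
  # Ownership and permission bits that blobfuse reports to the kernel
  - -o uid=1001
  - -o gid=1001
  - -o umask=0077
  # The other options from the original report are omitted here for brevity
```

As the maintainer notes above, uid/gid are only honoured together with allow_other, and enabling it means other users can potentially reach the filesystem as well.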

We are now hitting different errors. Before we go further down this debugging road, I'd like to ask your opinion on what we are trying to accomplish here, namely using FUSE as a storage backend for databases (albeit modestly sized ones with modest workloads). We know that in theory this should be doable, although unorthodox. But is it? And if it is, would it be reliable? We have not been able to find documentation on this anywhere, so any light you could shed would be appreciated.

For the sake of completeness, here is the error, quite eloquently indicating an issue with our FUSE-mounted filesystem:

running bootstrap script ... 2024-07-01 17:50:13.391 UTC [830] FATAL:  unexpected data beyond EOF in block 0 of relation base/1/1255
2024-07-01 17:50:13.391 UTC [830] HINT:  This has been seen to occur with buggy kernels; consider updating your system.
2024-07-01 17:50:13.391 UTC [830] PANIC:  cannot abort transaction 1, it was already committed

vibhansa-msft commented 1 week ago