Open dariusj1 opened 3 months ago
Hey @dariusj1, thanks for reporting this issue with plenty of detail. You confirmed that the CSI driver did mount the file system and that the Mountpoint process is still running - that's great to confirm.
We need to learn more about what Mountpoint itself was doing and why we're seeing question marks when trying to interact with that FS.
Please can you fetch and share the logs from Mountpoint itself? You can learn more about how to fetch those in the Mountpoint CSI Driver's logging documentation. If you're running the workload again, it would be useful to include debug as a mount option in the persistent volume spec.
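For reference, a minimal sketch of what that could look like in a static-provisioning PersistentVolume (names and sizes here are placeholders, not taken from this issue; the debug entry is the relevant part):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example                        # placeholder
spec:
  capacity:
    storage: 1200Gi                       # ignored by the driver, but required by Kubernetes
  accessModes:
    - ReadWriteMany
  mountOptions:
    - debug                               # verbose Mountpoint logging for troubleshooting
  csi:
    driver: s3.csi.aws.com
    volumeHandle: example-volume-handle   # placeholder, must be unique
    volumeAttributes:
      bucketName: example-bucket          # placeholder
```

The Mountpoint logs themselves can then be collected from the CSI driver's node pods in kube-system; the logging documentation mentioned above has the exact kubectl logs commands.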
> I would consider IAM issues if I could not access the bucket using awscli from that namespace using the CSI SA, but since I CAN, I doubt AWS Support would be able to advise.
Mountpoint's CSI driver (and Mountpoint) is backed by AWS Support, so please don't hesitate to reach out to them in future should you wish to do so.
@dariusj1 I just noticed this:
> I see lots of question marks and a Permission denied.
I see that you're running under user app_runner. Does the issue go away when running as root?
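If it helps with that test, a throwaway pod that runs as root and mounts the same PVC could look roughly like this (a sketch; the pod name, image, and PVC name are placeholders, not taken from this issue):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-root-test                # placeholder
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: public.ecr.aws/amazonlinux/amazonlinux:2023   # any image with a shell
      command: ["sleep", "infinity"]
      securityContext:
        runAsUser: 0                # run as root just for this test
      volumeMounts:
        - name: s3-vol
          mountPath: /hdump
  volumes:
    - name: s3-vol
      persistentVolumeClaim:
        claimName: pvc-hdump        # placeholder; whichever PVC is bound to the S3 PV
```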
I wonder if this is a case of needing to account for running the container application under a different user. We have an example spec file for that here: https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/non_root.yaml
It would be great if you could share your PV, PVC, and pod spec as well to understand a bit more. (Feel free to redact if needed)
@dannycjones While in the host's filesystem, I created a test file in the /var/lib/kubelet/pods/f8122c23-8cec-40a7-9a82-1914c1c84ed2/volumes/kubernetes.io~csi/pv-hdump/mount directory. I only checked just now, but the test file did appear in the S3 bucket when I checked through the AWS console.
And now, as per your advice, I created a standalone pod running with root privileges, configured to mount that exact same S3 PVC. It seems that I can indeed see the created test file in the mounted /hdump directory!!
So it's a permission issue then...? I wonder why there's no error anywhere stating that; did I miss it...?
In the yaml you've referred me to, is this the part I'm missing then?
mountOptions:
- uid=1000
- gid=2000
- allow-other
> It would be great if you could share your PV, PVC, and pod spec as well to understand a bit more. (Feel free to redact if needed)
Is this still needed?
> And now, as per your advice, I created a standalone pod running with root privileges, configured to mount that exact same S3 PVC. It seems that I can indeed see the created test file in the mounted /hdump directory!! So it's a permission issue then...? I wonder why there's no error anywhere stating that; did I miss it...?
>
> In the yaml you've referred me to, is this the part I'm missing then?
>
> mountOptions: - uid=1000 - gid=2000 - allow-other
Yes, it seems like a permission issue - specifically at the Linux filesystem level.
By default, Mountpoint exposes files with permission bits 0644, allowing reading and writing for the user running Mountpoint and read access for others. There's a caveat here though: at the FS/FUSE level, we additionally need to 'allow other users', which means specifying allow-other, a Linux mount option. I believe you may also need to enable this in the OS configuration in /etc/fuse.conf, by adding user_allow_other on a new line if not already present. Let me know if that's needed. I will find a way to work the scenario in this ticket into documentation/troubleshooting.
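If it does turn out to be missing on the node, a quick way to check for it and append it is something like this (assuming root access on the node; /etc/fuse.conf is the standard FUSE configuration file):

```sh
grep -q '^user_allow_other' /etc/fuse.conf || echo 'user_allow_other' | sudo tee -a /etc/fuse.conf
```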
We document the behavior with "allow other" in Mountpoint's config docs: https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#file-and-directory-permissions
Specifying the correct UID makes sure that the write permissions are granted to the correct user, so you should ensure that matches the UID used in the container.
Effectively, you need both of these sections:
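A sketch of those two sections, using the uid/gid values discussed in this issue (illustrative, not the exact snippet from the example file; adjust to whatever user the container actually runs as):

```yaml
# PersistentVolume: map file ownership to the workload's uid/gid and allow other users
mountOptions:
  - uid=1000
  - gid=2000
  - allow-other

# Pod spec: run the workload as that same uid/gid
securityContext:
  runAsUser: 1000
  runAsGroup: 2000
  fsGroup: 2000
```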
>> It would be great if you could share your PV, PVC, and pod spec as well to understand a bit more. (Feel free to redact if needed)
>
> Is this still needed?
I think we understand the issue now based on your testing. If you are happy to share it though once you've got a final working solution, it may be useful for anyone coming across this issue in future.
@dannycjones
I've just tested and can confirm that adding
mountOptions:
- uid=1000
- gid=2000
- allow-other
resolved my access issue, thank you. What threw me off is that the Kubernetes pod spec does NOT specify securityContext. Instead, it's enforced at the OCI image level.
A few followup questions:
- does that mean there actually were no errors mounting the volume? Is that why I didn't see any?
- I must've misinterpreted the NodeUnpublishVolume part in the logs as "detaching the volume". What's it mean?
- if I specify uid=1000 and gid=2000 and allow-other mountOptions, does that mean that containers running as, say, uid=5000 won't be able to write to that volume? Is there any way to bypass this restriction (the mounted directory mode is 0755 and files are 0644) and reuse the same s3 in containers running as different UIDs?
> does that mean there actually were no errors mounting the volume? Is that why I didn't see any?

Exactly, the volume was attached correctly and Mountpoint was running as expected. The error occurred at the kernel level, as the kernel will reject requests coming in for that FUSE file system where the user does not match the one running Mountpoint itself (without using --allow-other).
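This is easy to see from the node itself; a sketch, using the mount path from earlier in this issue (replace <pod-uid> with the actual pod UID) and any account other than the one running Mountpoint, which is root for the CSI driver:

```sh
# As root (the user running Mountpoint), this succeeds:
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/pv-hdump/mount

# As any other user, the kernel rejects the request with "Permission denied"
# unless the file system was mounted with allow-other:
sudo -u nobody ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/pv-hdump/mount
```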
> I must've misinterpreted the NodeUnpublishVolume part in the logs as "detaching the volume". What's it mean?
NodeUnpublishVolume is called by Kubernetes when we should detach that volume. For this CSI driver, that means unmounting the Mountpoint file system and cleaning up any other things related to the volume.
If you were able to jump on the node and access the FS, I'm not sure what's happened here. I'd expect that the file system should no longer be mounted and that directory to just be empty.
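If it's worth double-checking, findmnt on the node will tell you whether that path is still an active mount at all (using the pod UID from earlier in this issue):

```sh
# Prints the mount entry if the path is still mounted; exits non-zero otherwise
findmnt /var/lib/kubelet/pods/f8122c23-8cec-40a7-9a82-1914c1c84ed2/volumes/kubernetes.io~csi/pv-hdump/mount
```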
> if I specify uid=1000 and gid=2000 and allow-other mountOptions, does that mean that containers running as, say, uid=5000 won't be able to write to that volume? Is there any way to bypass this restriction (the mounted directory mode is 0755 and files are 0644) and reuse the same s3 in containers running as different UIDs?
Yeah, you can update the file and directory modes. For example, you could include the mount options file-mode=0664 and dir-mode=0775 to grant write access to the group also. There's more explanation here: https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#file-and-directory-permissions.
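Combined with the earlier options, the PV section might look roughly like this (a sketch; the two mode options are the addition that grants group write):

```yaml
mountOptions:
  - uid=1000
  - gid=2000
  - allow-other
  - file-mode=0664   # rw for owner and group, read-only for everyone else
  - dir-mode=0775    # rwx for owner and group, r-x for everyone else
```

Note that for a container running as a different uid (say 5000) to benefit from the group bits, it would likely also need gid 2000 among its groups, for example via fsGroup or supplementalGroups in its pod securityContext.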
> What threw me off is that the Kubernetes pod spec does NOT specify securityContext. Instead, it's enforced at the OCI image level.

Forgive me, I'm still new to Kubernetes. You're saying that your pod spec wasn't already specifying the runAs fields, but instead relying on something like the USER field in the Dockerfile/Containerfile? https://docs.docker.com/reference/dockerfile/#user
>> What threw me off is that the Kubernetes pod spec does NOT specify securityContext. Instead, it's enforced at the OCI image level.
>
> Forgive me, I'm still new to Kubernetes. You're saying that your pod spec wasn't already specifying the runAs fields, but instead relying on something like the USER field in the Dockerfile/Containerfile? https://docs.docker.com/reference/dockerfile/#user
Yes, that is the case. The OCI image is generated using the vendor's shell scripts. I've just checked what's in there:
## ...
RUN adduser -u 10001 --user-group app_runner
## ...
USER app_runner
## ...
> There's more explanation here: https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#file-and-directory-permissions.
thank you!!
Hello,
I'm not sure whether this is aws-support-worthy, because from the aws services' perspective it should be working...
So, I installed the S3 CSI driver in my EKS cluster and created IAM roles as instructed in the README.md. Now, if I spin up a dummy pod in my kube-system namespace using my s3-csi-driver-sa, I can access my S3 buckets using awscli just fine. I can create objects, delete them, list them, etc. If I try to mount the same bucket using the s3-csi-driver, I'm getting no errors! The PV and PVC are created just fine, and the pod referring to the PVC starts up too. However, if I try to ls -l the mounted directory, I see lots of question marks and a Permission denied. If I look into the CSI driver's logs, I see that the volume has been mounted for no more than 1 second, and then unmounted (?). No errors whatsoever.
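For context, the awscli check described above can be reproduced with something like this (a sketch; the service account name is the one mentioned here, while the pod name, image, and bucket are placeholders):

```sh
# Throwaway pod in kube-system using the CSI driver's service account,
# running a single "aws s3 ls" against the bucket
kubectl run awscli-test -n kube-system --rm -it --restart=Never \
  --image=amazon/aws-cli:latest \
  --overrides='{"apiVersion":"v1","spec":{"serviceAccountName":"s3-csi-driver-sa"}}' \
  -- s3 ls s3://example-bucket/
```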
If I break into the EC2 node running the csi driver's pod, I can see the process is there:
(S3 bucket name redacted.) The /var/lib/kubelet/pods/f8122c23-8cec-40a7-9a82-1914c1c84ed2/volumes/kubernetes.io~csi/pv-hdump/mount directory is there, but it's empty. And it does not reflect any files I uploaded to that S3 bucket manually through the AWS Console.
I would consider IAM issues if I could not access the bucket using awscli from that namespace using the CSI SA, but since I CAN, I doubt AWS Support would be able to advise.
Can you?
/triage support