awslabs / mountpoint-s3-csi-driver

Built on Mountpoint for Amazon S3, the Mountpoint CSI driver presents an Amazon S3 bucket as a storage volume accessible by containers in your Kubernetes cluster.
Apache License 2.0

Cannot write when using `allow-other` MP option #142

Open cazter opened 8 months ago

cazter commented 8 months ago

/kind bug

What happened? Unable to write. Currently testing with an AWS IAM role that has all S3 action permissions on the bucket used by the EKS Mountpoint S3 add-on.

/datas3_us/live/pg-manager/pg_wal/spilo/****-*****10140$ tar -czvf archive_name.tar.gz 13edd11c-7e37-4b11-b54d-c8308013957d/
13edd11c-7e37-4b11-b54d-c8308013957d/
13edd11c-7e37-4b11-b54d-c8308013957d/wal/
13edd11c-7e37-4b11-b54d-c8308013957d/wal/11/
13edd11c-7e37-4b11-b54d-c8308013957d/wal/11/basebackups_005/
13edd11c-7e37-4b11-b54d-c8308013957d/wal/11/basebackups_005/base_00000001000000000000000D_00000040/
13edd11c-7e37-4b11-b54d-c8308013957d/wal/11/basebackups_005/base_00000001000000000000000D_00000040/extended_version.txt
13edd11c-7e37-4b11-b54d-c8308013957d/wal/11/basebackups_005/base_00000001000000000000000D_00000040/tar_partitions/
13edd11c-7e37-4b11-b54d-c8308013957d/wal/11/basebackups_005/base_00000001000000000000000D_00000040/tar_partitions/part_00000000.tar.lzo

gzip: stdout: Input/output error
tar: archive_name.tar.gz: Cannot write: Broken pipe
tar: Child returned status 1
tar: Error is not recoverable: exiting now

What you expected to happen? Zip a large directory recursively to a zip file within the S3 bucket.

How to reproduce it (as minimally and precisely as possible)? See above.

Anything else we need to know?:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-eks-us-pv
  namespace: ****
spec:
  capacity:
    storage: 1200Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - allow-delete
    - allow-other
    - region us-east-1
    - uid=1000
    - gid=1000
  csi:
    driver: s3.csi.aws.com
    volumeHandle: eks-logging-****-volume
    volumeAttributes:
      bucketName: eks-logging-****
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-eks-us-pvc
  namespace: ****
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1200Gi
  volumeName: s3-eks-us-pv

Environment

jjkr commented 7 months ago

I have tried creating several tarballs in different configurations including one similar to the one you have here and am not able to reproduce this. My first thought was tar is doing some filesystem operations that mountpoint does not support (this doc has more details), but given I can't reproduce it there might be something else going on.

Some more information that would be helpful here:

cazter commented 7 months ago

> I have tried creating several tarballs in different configurations including one similar to the one you have here and am not able to reproduce this. My first thought was tar is doing some filesystem operations that mountpoint does not support (this doc has more details), but given I can't reproduce it there might be something else going on.
>
> Some more information that would be helpful here:
>
>   • Relevant logs from the driver container (kubectl logs -l app=s3-csi-node --namespace kube-system)
>   • Relevant logs from mountpoint (from the underlying host's syslog journalctl -e SYSLOG_IDENTIFIER=mount-s3 and mountpoint has some additional documentation here)
>   • What OS and major version your nodes are running?
>   • Do all subsequent reads and writes fail after you see this error?

I revised the PV yaml to include - debug.

Read works and continues to work. Writes of any kind don't work--in fact, they may have never worked. I originally tested with touch test, which seems to work but in fact no file is created.
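
A quick way to confirm the symptom described above (touch appears to succeed but nothing lands in the bucket) is to write a file, force it out to storage, and then stat it. The following is a minimal sketch, not part of the driver or of Mountpoint; `probe_write` and the probe file name are hypothetical, and the temporary directory in the demo only stands in for the real mount path.

```python
import os
import tempfile


def probe_write(directory: str, name: str = "write-probe.txt") -> bool:
    """Write a small file and report whether it is really there afterwards."""
    path = os.path.join(directory, name)
    payload = b"mountpoint write probe\n"
    try:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            # Force the data out so a failing upload has a chance to
            # surface as an error here rather than being silently dropped.
            os.fsync(fh.fileno())
    except OSError as err:
        print(f"write failed: {err}")
        return False
    # Confirm the file is visible and has the expected size.
    try:
        return os.stat(path).st_size == len(payload)
    except OSError as err:
        print(f"stat failed: {err}")
        return False


if __name__ == "__main__":
    # Stand-in directory; replace with the pod's mount path to test a volume.
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_write(tmp))
```

Pointing `probe_write` at the directory where the PV is mounted should return False on an affected volume. A second run against the same path overwrites the probe file, which Mountpoint may refuse unless overwrites are allowed, so delete the file or change its name between runs.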

cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
kubectl logs -l app=s3-csi-node --namespace kube-system -c s3-plugin
I0202 02:13:09.516061       1 node.go:204] NodeGetInfo: called with args
I0202 02:12:38.716005       1 driver.go:60] Driver version: 1.2.0, Git commit: 8a832dc5e2fcaa01c02bece33c09517b5364687a, build date: 2024-01-17T16:52:48Z, nodeID: ip-10-0-1-10.ec2.internal, mount-s3 version: 1.3.2
I0202 02:12:38.719185       1 mount_linux.go:285] 'umount /tmp/kubelet-detect-safe-umount733122656' failed with: exit status 32, output: umount: /tmp/kubelet-detect-safe-umount733122656: must be superuser to unmount.
I0202 02:12:38.719234       1 mount_linux.go:287] Detected umount with unsafe 'not mounted' behavior
I0202 02:12:38.719315       1 driver.go:80] Found AWS_WEB_IDENTITY_TOKEN_FILE, syncing token
I0202 02:12:38.719601       1 driver.go:110] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
kubectl logs -l app=s3-csi-node --namespace kube-system -c node-driver-registrar
I0202 02:11:48.769131       1 driver.go:60] Driver version: 1.2.0, Git commit: 8a832dc5e2fcaa01c02bece33c09517b5364687a, build date: 2024-01-17T16:52:48Z, nodeID: ip-10-0-21-67.ec2.internal, mount-s3 version: 1.3.2
I0202 02:11:48.772261       1 mount_linux.go:285] 'umount /tmp/kubelet-detect-safe-umount386038248' failed with: exit status 32, output: umount: /tmp/kubelet-detect-safe-umount386038248: must be superuser to unmount.
I0202 02:11:48.772276       1 mount_linux.go:287] Detected umount with unsafe 'not mounted' behavior
I0202 02:11:48.772323       1 driver.go:80] Found AWS_WEB_IDENTITY_TOKEN_FILE, syncing token
I0202 02:11:48.772527       1 driver.go:110] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0202 02:11:49.544914       1 node.go:204] NodeGetInfo: called with args
journalctl -e SYSLOG_IDENTIFIER=mount-s3
Feb 08 18:14:40 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [DEBUG] log: FUSE(132) ino 0x000000000000000a RELEASEDIR fh FileHandle(4), flags 0x28800, flush false, lock owner None
Feb 08 18:14:40 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [DEBUG] log: FUSE(138) ino 0x0000000000000007 RELEASEDIR fh FileHandle(1), flags 0x28800, flush false, lock owner None
Feb 08 18:14:40 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [DEBUG] log: FUSE(136) ino 0x0000000000000008 RELEASEDIR fh FileHandle(2), flags 0x28800, flush false, lock owner None
Feb 08 18:14:40 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [DEBUG] mountpoint_s3::fuse::session: starting fuse worker 4 (thread id 3174561)
Feb 08 18:14:40 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [DEBUG] read{req=112 ino=29 fh=9 offset=507904 size=131072 name=part_00000000.tar.lzo}:prefetch{range=1179648..8388608 out of 19317583}:get_object{id=54 bucket=eks-************ key=live/pg-manager/pg_wal/spilo/****-************/13edd11
Feb 08 18:14:40 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [DEBUG] read{req=112 ino=29 fh=9 offset=507904 size=131072 name=part_00000000.tar.lzo}:prefetch{range=1179648..8388608 out of 19317583}:get_object{id=54 bucket=eks-************ key=live/pg-manager/pg_wal/spilo/****-************/13edd11
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.io_size[type=read]: n=8: min=131 p10=131 p50=66047 avg=80200.38 p90=132095 p99=132095 p99.9=132095 max=132095
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_failures[op=create]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_failures[op=flush]: 2 (n=2)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_failures[op=getxattr]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_failures[op=ioctl]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_failures[op=lookup]: 2 (n=2)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_failures[op=write]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=create]: n=1: min=20 p10=20 p50=20 avg=20.00 p90=20 p99=20 p99.9=20 max=20
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=flush]: n=5: min=10 p10=10 p50=14 avg=21.40 p90=36 p99=36 p99.9=36 max=36
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=getattr]: n=2: min=16 p10=16 p50=16 avg=30280.00 p90=60671 p99=60671 p99.9=60671 max=60671
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=getxattr]: n=1: min=19 p10=19 p50=19 avg=19.00 p90=19 p99=19 p99.9=19 max=19
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=ioctl]: n=1: min=25 p10=25 p50=25 avg=25.00 p90=25 p99=25 p99.9=25 max=25
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=lookup]: n=8: min=32384 p10=32511 p50=46847 avg=53784.00 p90=102911 p99=102911 p99.9=102911 max=102911
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=mknod]: n=1: min=86016 p10=86527 p50=86527 avg=86272.00 p90=86527 p99=86527 p99.9=86527 max=86527
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=open]: n=3: min=51200 p10=51455 p50=67071 avg=64554.67 p90=75775 p99=75775 p99.9=75775 max=75775
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=opendir]: n=6: min=9 p10=9 p50=14 avg=61.00 p90=301 p99=301 p99.9=301 max=301
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=read]: n=8: min=41 p10=41 p50=169 avg=68364.88 p90=344063 p99=344063 p99.9=344063 max=344063
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=readdirplus]: n=12: min=12 p10=15 p50=18 avg=12546.50 p90=21631 p99=53247 p99.9=53247 max=53247
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=release]: n=3: min=14 p10=14 p50=20 avg=60.00 p90=146 p99=146 p99.9=146 max=146
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=releasedir]: n=6: min=4 p10=4 p50=6 avg=8.50 p90=16 p99=16 p99.9=16 max=16
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_latency_us[op=write]: n=1: min=71 p10=71 p50=71 avg=71.00 p90=71 p99=71 p99.9=71 max=71
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_unimplemented[op=create]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_unimplemented[op=getxattr]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.op_unimplemented[op=ioctl]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.readdirplus.entries: 34 (n=12)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: fuse.total_bytes[type=read]: 639107 (n=8)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: prefetch.contiguous_read_len: n=2: min=131 p10=131 p50=131 avg=320577.50 p90=643071 p99=643071 p99.9=643071 max=643071
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: prefetch.part_queue_starved_us: n=2: min=20224 p10=20351 p50=20351 avg=101536.00 p90=183295 p99=183295 p99.9=183295 max=183295
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_auto_default_network_io: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_auto_ranged_copy_network_io: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_auto_ranged_get_network_io: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_auto_ranged_put_network_io: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_requests_being_prepared: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_requests_being_processed: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_requests_stream_queued_waiting: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_requests_streaming_response: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.num_total_network_io: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.client.request_queue_size: 0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.failures[op=head_object,status=404]: 11 (n=11)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.failures[op=put_object,status=400]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.first_byte_latency_us[op=get_object]: n=3: min=17920 p10=18047 p50=181247 avg=178538.67 p90=337919 p99=337919 p99.9=337919 max=337919
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.first_byte_latency_us[op=head_object]: n=13: min=7808 p10=8191 p50=12223 avg=24598.15 p90=59135 p99=101887 p99.9=101887 max=101887
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.first_byte_latency_us[op=list_objects]: n=19: min=14400 p10=15743 p50=42751 avg=39814.74 p90=63231 p99=83455 p99.9=83455 max=83455
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.first_byte_latency_us[op=put_object]: n=1: min=56064 p10=56319 p50=56319 avg=56192.00 p90=56319 p99=56319 p99.9=56319 max=56319
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.throughput_mibs[op=get_object,size=1-16MiB]: n=2: min=6 p10=6 p50=6 avg=13.00 p90=20 p99=20 p99.9=20 max=20
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.throughput_mibs[op=get_object,size=<1MiB]: n=1: min=0 p10=0 p50=0 avg=0.00 p90=0 p99=0 p99.9=0 max=0
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.total_latency_us[op=get_object]: n=3: min=18048 p10=18175 p50=181247 avg=179946.67 p90=342015 p99=342015 p99.9=342015 max=342015
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.total_latency_us[op=head_object]: n=13: min=7808 p10=8191 p50=12223 avg=24598.15 p90=59135 p99=101887 p99.9=101887 max=101887
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.total_latency_us[op=list_objects]: n=19: min=14464 p10=15807 p50=43007 avg=39902.32 p90=63231 p99=83455 p99.9=83455 max=83455
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests.total_latency_us[op=put_object]: n=1: min=56064 p10=56319 p50=56319 avg=56192.00 p90=56319 p99=56319 p99.9=56319 max=56319
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests[op=get_object]: 3 (n=3)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests[op=head_object]: 13 (n=13)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests[op=list_objects]: 19 (n=19)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.meta_requests[op=put_object]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.failures[op=head_object,type=Default,status=404]: 11 (n=11)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.failures[op=put_object,type=CreateMultipartUpload,status=400]: 1
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.first_byte_latency_us[op=get_object,type=Default]: n=3: min=17792 p10=17919 p50=106495 avg=92736.00 p90=154623 p99=154623 p99.9=154623 max=154623
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.first_byte_latency_us[op=head_object,type=Default]: n=13: min=7488 p10=7871 p50=11967 avg=23357.54 p90=53759 p99=97791 p99.9=97791 max=97791
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.first_byte_latency_us[op=list_objects,type=Default]: n=19: min=13248 p10=14591 p50=37631 avg=35516.63 p90=56575 p99=79359 p99.9=79359 max=79359
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.first_byte_latency_us[op=put_object,type=CreateMultipartUpload]: n=1: min=55808 p10=56063 p50=56063 avg=55936.00 p90=56063 p99=56063 p99.9=56063 max=56063
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.total_latency_us[op=get_object,type=Default]: n=3: min=17920 p10=18047 p50=181247 avg=179904.00 p90=342015 p99=342015 p99.9=342015 max=342015
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.total_latency_us[op=head_object,type=Default]: n=13: min=7680 p10=8063 p50=12095 avg=24450.46 p90=58879 p99=101375 p99.9=101375 max=101375
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.total_latency_us[op=list_objects,type=Default]: n=19: min=14400 p10=15743 p50=42751 avg=39760.84 p90=63231 p99=82943 p99.9=82943 max=82943
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests.total_latency_us[op=put_object,type=CreateMultipartUpload]: n=1: min=56064 p10=56319 p50=56319 avg=56192.00 p90=56319 p99=56319 p99.9=56319 max=56319
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests[op=get_object,type=Default]: 3 (n=3)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests[op=head_object,type=Default]: 13 (n=13)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests[op=list_objects,type=Default]: 19 (n=19)
Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] mountpoint_s3::metrics: s3.requests[op=put_object,type=CreateMultipartUpload]: 1
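
The metrics dump above is long, but the counters that matter for the write failure are the ones whose names end in `failures`. As a reading aid, here is a minimal sketch that pulls those out of `journalctl` output. The regular expression is an assumption based only on the line shapes visible in this thread, not on any documented log format, and `failure_metrics` is a hypothetical helper.

```python
import re

# Matches metric lines whose name ends in "failures", for example:
#   ... mountpoint_s3::metrics: s3.requests.failures[op=put_object,...,status=400]: 1
FAILURE_RE = re.compile(
    r"mountpoint_s3::metrics: (?P<metric>[\w.]*failures)"
    r"\[(?P<labels>[^\]]*)\]: (?P<count>\d+)"
)

PREFIX = "Feb 08 18:14:42 ip-10-0-52-212.ec2.internal mount-s3[3170859]: [INFO] "

# A few lines copied from the log above.
SAMPLE = [
    PREFIX + "mountpoint_s3::metrics: fuse.op_failures[op=write]: 1",
    PREFIX + "mountpoint_s3::metrics: fuse.op_latency_us[op=write]: n=1: min=71 p10=71 p50=71 avg=71.00 p90=71 p99=71 p99.9=71 max=71",
    PREFIX + "mountpoint_s3::metrics: s3.requests.failures[op=head_object,type=Default,status=404]: 11 (n=11)",
    PREFIX + "mountpoint_s3::metrics: s3.requests.failures[op=put_object,type=CreateMultipartUpload,status=400]: 1",
]


def failure_metrics(lines):
    """Return (metric, labels, count) for every failure counter found."""
    found = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match is None:
            continue
        # "op=put_object,status=400" -> {"op": "put_object", "status": "400"}
        labels = dict(
            pair.split("=", 1) for pair in match.group("labels").split(",")
        )
        found.append((match.group("metric"), labels, int(match.group("count"))))
    return found


if __name__ == "__main__":
    for metric, labels, count in failure_metrics(SAMPLE):
        print(metric, labels, count)
```

On the lines quoted above this surfaces `put_object` failing at `CreateMultipartUpload` with status 400, alongside the `fuse.op_failures` entries. The 404s on `head_object` are counted as failures too; they accompany lookups of keys that do not exist and may well be unrelated.
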
cazter commented 7 months ago

From the role being used by the addon driver within EKS, here is the json from the policy applied to the role (full permissions while troubleshooting).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "MountpointFullBucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::*",
                "arn:aws:s3:::*/*"
            ]
        },
        {
            "Sid": "MountpointFullObjectAccess",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::*",
                "arn:aws:s3:::*/*"
            ]
        }
    ]
}

Trust policy, with some masking added.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::*********:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/81E864EC5D407DE3A80E**************"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "oidc.eks.us-east-1.amazonaws.com/id/81E864EC5D407DE3A80E**************:aud": "sts.amazonaws.com",
                    "oidc.eks.us-east-1.amazonaws.com/id/81E864EC5D407DE3A80E**************:sub": "system:serviceaccount:kube-system:s3-csi-*"
                }
            }
        }
    ]
}
dannycjones commented 7 months ago

Just replied on the MP issue - we're a little limited on information: https://github.com/awslabs/mountpoint-s3/issues/738#issuecomment-1938604116

We only see the tail-end of the logs.

@cazter It would be great if you could get more logs using one of the following:

@jjkr let us know if there's a better way to pull this information from the syslog.

pch05 commented 7 months ago

Hello,

It seems I encountered a similar issue: I've mounted my s3 bucket with this command: mount-s3 <bucket_name> <directory_to_associate>

It works, and I can list files and directories in the bucket from my instance. But when I run a command such as 'cat' on one of these files, I get this error: cat: <filename>: Input/output error

If I try to get the file on my laptop with the aws s3 command, it works and I can read the content of the file.

This is the policy I've applied to my instance to access bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:*"
            ],
            "Effect": "Allow",
            "Resource": "<bucket_arn>"
        }
    ]
}

I hope my question helps and is in the right place. Thank you
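
One thing worth ruling out here (nothing in the thread confirms it): if `<bucket_arn>` in the policy above is the bare bucket ARN, the statement covers bucket-level actions such as listing but not object-level actions such as `s3:GetObject`, which apply to `<bucket_arn>/*`. That would be consistent with being able to list but not read. Below is a minimal sketch of a policy that names both resources; `example-bucket` is a placeholder, `bucket_policy` is a hypothetical helper, and the action list is an assumption chosen for illustration.

```python
import json


def bucket_policy(bucket: str) -> dict:
    """Build an IAM policy with separate bucket-level and object-level grants."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions (listing) apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to the objects, i.e. the /* form.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder name, not one from this thread.
    print(json.dumps(bucket_policy("example-bucket"), indent=4))
```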

cazter commented 7 months ago

>   • journalctl -e SYSLOG_IDENTIFIER=mount-s3 --boot

We've updated the eks mountpoint addon to 1.3.0.

We've been troubleshooting using a pod manifest that can be scheduled onto one of four different EC2 hosts. The vast majority of the time, though, the kube scheduler has picked one particular host, and the logs below (too large to paste inline) are from that host.

journalctl -e SYSLOG_IDENTIFIER=mount-s3 --boot > boot.txt

boot.txt

journalctl -e SYSLOG_IDENTIFIER=mount-s3 --since "2024-02-08 18:00:00" > bydate.txt

bydate.txt

cazter commented 7 months ago

We were able to get writes working using a test pod running python:3.8-slim, that is, a container whose root user defaults to uid=0 and gid=0. That allowed us to remove the --allow-other flag, and it is this flag that does not appear to be working as expected. See below for more details.

The pod we hope to get this working in runs the Jenkins inbound-agent, which defaults the root user to gid=1000.

root@pod-cicd-s3-b8lxs:/home/jenkins# id
uid=0(root) gid=1000(jenkins) groups=1000(jenkins)
root@pod-cicd-s3-b8lxs:/home/jenkins# su jenkins
$ id
uid=1000(jenkins) gid=1000(jenkins) groups=1000(jenkins)
$ exit
root@pod-cicd-s3-b8lxs:/home/jenkins# cat /etc/group
root:x:0:
daemon:x:1:
bin:x:2:
sys:x:3:
adm:x:4:
tty:x:5:
disk:x:6:
lp:x:7:
mail:x:8:
news:x:9:
uucp:x:10:
man:x:12:
proxy:x:13:
kmem:x:15:
dialout:x:20:
fax:x:21:
voice:x:22:
cdrom:x:24:
floppy:x:25:
tape:x:26:
sudo:x:27:
audio:x:29:
dip:x:30:
www-data:x:33:
backup:x:34:
operator:x:37:
list:x:38:
irc:x:39:
src:x:40:
gnats:x:41:
shadow:x:42:
utmp:x:43:
video:x:44:
sasl:x:45:
plugdev:x:46:
staff:x:50:
games:x:60:
users:x:100:
nogroup:x:65534:
jenkins:x:1000:
ssh:x:101:

We've tested the PV with various uid and gid values for both the root and jenkins users, including root with uid=0 and gid=0. The volumes always mount so long as the --allow-other flag is present, but if we remove the flag the volume fails to mount.

We've added the following to the jenkins agent manifest.

  securityContext:
    runAsUser: 0
    runAsGroup: 1000

If we rebuild the inbound-agent Docker image to change the root user's default group back to 0 as follows:

USER root
RUN usermod -g 0 root

Then the pod fails to connect to Jenkins and fails its health check, so we're unable to test the S3 mounts, although I am certain they would work.

In short, --allow-other and/or the flags for defining gid and uid are not working.

cazter commented 7 months ago

We were able to get this working for our use-case by adjusting the jenkins agent to run as uid=0, gid=0 and removing the flags for --allow-other and uid, gid. Working PV config:

metadata:
  name: s3-eks-us-pv
  namespace: jenkins
spec:
  capacity:
    storage: 1200Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - allow-delete
    - region us-east-1
  csi:
    driver: s3.csi.aws.com
    volumeHandle: eks-logging-*****-volume
    volumeAttributes:
      bucketName: eks-logging-*****

I did not close the issue as we believe there is still a bug with --allow-other. However, we've managed a workaround so you're welcome to close.

dannycjones commented 7 months ago

A short note on --allow-other and --allow-root: it requires the FUSE feature to be explicitly enabled by setting user_allow_other in /etc/fuse.conf (or similar location). I am not aware if this is the case for the CSI driver.
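
To check the setting mentioned above, one can look for an uncommented `user_allow_other` line in the FUSE configuration file. This is a minimal sketch under the assumption that the file is at `/etc/fuse.conf` and that the check is run on the machine (or in the container) where `mount-s3` itself runs; `user_allow_other_enabled` is a hypothetical helper, not part of any tool discussed here.

```python
from pathlib import Path


def user_allow_other_enabled(conf_text: str) -> bool:
    """Return True if fuse.conf text contains an active user_allow_other line."""
    for raw in conf_text.splitlines():
        # Drop anything after '#', so a commented-out option does not count.
        line = raw.split("#", 1)[0].strip()
        if line == "user_allow_other":
            return True
    return False


if __name__ == "__main__":
    conf = Path("/etc/fuse.conf")  # assumed location; it may differ per distro
    try:
        print(user_allow_other_enabled(conf.read_text()))
    except OSError as err:
        print(f"could not read {conf}: {err}")
```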

lgy1027 commented 7 months ago

Hello, I also encountered the problem of being unable to write. When creating a file in the container, there is no file in the minio bucket. My configuration is as follows

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1200Gi # ignored, required
  accessModes:
    - ReadWriteMany # supported options: ReadWriteMany / ReadOnlyMany
  mountOptions:
    - allow-delete
    - allow-overwrite
    - allow-other
    - endpoint-url http://10.0.102.45:32001
  csi:
    driver: s3.csi.aws.com # required
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: test
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-claim
spec:
  accessModes:
    - ReadWriteMany # supported options: ReadWriteMany / ReadOnlyMany
  storageClassName: "" # required for static provisioning
  resources:
    requests:
      storage: 1200Gi # ignored, required
  volumeName: s3-pv
---
apiVersion: v1
kind: Pod
metadata:
  name: s3-app
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: s3-claim
        readOnly: false

logs:

Feb 21 17:01:22 yigou-dev-102-44 mount-s3[23613]: [WARN] open{req=122 ino=10 pid=82653 name=minio.txt}:put_object{id=38 bucket=test key=minio.txt}: mountpoint_s3_client::s3_crt_client: meta request failed duration=12.883365ms request_result=MetaRequestResult { response_status: 400, crt_error: Error(14343, "aws-c-s3: AWS_ERROR_S3_INVALID_RESPONSE_STATUS, Invalid response status from request"), error_response_headers: Some(Headers { inner: 0x7fdf60012810 }), error_response_body: Some("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>XAmzContentSHA256Mismatch</Code><Message>The provided &#39;x-amz-content-sha256&#39; header does not match what was computed.</Message><Key>minio.txt</Key><BucketName>test</BucketName><Resource>/test/minio.txt</Resource><RequestId>17B5D54057EF17CD</RequestId><HostId>b7190f0a-016d-4608-a046-8d37d0c6c743</HostId></Error>") }

Feb 21 17:01:22 yigou-dev-102-44 mount-s3[23613]: [WARN] release{req=130 ino=10 fh=10 name=minio.txt}: mountpoint_s3::fuse: release failed: put failed: Client error: Unknown response error: MetaRequestResult { response_status: 400, crt_error: Error(14343, "aws-c-s3: AWS_ERROR_S3_INVALID_RESPONSE_STATUS, Invalid response status from request"), error_response_headers: Some(Headers { inner: 0x7fdf60012810 }), error_response_body: Some("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>XAmzContentSHA256Mismatch</Code><Message>The provided &#39;x-amz-content-sha256&#39; header does not match what was computed.</Message><Key>minio.txt</Key><BucketName>test</BucketName><Resource>/test/minio.txt</Resource><RequestId>17B5D54057EF17CD</RequestId><HostId>b7190f0a-016d-4608-a046-8d37d0c6c743</HostId></Error>") }
dannycjones commented 7 months ago

> Hello, I also encountered the problem of being unable to write. When creating a file in the container, there is no file in the minio bucket.

Hey @lgy1027, I suspect the error you are seeing is because the S3-like implementation may not support the CRC32 checksums used by Mountpoint when uploading new files. Please check if your version of MinIO does support CRC32 checksums - I see that there was some discussion around supporting these before: https://github.com/minio/minio/discussions/15723.

Since your error is unrelated to the one in this issue, please open a new issue against the Mountpoint repository if you have more questions. Thanks!
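
For anyone testing an S3-compatible store against this, it can help to know what a CRC32 checksum looks like on the wire. In the S3 API the value travels in the `x-amz-checksum-crc32` header as the base64 encoding of the 32-bit checksum in big-endian byte order. The sketch below computes that value for a payload; `crc32_checksum_b64` is a hypothetical helper, and whether a given MinIO version accepts the header is not something this snippet can tell you.

```python
import base64
import zlib


def crc32_checksum_b64(data: bytes) -> str:
    """Return the CRC32 of data as base64 of its 4 big-endian bytes."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")


if __name__ == "__main__":
    # Arbitrary sample payload, used only to show the shape of the value.
    print(crc32_checksum_b64(b"minio test payload\n"))
```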

barrowkwan commented 6 months ago

I also have the same issue with the S3 CSI driver. My pod can mount the S3 bucket and can ls all files, but when I try to cat or cp a file from the S3 mount point, I keep getting an error: cat: /www/404.html: Input/output error. Strangely, I can delete files in the S3 bucket; for example, rm /www/404.html works fine. Only reading a file or creating a file in the S3 mount point fails.

barrowkwan commented 6 months ago

It turns out the S3 bucket I created uses a self-managed KMS key for encryption, and the SA used by the S3 CSI driver had no access to that key. After granting the SA access, I can read from and write to the bucket.
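
The comment above does not show the change that was made. As a rough sketch of the kind of statement involved, the IAM role behind the driver's service account has to be allowed to use the key. The actions below (`kms:Decrypt` for reads, `kms:GenerateDataKey` for writes) are what S3 generally needs for SSE-KMS objects, but treat the exact list as an assumption; the key ARN and the `kms_statement` helper are hypothetical, and the key's own policy must permit the role as well.

```python
import json


def kms_statement(key_arn: str) -> dict:
    """Build an IAM statement letting a role use one KMS key for S3 objects."""
    return {
        "Sid": "AllowBucketKmsKey",
        "Effect": "Allow",
        # Assumed minimum: decrypt on read, data-key generation on write.
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": [key_arn],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    example_key = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
    print(json.dumps(kms_statement(example_key), indent=4))
```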