awslabs / mountpoint-s3-csi-driver

Built on Mountpoint for Amazon S3, the Mountpoint CSI driver presents an Amazon S3 bucket as a storage volume accessible by containers in your Kubernetes cluster.

Simplify caching configuration #141

Open jjkr opened 9 months ago

jjkr commented 9 months ago

/feature

Is your feature request related to a problem? Please describe.
Caching is supported today by adding a cache option to a persistent volume configuration and passing in a directory on the node's filesystem. This works, but comes with a couple of sharp edges: the cache directory is not created on the node automatically, so it has to be created manually ahead of time (see the configuration sketch below).

Describe the solution you'd like in detail
Caching configuration should be possible without manually making changes to the nodes, and it should be easy to define different types of storage to use as the cache, such as a ramdisk.

Describe alternatives you've considered
One potential solution is to reference other persistent volumes or mounts as the cache, which would compose nicely with existing Kubernetes constructs.

Additional context
Mountpoint's documentation on caching: https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#caching-configuration
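
For reference, a minimal sketch of how caching is wired up today via the volume's mountOptions, along the lines of the repository's static provisioning example (bucket name, volume handle, and cache path below are placeholders; the cache directory must already exist on the node):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1200Gi                  # required by Kubernetes, not enforced for S3
  accessModes:
    - ReadWriteMany
  mountOptions:
    - cache /tmp/s3-pv1-cache        # directory on the node used as the data cache
    - metadata-ttl 300               # optional: cache metadata for 300 seconds
    - max-cache-size 500             # optional: limit the cache size
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: example-bucket     # placeholder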

ggkr commented 8 months ago

We have the same issue and attempted to work around it by using an init container that creates the cache directory on the node, as in the following example. (I didn't include the PV config here, but it was configured with the cache directory at /tmp/s3-cache.)

apiVersion: v1
kind: Pod
metadata:
  name: s3-app
spec:
  initContainers:
  - name: create-cache-dir
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "mkdir -p /cache-dir/s3-cache; echo 'hi' > /cache-dir/s3-cache/test.txt"]
    volumeMounts:
    - name: cache-location
      mountPath: /cache-dir
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "ls -lR /data; sleep 99"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: s3-pvc
  - name: cache-location
    hostPath:
      path: /tmp/

This example DOES NOT work, because Kubernetes attempts to mount the S3 volume before the init container runs.

terrytsay commented 6 months ago

Based on the example here: https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/caching.yaml

I worked around this issue by using a hostPath mount with type DirectoryOrCreate to create the directory on the host if it doesn't already exist.

Regardless of the order of volumeMounts or volumes, Kubernetes will automatically retry until everything is mounted, but I put the hostPath mount before the PVC mount in case mounts are processed in the order specified. In my testing, the pod comes up immediately.

apiVersion: v1
kind: Pod
metadata:
  name: s3-app
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "echo 'Hello from the container!' >> /data/$(date -u).txt; tail -f /dev/null"]
      volumeMounts:
        - name: cache-location
          mountPath: /tmp/pv
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: cache-location
      hostPath:
        path: /tmp/s3-pv1-cache
        type: DirectoryOrCreate
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: s3-claim

gyf304 commented 6 months ago

I'm working around this using a k8s job.

apiVersion: batch/v1
kind: Job
metadata:
  name: s3-cache-create
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        command:
        - mkdir
        - "-p"
        - /host/var/tmp/s3-cache
        volumeMounts:
        - name: host-var-tmp
          mountPath: /host/var/tmp
      volumes:
      - name: host-var-tmp
        hostPath:
          path: /var/tmp
      restartPolicy: Never

A Job per volume is needed, and you should modify the path so that it is unique per volume.
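
For clarity, the path the Job creates has to match the cache directory referenced by that volume's mountOptions; something along these lines (bucket name and volume handle are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1200Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - cache /var/tmp/s3-cache        # same host path the Job above creates
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: example-bucket     # placeholder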

mcandeia commented 4 months ago

This worked for me

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: s3-cache-dir-setup
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: s3-cache-dir-setup
  template:
    metadata:
      labels:
        app: s3-cache-dir-setup
    spec:
      initContainers:
        - name: create-s3-cache-dir
          image: busybox
          command:
            - sh
            - -c
            - |
              mkdir -p /tmp/s3-local-cache && \
              chmod 0700 /tmp/s3-local-cache
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-mount
              mountPath: /tmp/s3-local-cache
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.1
      volumes:
        - name: host-mount
          hostPath:
            path: /tmp/s3-local-cache

varkey commented 3 months ago

From the documentation

The cache directory is not reusable by other Mountpoint processes and will be cleaned at mount time and exit. When running multiple Mountpoint processes concurrently on the same host, you should use unique cache directories to avoid different processes interfering with the others' cache content.

If this is the case, we'd need a unique cache directory per pod, say when more than one pod of the same deployment is scheduled on the same node. It looks like none of the workarounds suggested above support this scenario.

wSedlacek commented 1 month ago

I worked through different ways of creating a host path for the project I work on and came across a few constraints that are worth sharing.

Using a node provisioner, such as setup scripts in a Karpenter EC2 node class, only works if the paths are known ahead of time. If you scale buckets up and down independently of the node lifecycle, it is impossible to create all the directories in advance this way.
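
To illustrate the "known ahead of time" constraint, node provisioning looks roughly like this (a sketch assuming Karpenter's v1beta1 EC2NodeClass API; the role, selector tags, and cache paths are placeholders, and the hard-coded paths are exactly the limitation):

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: s3-cache-nodes
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-example          # placeholder
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: example    # placeholder
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: example    # placeholder
  userData: |
    #!/bin/bash
    # Cache directories must be known when the node boots.
    mkdir -p /tmp/s3-cache-bucket-a /tmp/s3-cache-bucket-b
    chmod 0700 /tmp/s3-cache-bucket-a /tmp/s3-cache-bucket-b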

Using a DaemonSet to mount the hostPath on every node does work, but you quickly hit per-node pod limits if the number of mounted buckets keeps growing. For example, we create a review environment for every PR, each with its own bucket, so as the number of PRs grows, the number of DaemonSets grows with them, leaving less and less room for other pods on the nodes.

Using hostPath volumes directly on the workloads works, but if you are using Knative or a similar wrapper for your workloads, hostPath might not be exposed. To get around this (which is ultimately the solution I used), I create a PersistentVolume configured specifically for hostPath and then create a PersistentVolumeClaim to map it into my workloads.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: default-bucket-cache
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /tmp/cache-default-bucket
    type: DirectoryOrCreate
  capacity:
    storage: 500Mi
  claimRef:
    namespace: default
    name: bucket-cache
    apiVersion: v1
    kind: PersistentVolumeClaim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: default
  name: bucket-cache
spec:
  storageClassName: manual
  resources:
    requests:
      storage: 500Mi
  volumeName: default-bucket-cache
  accessModes:
    - ReadWriteMany

It would be nice if there were a way to simply specify a PV for Mountpoint to use as the cache. I am thinking specifically of putting the cache somewhere other than the host so that it could be reused across multiple nodes.
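
To make the ask concrete, a purely hypothetical shape for this could be a volume attribute that points at an existing claim; none of these cache* attributes exist in the driver today:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1200Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: example-bucket       # placeholder
      cacheClaimName: bucket-cache     # hypothetical attribute, not supported today
      cacheClaimNamespace: default     # hypothetical attribute, not supported today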

tvandinther commented 18 hours ago

To add to these points, I would have liked the driver to expose Kubernetes-specific caching configuration that it interprets before starting a Mountpoint process. This would include creating a cache directory on the node specifically for the mount being prepared, at the path given in the configuration.

Even more ideal would be the ability to use an EBS-backed (or other) volume as the cache, so that normal node operations can't be compromised by low disk space, but this poses some implementation questions. Perhaps #279, which proposes running Mountpoint in a sidecar, can offer a solution here.