NetApp / trident

Storage orchestrator for containers
Apache License 2.0

Resizing iSCSI volumes does not automatically expand filesystem (ext4) #911

Closed: phhutter closed this issue 1 month ago

phhutter commented 4 months ago

Describe the bug
We are encountering an issue where, upon resizing iSCSI volumes (ext4), only the block device is resized, but not the filesystem itself. We have already tried restarting the application pod that mounts the iSCSI volume and rebinding it to a different node using a node selector. However, it seems that resize2fs is never executed.

I have not been able to find any relevant log entries in the Trident DaemonSet pods. When I perform a manual resize2fs on the volume from the node, the filesystem is successfully resized to the desired size without any errors.

Could you please guide me on how to locate the appropriate logs, or let me know which component is responsible for the filesystem expansion?

Is my assumption correct that resizing iSCSI volumes necessarily requires a restart of the application pod?
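For context, the manual fix we ran on the node looks roughly like this. This is an illustrative sketch for a multipath iSCSI setup; the map name is a placeholder, not the actual device from our environment:

```shell
# On the worker node where the volume is attached (illustrative steps):

# 1. Rescan iSCSI sessions so the kernel picks up the new LUN size
iscsiadm -m session --rescan

# 2. If multipath is in use, grow the multipath map (map name is a placeholder)
multipathd resize map 3600a0980774f6a34712b572d41767175

# 3. Grow the ext4 filesystem online to fill the resized device
resize2fs /dev/mapper/3600a0980774f6a34712b572d41767175
```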

Environment

To Reproduce
Steps to reproduce the behavior:

Expected behavior
Upon restart of an application pod that mounts an iSCSI volume, resize2fs is automatically executed.

wonderland commented 3 months ago

Not sure why this is not working for you. I tried the following in a lab; verbose notes below to make it easy to reproduce in your environment.

A very simple PVC + Deployment. Container writes timestamps into a file just to simulate regular write access to the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deploy-data
  labels:
    app: deploy-demo
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: storage-class-iscsi
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-demo
  labels:
    app: deploy-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deploy-demo
  template:
    metadata:
      labels:
        app: deploy-demo
    spec:
      nodeName: rhel1
      containers:
      - name: busybox
        image: registry.k8s.io/busybox
        command: [ "sh", "-c"]
        args:
        - while true; do
            echo -en '\n';
            echo "${NODE_NAME}" "`date +"%Y-%m-%d %H:%M:%S"`"   >> /data/log.txt;
            sleep 1;
          done;
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - mountPath: /data
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: deploy-data
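
One prerequisite worth double-checking: the StorageClass must have allowVolumeExpansion enabled, otherwise the resize request is rejected before it ever reaches the node. A minimal sketch of what storage-class-iscsi would need (the provisioner name matches Trident's CSI driver; the backend parameter is an assumption for a typical ONTAP SAN setup):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storage-class-iscsi
provisioner: csi.trident.netapp.io
allowVolumeExpansion: true   # required for online PVC expansion
parameters:
  backendType: ontap-san     # illustrative; depends on your backend
```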

When I apply this, PVC and pod are up and the pod reports the expected PVC size:

 k get pod,pvc
NAME                               READY   STATUS    RESTARTS   AGE
pod/deploy-demo-6bdcb5d994-mz9pg   1/1     Running   0          71s

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/deploy-data   Bound    pvc-2edea93d-e19d-43b8-a27b-20969bf7ef7b   10Gi       RWO            storage-class-iscsi   <unset>                 71s
 k exec deploy/deploy-demo -- df -h /data
Filesystem                Size      Used Available Use% Mounted on
/dev/mapper/3600a0980774f6a34712b572d41767175        9.7G     28.0K      9.2G   0% /data

I then modify the PVC yaml to 15Gi and apply it again. After a short moment, the PVC reflects the new size. The pod is not restarted:
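Editing the manifest and re-applying is one way; the same resize can also be done in place with a patch (PVC name and size taken from the example above):

```shell
# Bump the requested size on the existing PVC without touching the manifest
kubectl patch pvc deploy-data \
  -p '{"spec":{"resources":{"requests":{"storage":"15Gi"}}}}'
```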

 k get pod,pvc
NAME                               READY   STATUS    RESTARTS   AGE
pod/deploy-demo-6bdcb5d994-mz9pg   1/1     Running   0          2m12s

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/deploy-data   Bound    pvc-2edea93d-e19d-43b8-a27b-20969bf7ef7b   15Gi       RWO            storage-class-iscsi   <unset>                 2m12s

The pod sees the new capacity as well (without being restarted):

 k exec deploy/deploy-demo -- df -h /data
Filesystem                Size      Used Available Use% Mounted on
/dev/mapper/3600a0980774f6a34712b572d41767175        14.7G     28.0K     13.9G   0% /data

What filesystem type are you using? My setup is on the default ext4; not sure if that makes a difference.

 k exec deploy/deploy-demo -- mount | grep /data
/dev/mapper/3600a0980774f6a34712b572d41767175 on /data type ext4 (rw,seclabel,relatime,stripe=16)

jwebster7 commented 1 month ago

@phhutter thank you for reporting this.

If you are still encountering this, would you share debug logs from the Trident Deployment and DaemonSet Pods?
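
A sketch of how those logs can be collected with tridentctl (the trident namespace is an assumption; adjust for your install):

```shell
# Gather logs from all Trident pods (controller and node DaemonSet)
tridentctl logs -l all -n trident > trident-logs.txt
```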

phhutter commented 1 month ago

Hey @jwebster7

I have tried to reproduce it several times in various scenarios with different Kubernetes/Trident versions, but I was not able to. We only saw the behavior once on a Kafka ReplicaSet where resizing of the PVCs did not work as expected. At that time, we saw the same issue on 3 different iSCSI PVCs, which were all used by the same Kafka instance.

In the meantime we have fixed all the affected volumes by running resize2fs manually.

As long as no other customers experience similar issues, I think we can close this for now.