LINBIT / linstor-server

High Performance Software-Defined Block Storage for containers, cloud, and virtualization. Fully integrated with Docker, Kubernetes, OpenStack, Proxmox, etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Problem while creating large (100+ GB) volume #371

Open duckhawk opened 1 year ago

duckhawk commented 1 year ago

I am just trying to create a 100+ GB volume:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: largepvc 
  namespace:  default
spec:
  storageClassName: "linstor-store-r2"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    matchLabels:
      component: nginx
  template:
    metadata:
      labels:
        component: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        command: ["/usr/sbin/nginx"]
        args:
        - -g
        - daemon off;
        volumeMounts:
        - mountPath: "/app/media"
          name: largepvc 
        ports:
          - containerPort: 80
            protocol: TCP
      volumes:
        - name: largepvc
          persistentVolumeClaim:
            claimName: largepvc

LINSTOR creates the volume, but it looks like mkfs fails because of a timeout:

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               18m                  linstor                  Successfully assigned default/nginx-c49998c79-lvnlx to node0
  Normal   SuccessfulAttachVolume  18m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-c5dda542-a0ac-4336-882a-1724f98664b0"
  Warning  FailedMount             101s (x16 over 18m)  kubelet                  MountVolume.SetUp failed for volume "pvc-c5dda542-a0ac-4336-882a-1724f98664b0" : rpc error: code = Internal desc = NodePublishVolume failed for pvc-c5dda542-a0ac-4336-882a-1724f98664b0: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o _netdev /dev/drbd1020 /var/lib/kubelet/pods/59315b6d-3f90-4bad-b831-4af54963d3cb/volumes/kubernetes.io~csi/pvc-c5dda542-a0ac-4336-882a-1724f98664b0/mount
Output: mount: /var/lib/kubelet/pods/59315b6d-3f90-4bad-b831-4af54963d3cb/volumes/kubernetes.io~csi/pvc-c5dda542-a0ac-4336-882a-1724f98664b0/mount: wrong fs type, bad option, bad superblock on /dev/drbd1020, missing codepage or helper program, or other error.
  Warning  FailedMount  23s (x8 over 16m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[largepvc], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
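
A quick way to confirm that the device was left without a filesystem is to probe it on the node (the device name is taken from the mount error above; blkid printing nothing means no filesystem signature was found, i.e. mkfs never completed):

root@node0:/# blkid /dev/drbd1020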

Also, the resource is stuck in the InUse state on one of the diskful replicas:

root@master1:~# linstor r l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
+---------------------------------------------------------------------------------------------------------------+
| ResourceName                             | Node    | Port | Usage  | Conns |      State | CreatedOn           |
|===============================================================================================================|
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | node0   | 7017 | InUse  | Ok    |   UpToDate | 2023-10-07 18:03:57 |
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | node4   | 7017 | Unused | Ok    |   UpToDate | 2023-10-07 18:04:57 |
| pvc-c5dda542-a0ac-4336-882a-1724f98664b0 | system1 | 7017 | Unused | Ok    | TieBreaker | 2023-10-07 18:04:52 |
+---------------------------------------------------------------------------------------------------------------+
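
For context, LINSTOR's InUse flag for a DRBD resource corresponds to the DRBD Primary role, so the stuck state can be cross-checked directly on the node (output omitted; it should report the local role as Primary):

root@node0:/# drbdadm status pvc-c5dda542-a0ac-4336-882a-1724f98664b0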

This can be worked around by creating the filesystem manually and running drbdadm down/up on the node where the resource is stuck InUse (the down/up cycle drops the DRBD Primary role that keeps the volume marked InUse), as shown in the session below:

root@master1:~# kubectl -n d8-linstor exec -ti linstor-node-8rmmv bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "linstor-satellite" out of: linstor-satellite, kube-rbac-proxy, drbd-prometheus-exporter
root@node0:/# linstor v l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node    ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ node0   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊   1.28 GiB ┊ InUse  ┊   UpToDate ┊
┊ node4   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊   UpToDate ┊
┊ system1 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ DfltDisklessStorPool ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊            ┊ Unused ┊ TieBreaker ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@node0:/# mkfs.ext4 -E lazy_itable_init=1 -E lazy_journal_init=1 /dev/drbd1020
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 52428800 4k blocks and 13132800 inodes
Filesystem UUID: 8e8784d7-2b96-4d50-92ac-1a9ad8074637
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done     

root@node0:/# drbdadm down pvc-c5dda542-a0ac-4336-882a-1724f98664b0
root@node0:/# drbdadm up pvc-c5dda542-a0ac-4336-882a-1724f98664b0
root@node0:/# linstor v l -r pvc-c5dda542-a0ac-4336-882a-1724f98664b0
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node    ┊ Resource                                 ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ node0   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊   UpToDate ┊
┊ node4   ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ store                ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊ 286.78 MiB ┊ Unused ┊   UpToDate ┊
┊ system1 ┊ pvc-c5dda542-a0ac-4336-882a-1724f98664b0 ┊ DfltDisklessStorPool ┊     0 ┊    1020 ┊ /dev/drbd1020 ┊            ┊ Unused ┊ TieBreaker ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@node0:/# 
exit
root@master1:~# kubectl get pods
NAME                    READY   STATUS              RESTARTS   AGE
nginx-c49998c79-lvnlx   0/1     ContainerCreating   0          28m
root@master1:~# kubectl delete pod nginx-c49998c79-lvnlx 
pod "nginx-c49998c79-lvnlx" deleted
root@master1:~# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-c49998c79-rpp82   1/1     Running   0          4m37s
root@master1:~# kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
largepvc   Bound    pvc-c5dda542-a0ac-4336-882a-1724f98664b0   200Gi      RWO            linstor-store-r2   33m

I tried setting the lazy-init parameters in the StorageClass, but it didn't help:

root@master1:~# kubectl get sc linstor-store-r2 -oyaml | grep fsOpts
  linstor.csi.linbit.com/fsOpts: -E lazy_itable_init=1 -E lazy_journal_init=1
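
For completeness, a sketch of what the full StorageClass looks like; only the fsOpts line is taken verbatim from the cluster, the remaining parameters are assumptions reconstructed from the pool name and replica count visible in the listings above:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-store-r2
provisioner: linstor.csi.linbit.com
parameters:
  linstor.csi.linbit.com/storagePool: store    # assumed, matches the 'store' pool above
  linstor.csi.linbit.com/placementCount: "2"   # assumed, r2 = two diskful replicas
  linstor.csi.linbit.com/fsOpts: -E lazy_itable_init=1 -E lazy_journal_init=1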

It looks like there is some timeout during volume provisioning.
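
If the bottleneck really is mkfs runtime hitting a CSI call timeout, it should be measurable by hand: DRBD replicates every write that mkfs issues, so formatting a 200 GiB volume over the replication link can take far longer than on the raw backing device. A sketch of the measurement (same options the CSI driver would use; timings will vary with the backing storage and network):

root@node0:/# time mkfs.ext4 -E lazy_itable_init=1 -E lazy_journal_init=1 /dev/drbd1020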