LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

Resource can be removed while nodes are not accessible #330

Open kvaps opened 1 year ago

kvaps commented 1 year ago

This issue occurs only with linstor-csi, but I believe it is a bug in linstor-controller, so I am reporting it here. /cc @WanzenBug

linstor controller 1.20.0; GIT-hash: 9c6f7fad48521899f7a99c564b1d33aeacfdbfa8

Steps to reproduce:

  1. I tested this on a clean Kubernetes cluster where no PVCs existed yet.

  2. Create a PVC:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
     name: my-pvc3
    spec:
     accessModes:
       - ReadWriteOnce
     storageClassName: linstor-thindata-r2
     resources:
       requests:
         storage: 10Gi
  3. Disable piraeus-operator:

    kubectl scale --replicas 0 -n d8-linstor deploy/piraeus-operator
  4. Remove the LINSTOR satellite DaemonSet:

    kubectl delete -n d8-linstor ds/linstor-node
  5. Check the resources; they should switch to the Unknown state:

    linstor r l -r pvc-7b5728b7-d9f7-4d4e-84de-2e386c18b07f
    Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
    ╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    ┊ ResourceName                             ┊ Node       ┊ Port ┊ Usage ┊ Conns ┊   State ┊ CreatedOn           ┊
    ╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
    ┊ pvc-7b5728b7-d9f7-4d4e-84de-2e386c18b07f ┊ hf-virt-01 ┊ 7003 ┊       ┊       ┊ Unknown ┊ 2022-12-09 10:36:13 ┊
    ┊ pvc-7b5728b7-d9f7-4d4e-84de-2e386c18b07f ┊ hf-virt-02 ┊ 7003 ┊       ┊       ┊ Unknown ┊ 2022-12-09 10:36:13 ┊
    ┊ pvc-7b5728b7-d9f7-4d4e-84de-2e386c18b07f ┊ hf-virt-03 ┊ 7003 ┊       ┊       ┊ Unknown ┊ 2022-12-09 10:36:11 ┊
    ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  6. Delete the PVC:

    kubectl delete pvc my-pvc3

    The deletion completes, and both the resource definition and the resources are reported as successfully deleted, but they remain on the nodes. On the next restart, linstor-satellite does not see this device, so it neither stops the DRBD device nor removes the backing LV.

This causes a TCP port conflict if you then try to create a new PVC.

WanzenBug commented 1 year ago

Interesting. I just tried to recreate the situation with bare LINSTOR and it works:

linstor rd c res1
linstor vd c res1 1g
linstor rd ap res1
linstor rd c res2
linstor vd c res2 1g
linstor rd ap res2
# systemctl stop linstor-satellite <- on all nodes

# This deletes the resources:
linstor r d node-1 res1
linstor r d node-2 res1
linstor r d node-3 res1
linstor rd d res1

# This prints a message about waiting for the satellites to come online
linstor rd d res2

# systemctl start linstor-satellite <- on all nodes

Afterwards, res1 is still configured on the nodes but no longer shows up in LINSTOR, while res2 is deleted on startup as expected.

Note that for Kubernetes we use the approach shown for res1, because the RD might have snapshots that we want to leave alone. So a delete always means: delete all resources, then check whether the RD has any snapshots.
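The delete flow described in the previous comment can be sketched as follows. This is a minimal bash illustration, not linstor-csi's actual code: the helper name `plan_volume_delete` is hypothetical, and it assumes the list of nodes and the number of snapshots on the RD are already known. It only prints the `linstor` commands it would run, using the long forms of the `r d` and `rd d` subcommands from the reproduction above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of linstor-csi): prints the commands for the
# "delete all resources, then delete the RD only if it has no snapshots" flow.
# Usage: plan_volume_delete <resource-definition> <snapshot-count> <node>...
plan_volume_delete() {
  local rd="$1" snapshot_count="$2"
  shift 2
  local node
  # Step 1: delete the resource on every node it is placed on.
  for node in "$@"; do
    echo "linstor resource delete $node $rd"
  done
  # Step 2: delete the resource definition only if no snapshots reference it.
  if [ "$snapshot_count" -eq 0 ]; then
    echo "linstor resource-definition delete $rd"
  else
    echo "# keeping $rd: $snapshot_count snapshot(s) still reference it"
  fi
}

plan_volume_delete res1 0 node-1 node-2 node-3
```

Because the per-node `resource delete` calls come first, this path corresponds to res1 above: those deletions are accepted even while the satellites are offline, unlike the direct `rd d` used for res2, which waits for the satellites.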

kvaps commented 1 year ago

I tried to reproduce it using the plain linstor CLI, but didn't succeed. I was only able to reproduce it with linstor-csi.

WanzenBug commented 1 year ago

Sorry, I wasn't very clear above: it "works" in the sense that I could reproduce the problem with res1.