ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

subvolume count greater than number of pvcs #1608

Closed Yuggupta27 closed 3 years ago

Yuggupta27 commented 3 years ago

Describe the bug

When multiple clones of a PVC were created in parallel (37 clones, to be exact), the subvolume count came to 39.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

Create multiple parallel clones of a PVC

Actual results

[ygupta@localhost cephfs]$ kubectl get pvc
NAME                                                                                  STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
cephfs-pvc-clone1                                                                     Bound     pvc-29bd47e9-be33-49c3-b3a0-e32952d31c0a   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone1027926825724623522421321                                             Bound     pvc-b3645ea3-da17-40f1-b874-f71100c96005   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone11281027926825724623522421321                                         Bound     pvc-a3f08cc1-b7e3-4bd5-b8dd-42517b7e3a2f   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone122911281027926825724623522421321                                     Bound     pvc-ecf66e86-fb18-437e-8a62-5c32cabc37ac   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone1330122911281027926825724623522421321                                 Bound     pvc-b7ee6b49-262d-4430-97f7-e1519d5ed9c7   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone14311330122911281027926825724623522421321                             Bound     pvc-4eae324b-390d-4968-b5bd-6ffbb16ddda0   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone153214311330122911281027926825724623522421321                         Bound     pvc-9eec4052-4b7a-43d6-be22-06352a57d30c   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone1633153214311330122911281027926825724623522421321                     Bound     pvc-038fcd62-bf9e-4f2c-9b1c-795761bed90d   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone17341633153214311330122911281027926825724623522421321                 Bound     pvc-8b97b767-6a45-41b8-8f9a-e966b15de33f   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone183517341633153214311330122911281027926825724623522421321             Bound     pvc-a05549eb-82a7-4474-b477-d7f361a267b6   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone1936183517341633153214311330122911281027926825724623522421321         Bound     pvc-b0c1397f-57df-4001-a8e2-7460735130b6   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone21                                                                    Bound     pvc-ae7d80b1-caab-4117-b48e-10f40e2b537f   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone21321                                                                 Bound     pvc-07a97fbd-7455-4ff8-9055-fda28f186169   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone22421321                                                              Bound     pvc-6d1bf277-0f84-4e77-8e15-a75a46c46643   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone23522421321                                                           Bound     pvc-c271c855-cc30-4664-90ba-f6879cf7a2d2   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone24623522421321                                                        Bound     pvc-407c091e-b65e-46bb-a745-03c597b5b252   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone25724623522421321                                                     Bound     pvc-09d7da36-b898-459a-a8b9-49ab1444f5fb   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone26825724623522421321                                                  Bound     pvc-660fa825-44ef-445e-b98b-5ca0dda5d7e6   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone27926825724623522421321                                               Bound     pvc-813b6ed4-fff8-4eaf-85e6-3248108aa366   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone281027926825724623522421321                                           Bound     pvc-d171f293-66ed-40e3-8abc-e7281346dad9   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone2911281027926825724623522421321                                       Bound     pvc-60fbfee1-0942-4531-b5bf-89ee1805aacd   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone30122911281027926825724623522421321                                   Bound     pvc-44d21966-b6d1-4fd9-908c-1b27124895c6   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone311330122911281027926825724623522421321                               Bound     pvc-c02a21c6-3b1b-4579-b874-59367bb54ec6   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone3214311330122911281027926825724623522421321                           Bound     pvc-870db267-7ab3-4a2b-b2d0-845d3358ffce   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone33153214311330122911281027926825724623522421321                       Bound     pvc-b5ca4b40-b0eb-440c-8dd3-0a7b779c0275   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone341633153214311330122911281027926825724623522421321                   Bound     pvc-397a34b1-fd91-4d96-a060-a33f6a941222   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone3517341633153214311330122911281027926825724623522421321               Bound     pvc-6244ec28-c274-4d18-bd1d-520bdd06f454   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone36183517341633153214311330122911281027926825724623522421321           Bound     pvc-c7e02bba-8ef0-4a35-857b-c048559a49aa   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone371936183517341633153214311330122911281027926825724623522421321       Bound     pvc-b2748236-eacb-4435-be18-cf7344513723   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone38371936183517341633153214311330122911281027926825724623522421321     Bound     pvc-13da7777-1294-4075-aa6d-43f8726dc3ba   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone3938371936183517341633153214311330122911281027926825724623522421321   Bound     pvc-d09f397f-c9f6-491d-b6a2-61a9e4f6e743   1Gi        RWX            csi-cephfs-sc   34m
cephfs-pvc-clone421321                                                                Bound     pvc-e8ad6135-70c8-4bcd-bdc5-61ef64077e8a   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone522421321                                                             Bound     pvc-19e125df-bdf8-4c6d-a1dc-ca4ac6d6ad90   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone623522421321                                                          Bound     pvc-6ff0b5d9-8c3a-4a97-ad15-7ea90ffbb362   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone724623522421321                                                       Bound     pvc-80008f1f-0035-4bce-94fc-b682e168e3fd   1Gi        RWX            csi-cephfs-sc   35m
cephfs-pvc-clone825724623522421321                                                    Pending                                                                        csi-cephfs-sc   35m
cephfs-pvc-clone926825724623522421321                                                 Bound     pvc-e1378a98-e252-4bd8-8a4c-f2eaf7daf7db   1Gi        RWX            csi-cephfs-sc   35m
csi-cephfs-pvc                                                                        Bound     pvc-ee0319bb-09fb-41ff-a2ae-d28f124c8699   1Gi        RWX            csi-cephfs-sc   40m
sh-4.2# ceph fs subvolume ls myfs mysubgrp | grep name
        "name": "csi-vol-7673e3e4-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-972fe20c-128c-11eb-9622-0242ac11000b"
        "name": "csi-vol-7aa7905f-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-70ddf5f4-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-72efb408-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-c40d80bd-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-77d20ff7-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-71a97de2-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-7fae78d1-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-75c01bb6-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-80cc6f30-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-78b2c466-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-7888843f-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-661a8766-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-7bcfd5c3-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-6ce2a329-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-773090f3-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-6b7003f4-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-77306902-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-75a7533d-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-76173d40-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-6e1f991c-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-76534162-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-656f7209-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-641c8590-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-7ba11350-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-63654809-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-7f81e7c4-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-7d260567-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-79c41bfb-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-6a4ab9be-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-69eb9fcf-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-80e66d7d-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-7acca527-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-758d4e43-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-8ed111e3-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-6d5fdcda-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-806c5918-128d-11eb-9622-0242ac11000b"
        "name": "csi-vol-6cb76fb8-128d-11eb-9622-0242ac11000b"

Expected behavior

The number of subvolumes should not exceed the PVCs created.
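A rough way to spot the extra subvolumes, as a sketch only: it assumes that `ceph fs subvolume ls ... --format=json` prints a JSON array of objects with a `name` key (consistent with the grep output above), and that the subvolume names backing the existing PVs have already been collected by some other means. The function name and the sample names are hypothetical.

```python
import json


def unreferenced_subvolumes(subvolume_ls_json, referenced_names):
    """Return the sorted subvolume names that no PersistentVolume refers to."""
    # Assumption: the listing is a JSON array of {"name": ...} objects.
    listed = {entry["name"] for entry in json.loads(subvolume_ls_json)}
    return sorted(listed - set(referenced_names))


# Hypothetical sample data, shaped like the listing in this report.
sample = json.dumps([
    {"name": "csi-vol-aaaa"},
    {"name": "csi-vol-bbbb"},
    {"name": "csi-vol-cccc"},
])

print(unreferenced_subvolumes(sample, ["csi-vol-aaaa", "csi-vol-cccc"]))
```

Anything this returns is a candidate for a leaked subvolume and would still need to be checked by hand before deleting it.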

Logs

csi-provisioner https://termbin.com/8d5x

csi-cephfsplugin https://termbin.com/8m8oz

Additional context

The PVCs created were of 1GB size with 10MB data.

Also, here is the description of the only Pending PVC:

Name:          cephfs-pvc-clone825724623522421321
Namespace:     default
StorageClass:  csi-cephfs-sc
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
DataSource:
  Kind:      PersistentVolumeClaim
  Name:      csi-cephfs-pvc
Mounted By:  <none>
Events:
  Type     Reason                Age                   From                                                                                                    Message
  ----     ------                ----                  ----                                                                                                    -------
  Warning  ProvisioningFailed    42m                   cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bfd557f7f-v9dzx_7288d277-b6f7-4b24-a064-56f2831b23c1  failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  ProvisioningFailed    42m (x2 over 42m)     cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bfd557f7f-v9dzx_7288d277-b6f7-4b24-a064-56f2831b23c1  failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-9630860f-246a-40f8-bb93-d330477f04fe already exists
  Warning  ProvisioningFailed    8m23s (x13 over 42m)  cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bfd557f7f-v9dzx_7288d277-b6f7-4b24-a064-56f2831b23c1  failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph args: [fs subvolume snapshot info myfs csi-vol-972fe20c-128c-11eb-9622-0242ac11000b csi-vol-75c01bb6-128d-11eb-9622-0242ac11000b --group_name mysubgrp -m 10.96.43.189:6789 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** --format=json]
  Normal   Provisioning          3m23s (x17 over 45m)  cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bfd557f7f-v9dzx_7288d277-b6f7-4b24-a064-56f2831b23c1  External provisioner is provisioning volume for claim "default/cephfs-pvc-clone825724623522421321"
  Normal   ExternalProvisioning  15s (x182 over 45m)   persistentvolume-controller                                                                             waiting for a volume to be created, either by external provisioner "cephfs.csi.ceph.com" or manually created by system administrator
Madhu-1 commented 3 years ago

@Yuggupta27 It looks like you have attached the csi-cephfsplugin pod log. Please attach the csi-cephfsplugin container logs of the provisioner pod.

Yuggupta27 commented 3 years ago

On a second attempt to reproduce the issue...

  1. Created 34 PVCs in parallel:
    [ygupta@localhost cephfs]$ kubectl get pvc
    NAME                                                                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
    cephfs-pvc-clone1                                                                       Bound    pvc-61f267a1-a636-404c-8d94-84ba5363142d   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone102892787266255242342232211                                             Bound    pvc-f447148d-801a-4c2f-b4b5-102c57a3af09   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone1129102892787266255242342232211                                         Bound    pvc-b96c84a6-8308-4f7e-ad9c-15854566c8d2   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone12301129102892787266255242342232211                                     Bound    pvc-87a8c594-2c20-4efd-9c39-6e21617dad87   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone13323112301129102892787266255242342232211                               Bound    pvc-c0838a42-9740-4b44-af01-998202e2f80f   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone1413323112301129102892787266255242342232211                             Bound    pvc-f385f8e4-9b13-46b5-976d-d0c0a5a485ee   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone15331413323112301129102892787266255242342232211                         Bound    pvc-40a97e8c-8c3d-4d9b-a557-3106b60c9243   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone163415331413323112301129102892787266255242342232211                     Bound    pvc-97df7303-7e9c-45f9-ac22-a43e24222f6c   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone1735163415331413323112301129102892787266255242342232211                 Bound    pvc-757e7511-1f97-48a6-8aa7-43327373979a   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone18361735163415331413323112301129102892787266255242342232211             Bound    pvc-a0f287ec-6452-4656-8978-e98f1ad789db   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone193718361735163415331413323112301129102892787266255242342232211         Bound    pvc-c167dc59-87ff-4d4c-b3cc-c681ec893c48   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone2211                                                                    Bound    pvc-ef83be74-1dd6-4940-a762-509b3d9b77b0   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone2232211                                                                 Bound    pvc-db86ba32-15d9-419a-a353-206291d1b78f   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone2342232211                                                              Bound    pvc-4976d4fc-690f-41ef-b319-c295e95b58b6   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone242342232211                                                            Bound    pvc-c836b27d-be65-4639-9d18-62e1bc11073c   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone2787266255242342232211                                                  Bound    pvc-d5e6824a-54f0-4c23-a6af-7a81fe21441d   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone2892787266255242342232211                                               Bound    pvc-2382f187-c4f8-4626-84a5-3f4d068b59c9   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone29102892787266255242342232211                                           Bound    pvc-71219ad9-1d8d-4582-81f6-a9a9460f7bd6   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone301129102892787266255242342232211                                       Bound    pvc-c04b2f88-f421-4381-8338-984bb9568a94   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone3112301129102892787266255242342232211                                   Bound    pvc-8721881d-1204-4254-a602-1e8fe760004b   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone331413323112301129102892787266255242342232211                           Bound    pvc-76f586bb-d7fc-4177-bdb7-74bf122ba689   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone3415331413323112301129102892787266255242342232211                       Bound    pvc-515994b0-a23b-4ad4-bf7b-42e0562b3efe   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone35163415331413323112301129102892787266255242342232211                   Bound    pvc-92f4dbd0-eb36-4741-ae65-82ab1b448a18   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone361735163415331413323112301129102892787266255242342232211               Bound    pvc-28190f41-33d5-4082-9932-8c7295dfe87b   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone3718361735163415331413323112301129102892787266255242342232211           Bound    pvc-57b33bfc-cd2d-4998-84ec-2ea72a59621c   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone38193718361735163415331413323112301129102892787266255242342232211       Bound    pvc-e7266d6f-d533-4453-a2f4-42e995118def   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone3938193718361735163415331413323112301129102892787266255242342232211     Bound    pvc-77f11e05-fddc-480b-b93b-84ed0a457f1c   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone403938193718361735163415331413323112301129102892787266255242342232211   Bound    pvc-443229dd-7ec7-44bf-ba20-196b6bc44eaa   1Gi        RWX            csi-cephfs-sc   15m
    cephfs-pvc-clone42232211                                                                Bound    pvc-8ab86d5f-c019-427c-a21b-09f013bc2abf   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone5242342232211                                                           Bound    pvc-f1f61bc3-5ce7-4073-8531-3b0c9780684a   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone6255242342232211                                                        Bound    pvc-26da4827-c853-4330-9959-73585e941602   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone7266255242342232211                                                     Bound    pvc-0fa87714-280a-46df-b8b0-24aa20b8ea6d   1Gi        RWX            csi-cephfs-sc   16m
    cephfs-pvc-clone92787266255242342232211                                                 Bound    pvc-b02780f2-8201-4276-af00-3493f3ad8a42   1Gi        RWX            csi-cephfs-sc   16m
    csi-cephfs-pvc   
    
    [ygupta@localhost cephfs]$ kubectl get pvc | grep csi-cephfs-sc | wc -l
    34
  2. The subvolume count this time:

    sh-4.2# ceph fs subvolume ls myfs mysubgrp | grep name
    "name": "csi-vol-1998c5fb-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-195bf987-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-166537c7-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-134d7114-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-12294101-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-143156ef-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-14503b77-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-168e569b-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-14004609-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-1ae931fb-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-166e6a9a-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-057df06d-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-17c3350d-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-01478843-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-1876c60b-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-0f16a9b4-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-1a089691-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-167eb46c-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-1408d7fc-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-1a360ecf-12b0-11eb-a42a-0242ac11000e"
    "name": "csi-vol-169e4d1a-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-fd76e71a-12ba-11eb-a42a-0242ac11000e"
    "name": "csi-vol-16e42c18-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-18a78fe3-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-12e0f93a-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-0d311bf9-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-1199e328-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-18185a02-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-0803a2cd-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-1949adf6-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-14f71aed-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-0db559c0-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-092c63da-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-15e43f06-12bd-11eb-a42a-0242ac11000e"
    "name": "csi-vol-0ab05bae-12bd-11eb-a42a-0242ac11000e"

    sh-4.2# ceph fs subvolume ls myfs mysubgrp | grep name | wc -l
    35



Logs:

1. [csi-provisioner](https://termbin.com/6ru0)

1. [csi-cephfsplugin](https://termbin.com/nmpy)
ShyamsundarR commented 3 years ago

The clone from volume is retried (as the first call takes time to complete), and the call that ensures the intermediate snapshot is deleted fails with EINVAL (22) rather than the expected string failure, causing CreateVolume (which has actually succeeded) to never return success.

So in this case, the clone is found correctly using the OMap journals, but the attempt to ensure that the intermediate snapshot used to create the clone is cleaned up fails in cleanupCloneFromSubvolumeSnapshot -> getSnapshotInfo.

This could be an artifact of the ceph version in play, in terms of the error returned (can be checked with a command using the toolbox).

The exact log line where the failure starts occurring is:

E1020 08:44:02.998228       1 snapshot.go:139] ID: 166 Req-ID: pvc-c0b06217-9d54-4829-b223-6b04dde13d5f failed to get subvolume snapshot info csi-vol-1a360ecf-12b0-11eb-a42a-0242ac11000e csi-vol-9b76f79e-12ae-11eb-a42a-0242ac11000e(an error (exit status 22) occurred while running ceph args: [fs subvolume snapshot info myfs csi-vol-9b76f79e-12ae-11eb-a42a-0242ac11000e csi-vol-1a360ecf-12b0-11eb-a42a-0242ac11000e --group_name mysubgrp -m 10.96.131.184:6789 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** --format=json]) in fs myfs
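
To illustrate the distinction described above, here is a minimal sketch of how a failed `fs subvolume snapshot info` call could be classified. This is not the ceph-csi code (which is Go); the names are hypothetical. It assumes that the exit status mirrors the errno (as the "exit status 22" in the log suggests for EINVAL), that a missing snapshot surfaces as ENOENT, and that EINVAL here means the command is not known to this Ceph version.

```python
import errno
from enum import Enum


class Outcome(Enum):
    FOUND = "found"
    NOT_FOUND = "not found"
    UNSUPPORTED = "command unsupported"
    FAILED = "failed"


def classify_snapshot_info(exit_status, stderr=""):
    """Map the result of a snapshot info call to an outcome (hypothetical helper)."""
    if exit_status == 0:
        return Outcome.FOUND
    # Assumption: a missing snapshot is reported as ENOENT, via status or message.
    if exit_status == errno.ENOENT or "ENOENT" in stderr:
        return Outcome.NOT_FOUND
    # Assumption: EINVAL (22) means this Ceph version lacks the command.
    if exit_status == errno.EINVAL:
        return Outcome.UNSUPPORTED
    return Outcome.FAILED


print(classify_snapshot_info(22))
```

With a classification like this, the retried CreateVolume could treat "unsupported" separately instead of returning an Internal error on every retry.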
ShyamsundarR commented 3 years ago

One interesting fact is that we do not see a PVC in Pending state for this create, even though it would never have completed. Unsure why, though.

Yuggupta27 commented 3 years ago

Thanks for looking into it, @ShyamsundarR. The Ceph version I used was v14.2.8, which does not have the snapshot info command, so it might be failing because of that. @humblec, do we have a version check before we run the snapshot info command?

Yuggupta27 commented 3 years ago

Also, I will try to reproduce the issue against ceph-master and verify whether it persists.

Yuggupta27 commented 3 years ago

Tested another scenario where this issue arises:

  1. Created a PVC of size 5 GB with 3 GB of data in it.
  2. On creating a clone, the clone stays Pending; describing it gives:
    [ygupta@localhost cephfs]$ kubectl get pvc
    NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
    cephfs-pvc-clone   Pending                                                                        csi-cephfs-sc   37m
    csi-cephfs-pvc     Bound     pvc-2ecc5d49-2f89-412e-ad3a-a0e11480b85f   5Gi        RWX            csi-cephfs-sc   42m
    [ygupta@localhost cephfs]$ kubectl describe pvc cephfs-pvc-clone
    Name:          cephfs-pvc-clone
    Namespace:     default
    StorageClass:  csi-cephfs-sc
    Status:        Pending
    Volume:        
    Labels:        <none>
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: cephfs.csi.ceph.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:      
    Access Modes:  
    VolumeMode:    Filesystem
    DataSource:
      Kind:      PersistentVolumeClaim
      Name:      csi-cephfs-pvc
    Mounted By:  <none>
    Events:
    Type     Reason                Age                   From                                                                                                    Message
    ----     ------                ----                  ----                                                                                                    -------
    Warning  ProvisioningFailed    37m (x5 over 37m)     cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bfd557f7f-qhrbj_e80aaf94-1270-4b06-a22d-2c34b80fb4a7  failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Aborted desc = in progress
    Warning  ProvisioningFailed    8m49s (x10 over 37m)  cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bfd557f7f-qhrbj_e80aaf94-1270-4b06-a22d-2c34b80fb4a7  failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph args: [fs subvolume snapshot info myfs csi-vol-8aaec6cf-1369-11eb-a42a-0242ac11000e csi-vol-2fc0fdb4-136a-11eb-a42a-0242ac11000e --group_name mysubgrp -m 10.96.131.184:6789 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** --format=json]
    Normal   Provisioning          3m49s (x16 over 38m)  cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bfd557f7f-qhrbj_e80aaf94-1270-4b06-a22d-2c34b80fb4a7  External provisioner is provisioning volume for claim "default/cephfs-pvc-clone"
    Normal   ExternalProvisioning  3m (x142 over 38m)    persistentvolume-controller                                                                             waiting for a volume to be created, either by external provisioner "cephfs.csi.ceph.com" or manually created by system administrator
    [ygupta@localhost cephfs]$ 

    This might again be because the snapshot info command is called even though it is not supported in Ceph v14.2.8.

cc @ShyamsundarR @Madhu-1

Madhu-1 commented 3 years ago

@Yuggupta27 Thanks for looking into it. This looks to be the issue; please send a patch to fix it. If the snapshot info command is not present, try to unprotect the snapshot and make decisions based on the error message.
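
The suggested fallback could look roughly like the sketch below. It is only an outline under assumptions, not the eventual patch: both arguments are hypothetical zero-argument callables that run the corresponding ceph command and return `(exit_status, stderr)`, EINVAL is taken to mean the info command is unsupported, and ENOENT in the status or message is taken to mean the snapshot no longer exists. What to do for other error messages is left open here.

```python
import errno


def intermediate_snapshot_gone(snapshot_info, unprotect):
    """Return True if the intermediate snapshot no longer exists (hypothetical helper)."""
    status, stderr = snapshot_info()
    if status == 0:
        return False  # snapshot still exists and needs cleanup
    if status == errno.ENOENT or "ENOENT" in stderr:
        return True
    if status != errno.EINVAL:
        raise RuntimeError(f"snapshot info failed: {stderr}")

    # Assumption: EINVAL means snapshot info is unsupported, so fall back to
    # unprotect and decide from its result instead.
    status, stderr = unprotect()
    if status == 0:
        return False  # snapshot existed; it still has to be removed
    if "ENOENT" in stderr:
        return True
    raise RuntimeError(f"snapshot unprotect failed: {stderr}")


# Example: info is unsupported, unprotect reports the snapshot as missing.
print(intermediate_snapshot_gone(
    lambda: (22, "invalid command"),
    lambda: (2, "Error ENOENT: snapshot not found"),
))
```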

Yuggupta27 commented 3 years ago

Sure @Madhu-1 :+1:

Madhu-1 commented 3 years ago

@Yuggupta27 Looking at the code, the above issue exists, but it looks like it won't leak the omap data. Please check and try to reproduce it again to see where else the omap data is getting leaked.

Yuggupta27 commented 3 years ago

This issue is now fixed by the following PRs: https://github.com/ceph/ceph-csi/pull/1674, https://github.com/ceph/ceph-csi/pull/1671 and https://github.com/ceph/ceph-csi/pull/1660. As testing indicates that the issue no longer occurs, closing this for now.