ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Add a script to log the leftover omap and subvolume/rbd image #1248

Open Madhu-1 opened 4 years ago

Madhu-1 commented 4 years ago

When a PVC with reclaimPolicy: Retain is deleted from the Kubernetes cluster, the PV, the OMAP data corresponding to the PVC, and the backend subvolume or RBD image still need to be deleted to reclaim the space in the Ceph cluster.

We need to add a script to list all the PVCs, subvolumes, and RBD images (that is, to list the RBD images and subvolumes that need to be cleaned up). Note: one gotcha here is that an RBD image or subvolume might still be in use by another remote Kubernetes cluster.

Scenarios we need to document and that the script needs to handle:

- PVC with reclaimPolicy: Retain

  - When the PV is not deleted

    We can fetch the details from the PV and print out the omap entries and the RBD image/subvolume name (see the sketch right after this list).

  - When the PV is already deleted

    This is kind of tricky: from outside we can only list all the unused RBD images/subvolumes and omap keys (it is the admin's responsibility to check that a resource is not used anywhere else before deleting it).
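
A minimal sketch of the "PV not deleted" case, assuming the default journal object (`csi.volumes.default`), the `csi.volume.<PV name>` key layout, and the `csi-vol-` image prefix described in docs/resource-cleanup.md; the pool and PV names are placeholders supplied by the caller:

```bash
#!/bin/bash
# Hedged sketch: given a retained PV, print the omap entries and backend image
# that would be left behind. Assumes default ceph-csi naming; adjust pool,
# namespace, and prefixes for the actual deployment.
set -euo pipefail

PV_NAME="$1"   # PV name, e.g. as shown by `kubectl get pv`
POOL="$2"      # RBD pool (or CephFS metadata pool) backing the PV

# The trailing UUID of the CSI volume handle identifies the backing image/subvolume.
VOL_HANDLE=$(kubectl get pv "${PV_NAME}" -o jsonpath='{.spec.csi.volumeHandle}')
UUID=${VOL_HANDLE: -36}

echo "csi.volumes.default omap key : csi.volume.${PV_NAME}"
rados getomapval csi.volumes.default "csi.volume.${PV_NAME}" --pool="${POOL}"

echo "per-volume omap object       : csi.volume.${UUID}"
rados listomapvals "csi.volume.${UUID}" --pool="${POOL}"

echo "backend image/subvolume name : csi-vol-${UUID}"
# For an RBD-backed PV this should exist; for CephFS the subvolume is typically
# csi-vol-<UUID> in the "csi" subvolume group instead.
rbd info --pool="${POOL}" "csi-vol-${UUID}" || true
```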

- Snapshot with deletionPolicy: Retain

  - When the VolumeSnapshotContent is not deleted

    We can fetch the details from the VolumeSnapshotContent and print out the omap entries and the RBD image/subvolume name.

  - When the VolumeSnapshotContent is already deleted

    This is kind of tricky: from outside we can only list all the unused RBD images/subvolumes and omap keys (it is the user's responsibility to check that a resource is not used anywhere else before deleting it; see the sketch after this list).
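
For both "already deleted" cases, a hedged sketch of the cross-check the script could do: compare the volume names recorded in the ceph-csi journal against the PVs the cluster still knows about and print the difference as cleanup candidates. The journal object name is the default one; snapshots are assumed to be tracked similarly in a `csi.snaps.default` object:

```bash
#!/bin/bash
# Hedged sketch: list journal entries that no existing PV refers to.
# The admin still has to verify nothing else (e.g. a remote cluster)
# uses these before deleting anything.
set -euo pipefail

POOL="$1"

# All PV names currently known to this Kubernetes cluster.
kubectl get pv -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | sort > /tmp/k8s-pvs

# All volume names recorded in the ceph-csi journal for this pool.
rados listomapkeys csi.volumes.default --pool="${POOL}" \
  | sed 's/^csi\.volume\.//' | sort > /tmp/csi-volumes

# Journal entries with no matching PV are the cleanup candidates.
comm -13 /tmp/k8s-pvs /tmp/csi-volumes
```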

We already have a script to list the resources based on the PVC; we need to enhance it for the things below.

We may also need to document, or provide one more option for, running the Ceph commands when users are not running the Ceph cluster with Rook (the current script uses the toolbox pod).
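
A possible shape for that option, sketched under the assumption that Rook's toolbox deployment is named `rook-ceph-tools` and that non-Rook users can point the Ceph CLIs at a conf/keyring via the standard `CEPH_ARGS` environment variable; all names, namespaces, and paths below are placeholders:

```bash
# Hedged sketch: run the same cleanup commands with or without the Rook toolbox.
run_ceph() {
  if [ "${USE_ROOK_TOOLBOX:-true}" = "true" ]; then
    # Rook deployments: execute inside the toolbox pod, as the current script does.
    kubectl -n "${TOOLBOX_NS:-rook-ceph}" exec deploy/rook-ceph-tools -- "$@"
  else
    # Non-Rook deployments: pass an explicit conf and keyring to the Ceph CLIs.
    CEPH_ARGS="--conf=${CEPH_CONF:-/etc/ceph/ceph.conf} --keyring=${CEPH_KEYRING:-/etc/ceph/keyring}" "$@"
  fi
}

# Usage (pool name is a placeholder):
run_ceph rbd ls --pool=replicapool
run_ceph rados listomapkeys csi.volumes.default --pool=replicapool
```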

@humblec @ShyamsundarR @nixpanic please add if I am missing any scenario.

Madhu-1 commented 4 years ago

@agarwal-mudit / @Yuggupta27 @chenxu1990 anyone interested in working on this one?

agarwal-mudit commented 4 years ago

I can work on it.

Madhu-1 commented 4 years ago

Thanks a lot @agarwal-mudit

Madhu-1 commented 4 years ago

https://github.com/ceph/ceph-csi/blob/master/docs/resource-cleanup.md will help to write the script

agarwal-mudit commented 4 years ago

https://github.com/ceph/ceph-csi/blob/master/docs/resource-cleanup.md will help to write the script

Thanks @Madhu-1

agarwal-mudit commented 4 years ago

A couple of questions, mentioning them here for the record:

  1. Should we extend tracevol.py or write a new script?
  2. Should the script be interactive (i.e. get confirmation from the user before deleting anything)?
  3. How are we supposed to provide this script to the downstream user?
Madhu-1 commented 4 years ago

A couple of questions, mentioning them here for the record:

  1. Should we extend tracevol.py or write a new script?
  2. Should the script be interactive (i.e. get confirmation from the user before deleting anything)?
  3. How are we supposed to provide this script to the downstream user?

If possible we can extend the script. We can make it interactive and ask for confirmation before deleting. I am not worried about downstream as of now; we can make it part of the cephcsi image and use it from there.
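
A small sketch of what the interactive mode could look like, independent of whether it ends up in tracevol.py or a separate shell script; the function name and arguments are hypothetical:

```bash
# Hedged sketch: show what would be deleted and only act after an explicit "yes".
confirm_and_delete_rbd() {
  local pool="$1" image="$2"
  echo "About to delete image ${pool}/${image}"
  read -r -p "Type 'yes' to continue: " answer
  if [ "${answer}" = "yes" ]; then
    rbd rm "${pool}/${image}"
  else
    echo "Skipping ${pool}/${image}"
  fi
}
```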

Madhu-1 commented 4 years ago

@agarwal-mudit can this be done in 3.1.0? or do we need to move it out to the next release?

agarwal-mudit commented 4 years ago

Haven't started yet, won't be able to start until next week.

humblec commented 4 years ago

Moving this out of 3.1.0 release.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

Madhu-1 commented 3 years ago

We need to work on this one. Some ideas related to CephFS are being captured at https://hackmd.io/LtSsWzO3TDea4c9rJtVFzQ

Madhu-1 commented 3 years ago

New doc where I have captured the steps for the script: https://hackmd.io/733ztbXjQeO09a7iSIP1Gg. @Rakshith-R see if this helps.

Rakshith-R commented 3 years ago

New doc where I have captured the steps for the script: https://hackmd.io/733ztbXjQeO09a7iSIP1Gg. @Rakshith-R see if this helps.

We also need to add details about the RBD temp clones we create during snapshot, restore, and cloning, and about cloning snapshots on parent CephFS subvolumes. I'll add some details about these and the commands to be executed to the same doc.

Update: just add an instruction to delete the `<image-name>-temp` image if present (created in the case of an RBD PVC-PVC clone).
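
A hedged sketch of that instruction, assuming the `-temp` suffix on the intermediate clone image and a placeholder pool name:

```bash
# Hedged sketch: list leftover "-temp" images from RBD PVC-PVC clone operations
# so they can be reviewed along with the parent image. Pool name is a placeholder.
POOL="replicapool"
rbd ls --pool="${POOL}" | grep -- '-temp$' || echo "no leftover -temp images"
# After confirming an image is unused:
#   rbd rm "${POOL}/<image-name>-temp"
```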

Rakshith-R commented 3 years ago

This can be closed in favour of #2597.

I'll not be working on this feature request; I will be busy with the csiaddons reclaimSpace project and metro DR.