alibaba / open-local

cloud-native local storage management system for stateful workload, low-latency with simplicity
Apache License 2.0

Delete Retain-PV(status: Released) not delete lv #261

Closed Clara12062 closed 2 months ago

Clara12062 commented 4 months ago

Question

As described in https://github.com/kubernetes-csi/external-provisioner/tree/master#, when a PV with PVReclaimPolicy: Retain is removed, ControllerDeleteVolume is not called. If I delete a PV whose status is Released and whose PVReclaimPolicy is Retain, the PV is deleted successfully, but the LV associated with the PV is not. The extender-scheduler log shows that the corresponding VG entry has also been removed from the cache, while in fact the logical volume on the host has not been released.

Should we check the difference between the status of nodecache and nls when the agent updates nls, and remove any LV without a related PV? Or should we actually delete the LV in the onPVDelete event?

peter-wangxu commented 4 months ago

The Retain policy should leave the storage on the volume group as it is, according to https://kubernetes.io/docs/concepts/storage/persistent-volumes/#retain

The problem here is that, because the logical volume was not removed from the host volume group, a later PV creation might fail due to the difference between nodecache and the VG.
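To make the mismatch concrete, here is a minimal sketch with made-up numbers. The `vgState` type, the LV name `local-retained`, and the sizes are all hypothetical and are not taken from open-local; the point is only that the cached view and the host view disagree once the PV object is gone.

```go
package main

import "fmt"

// vgState is a hypothetical, simplified view of one volume group.
type vgState struct {
	CapacityGiB int64
	// LVs maps a logical volume name to its size in GiB.
	LVs map[string]int64
}

// free returns the capacity not taken by any LV in this view.
func (v vgState) free() int64 {
	used := int64(0)
	for _, size := range v.LVs {
		used += size
	}
	return v.CapacityGiB - used
}

func main() {
	// Actual state on the host: the LV of the retained PV still exists.
	host := vgState{CapacityGiB: 100, LVs: map[string]int64{"local-retained": 40}}
	// Cached state after the PV object was deleted: the LV entry is gone.
	cache := vgState{CapacityGiB: 100, LVs: map[string]int64{}}

	request := int64(80)
	// The scheduler decides from the cache, the node has to honor the host state.
	fmt.Println("cache admits request:", cache.free() >= request)
	fmt.Println("host can satisfy it:", host.free() >= request)
}
```

With these numbers the cache admits an 80 GiB request while the host only has 60 GiB left, which is the kind of late failure described above.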

The solution is either to keep the nodecache entry for a retained PV, or to not report that capacity as available on the agent side.
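The first option could look roughly like the sketch below. `nodeCache`, `pvInfo`, and this `onPVDelete` method are hypothetical stand-ins written for illustration, not the actual open-local types or handler; only the `Retain` and `Delete` policy values come from Kubernetes.

```go
package main

import "fmt"

// reclaimPolicy mirrors the Kubernetes persistentVolumeReclaimPolicy values.
type reclaimPolicy string

const (
	policyRetain reclaimPolicy = "Retain"
	policyDelete reclaimPolicy = "Delete"
)

// pvInfo is a hypothetical, trimmed-down view of a PV.
type pvInfo struct {
	Name   string
	Policy reclaimPolicy
}

// nodeCache is a hypothetical per-node cache: PV name -> reserved GiB.
type nodeCache struct {
	allocated map[string]int64
}

// onPVDelete releases the reservation only when the backing LV is
// expected to be removed as well, i.e. the policy is not Retain.
func (c *nodeCache) onPVDelete(pv pvInfo) {
	if pv.Policy == policyRetain {
		// The LV stays on the host, so keep its capacity reserved.
		return
	}
	delete(c.allocated, pv.Name)
}

func main() {
	c := &nodeCache{allocated: map[string]int64{"pv-a": 40, "pv-b": 10}}
	c.onPVDelete(pvInfo{Name: "pv-a", Policy: policyRetain})
	c.onPVDelete(pvInfo{Name: "pv-b", Policy: policyDelete})
	fmt.Println("remaining reservations:", len(c.allocated))
}
```

The trade-off is that a reservation kept this way is only released once something else removes the LV and tells the cache about it.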

Clara12062 commented 4 months ago

Yep. I think it is necessary to ensure that the PV usage information in nodecache is consistent with the available capacity reported by the agent side.

May I ask whether it is possible to do this: when the agent side updates the nls status, it compares the LVs on the node with the localPVs in the nodecache. If a PV has been removed from localPVs, the agent deletes the LV that is no longer used on the corresponding node while updating the nls, and releases the volume group resource.
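A minimal sketch of the comparison step, assuming the LV name equals the PV name (an assumption on my side) and that the agent already has both lists. `orphanLVs` is a hypothetical helper, and it only reports candidates; it does not remove anything.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanLVs returns the LVs present on the host that no cached PV refers to.
// Assumption: an LV and its PV share the same name.
func orphanLVs(hostLVs []string, cachedPVs map[string]struct{}) []string {
	var orphans []string
	for _, lv := range hostLVs {
		if _, ok := cachedPVs[lv]; !ok {
			orphans = append(orphans, lv)
		}
	}
	// Sort so that the result does not depend on the input order.
	sort.Strings(orphans)
	return orphans
}

func main() {
	// Hypothetical names, for illustration only.
	hostLVs := []string{"local-aaa", "local-bbb", "local-ccc"}
	cachedPVs := map[string]struct{}{"local-bbb": {}}
	for _, lv := range orphanLVs(hostLVs, cachedPVs) {
		fmt.Println("candidate for removal:", lv)
	}
}
```

A real implementation would also have to restrict the scan to LVs created by open-local, so that unrelated volumes in the same VG are never touched.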

peter-wangxu commented 4 months ago

OK, it's possible to add GC logic in the agent.

But what if the user really wants to keep the local volume?

peter-wangxu commented 4 months ago

If you don't want the logical volume (local-xxxx), why not change the policy to Delete before deleting the PVC/PV?
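For reference, the policy lives in the PV field `spec.persistentVolumeReclaimPolicy`. The sketch below only builds a patch body for that field; `reclaimPolicyPatch` is a hypothetical helper, and sending the patch (for example with `kubectl patch pv` or a Kubernetes client) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reclaimPolicyPatch builds a merge-patch body that sets
// spec.persistentVolumeReclaimPolicy on a PersistentVolume.
func reclaimPolicyPatch(policy string) ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"persistentVolumeReclaimPolicy": policy,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := reclaimPolicyPatch("Delete")
	if err != nil {
		panic(err)
	}
	// Prints {"spec":{"persistentVolumeReclaimPolicy":"Delete"}}
	fmt.Println(string(body))
}
```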

Clara12062 commented 4 months ago

why not change the policy to Delete before deleting the PVC/PV?

That's true.

However, there are circumstances in which you may want to use Retain to keep your data safe. Some time later, after confirming that the data is no longer needed, you delete the PV, but the space in the volume group on the actual host is not freed. The scheduler, however, queries nodecache and finds that the capacity is sufficient, which is not actually the case.

The solution is either to keep the nodecache entry for a retained PV, or to not report that capacity as available on the agent side.

As you said: either change the nodecache RemoveLV or change the agent side.