Open Champ-Goblem opened 1 month ago
I have temporarily patched this problem internally so we can test if the change helps with our OSD disk usage. I have generated a patch for the changes I made and I will upload it here for reference, in case it helps with the final fix.
From f3262f9fd84237c8617cd78f2a8792b871bc9ff2 Mon Sep 17 00:00:00 2001
From: Champ-Goblem <cameron@northflank.com>
Date: Wed, 14 Aug 2024 12:13:10 +0100
Subject: [PATCH] csi-snapshot: fetch data-pool info from rbd_get_data_pool_id
api call
---
internal/rbd/controllerserver.go | 2 ++
internal/rbd/rbd_util.go | 23 +++++++++++++++++++++++
vendor/github.com/ceph/go-ceph/rbd/rbd.go | 8 ++++++++
3 files changed, 33 insertions(+)
diff --git a/internal/rbd/controllerserver.go b/internal/rbd/controllerserver.go
index 4f07ce5f0..8e7d7b301 100644
--- a/internal/rbd/controllerserver.go
+++ b/internal/rbd/controllerserver.go
@@ -1328,6 +1328,8 @@ func (cs *ControllerServer) doSnapshotClone(
f := []string{librbd.FeatureNameLayering, librbd.FeatureNameDeepFlatten}
cloneRbd.ImageFeatureSet = librbd.FeatureSetFromNames(f)
+ cloneRbd.DataPool = parentVol.DataPool
+
err := cloneRbd.Connect(cr)
if err != nil {
return cloneRbd, err
diff --git a/internal/rbd/rbd_util.go b/internal/rbd/rbd_util.go
index 31b330b4d..ad3e07ce1 100644
--- a/internal/rbd/rbd_util.go
+++ b/internal/rbd/rbd_util.go
@@ -561,6 +561,17 @@ func (ri *rbdImage) isInUse() (bool, error) {
return len(watchers) > defaultWatchers, nil
}
+func (ri *rbdImage) GetDataPoolId() (int64, error) {
+ image, err := ri.open()
+ if err != nil {
+ return -1, err
+ }
+ defer image.Close()
+
+ id, err := image.GetDataPoolId()
+ return id, err
+}
+
// checkValidImageFeatures check presence of imageFeatures parameter. It returns false when
// there imageFeatures is present and empty.
func checkValidImageFeatures(imageFeatures string, ok bool) bool {
@@ -1158,6 +1169,18 @@ func generateVolumeFromVolumeID(
}
err = rbdVol.getImageInfo()
+ dataPoolId, getDataPoolErr := rbdVol.GetDataPoolId()
+ if dataPoolId != util.InvalidPoolID && getDataPoolErr == nil {
+ dataPoolName, getDataPoolErr := util.GetPoolName(rbdVol.Monitors, cr, dataPoolId)
+ if getDataPoolErr == nil {
+ rbdVol.DataPool = dataPoolName
+ } else {
+ log.ErrorLog(ctx, "failed to get data pool name: %w", getDataPoolErr)
+ }
+ } else if getDataPoolErr != nil {
+ log.ErrorLog(ctx, "failed to get data pool id: %w", getDataPoolErr)
+ }
+
return rbdVol, err
}
diff --git a/vendor/github.com/ceph/go-ceph/rbd/rbd.go b/vendor/github.com/ceph/go-ceph/rbd/rbd.go
index b05b14397..2dea6c02f 100644
--- a/vendor/github.com/ceph/go-ceph/rbd/rbd.go
+++ b/vendor/github.com/ceph/go-ceph/rbd/rbd.go
@@ -1008,6 +1008,14 @@ func (image *Image) SetSnapshot(snapname string) error {
return getError(C.rbd_snap_set(image.image, cSnapName))
}
+func (image *Image) GetDataPoolId() (int64, error) {
+ if err := image.validate(imageIsOpen); err != nil {
+ return -1, err
+ }
+
+ return int64(C.rbd_get_data_pool_id(image.image)), nil
+}
+
// GetTrashList returns a slice of TrashInfo structs, containing information about all RBD images
// currently residing in the trash.
func GetTrashList(ioctx *rados.IOContext) ([]TrashInfo, error) {
--
2.46.0
@iPraveenParihar any update on this?
@Champ-Goblem, Thanks for opening up the issue.
Since DataPool
is not set (should be in generateVolumeFromVolumeID()
?) when doing snapshot/clone operations the images will not be in the required data pool.
Can we use GetString(RbdImageOptionDataPool) to get datapool value as we are setting the datapool value when creating an image? https://github.com/ceph/ceph-csi/blob/e19c28927f27383b28596370a3ae7e1585d29bee/internal/rbd/rbd_util.go#L1553-L1559
@nixpanic WDYT?
For reference, I asked about the following in the Ceph Slack: https://ceph-storage.slack.com/archives/C05522L7P60/p1723555634369239
Describe the bug
When creating snapshots of erasure-coded RBD volumes the data is stored in the replicated metadata pool.
From the investigation into the ceph-csi code, the data pool option is not set during
CreateSnapshot
->GenVolFromVolID
. This instance of rbdVolume is passed todoSnapshotClone
, which callscreateRBDClone
. During the clone process:cloneRbdImageFromSnapshot
copies the RBD image options from thecloneRbdVol
, created based on the rbdVolume generated fromGenVolFromVolID
. These options are passed tolibrbd.CloneImage
and thus copies the data into the erasure-coded metadata pool, rather than keeping the data in the original erasure-coded pool.This can be seen when inspecting the image in rbd with
rbd info
, the image is missing the data_pool field:Compare this to the original image:
Steps to reproduce
Steps to reproduce the behavior: Snapshot an erasure-coded volume
Expected behavior
The snapshot should remain within the original erasure-coded pool and not have data copied to the replicated metadata pool.