ceph / ceph-csi

CSI driver for Ceph

CSI snapshots of an erasure-coded-backed volume store snapshot data in the metadata pool #4761

Open Champ-Goblem opened 1 month ago

Champ-Goblem commented 1 month ago

For reference, I asked about the following in the Ceph Slack: https://ceph-storage.slack.com/archives/C05522L7P60/p1723555634369239

Describe the bug

When creating snapshots of erasure-coded RBD volumes, the snapshot data is stored in the replicated metadata pool instead of the erasure-coded data pool.

From investigating the ceph-csi code, the data pool option is not set on the rbdVolume generated during CreateSnapshot->GenVolFromVolID. This rbdVolume is then passed to doSnapshotClone, which calls createRBDClone. During the clone process:

cloneRbdImageFromSnapshot copies the RBD image options from cloneRbdVol, which was created from the rbdVolume generated by GenVolFromVolID. These options are passed to librbd.CloneImage, so the clone's data ends up in the replicated metadata pool rather than staying in the original erasure-coded data pool.
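To illustrate where the option matters, here is a minimal go-ceph sketch of the clone step (a simplified stand-in for cloneRbdImageFromSnapshot, not the actual ceph-csi code; image and snapshot names are taken from the rbd info output below): unless RbdImageOptionDataPool is set on the options passed to librbd.CloneImage, the clone is created without a data_pool and its data objects land in the destination pool.

package snapclone

import (
	"github.com/ceph/go-ceph/rados"
	librbd "github.com/ceph/go-ceph/rbd"
)

// cloneWithDataPool shows where the data pool has to be wired in: if
// dataPool is empty, the cloned image gets no data_pool and its data is
// written to the pool behind destIoctx (here, the replicated metadata pool).
func cloneWithDataPool(parentIoctx, destIoctx *rados.IOContext, dataPool string) error {
	opts := librbd.NewRbdImageOptions()
	defer opts.Destroy()

	if dataPool != "" {
		if err := opts.SetString(librbd.RbdImageOptionDataPool, dataPool); err != nil {
			return err
		}
	}

	return librbd.CloneImage(parentIoctx,
		"csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c", // parent image
		"a0684712-eadf-40b9-8d33-e46711a19fc2", // parent snapshot
		destIoctx,
		"csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b", // clone
		opts)
}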

This can be seen by inspecting the cloned image with rbd info: the data_pool field is missing:

rbd info -p ec-metadatapool-us-east-1b csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b
rbd image 'csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b':
    size 30 GiB in 7680 objects
    order 22 (4 MiB objects)
    snapshot_count: 1
    id: 1933d7e8dc4d58
    block_name_prefix: rbd_data.1933d7e8dc4d58
    format: 2
    features: layering, deep-flatten, operations
    op_features: clone-child
    flags: 
    create_timestamp: Sat Aug 10 09:40:32 2024
    access_timestamp: Sat Aug 10 09:40:32 2024
    modify_timestamp: Sat Aug 10 09:40:32 2024
    parent: ec-metadatapool-us-east-1b/csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c@a0684712-eadf-40b9-8d33-e46711a19fc2
    overlap: 30 GiB

Compare this to the original image:

rbd image 'csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c':
    size 30 GiB in 7680 objects
    order 22 (4 MiB objects)
    snapshot_count: 88
    id: 17e84e87be3a0c
    data_pool: ec-blockpool-us-east-1b
    block_name_prefix: rbd_data.17.17e84e87be3a0c
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool, operations
    op_features: clone-parent, snap-trash
    flags: 
    create_timestamp: Thu Aug  8 14:44:59 2024
    access_timestamp: Mon Aug 12 10:24:16 2024
    modify_timestamp: Thu Aug  8 14:44:59 2024

Steps to reproduce

Steps to reproduce the behavior: create a CSI snapshot of an RBD volume backed by an erasure-coded data pool.

Expected behavior

The snapshot data should remain in the original erasure-coded data pool and not be copied to the replicated metadata pool.

Champ-Goblem commented 1 month ago

I have temporarily patched this problem internally so we can test whether the change helps with our OSD disk usage. I have generated a patch for the changes I made and attached it here for reference, in case it helps with the final fix.

From f3262f9fd84237c8617cd78f2a8792b871bc9ff2 Mon Sep 17 00:00:00 2001
From: Champ-Goblem <cameron@northflank.com>
Date: Wed, 14 Aug 2024 12:13:10 +0100
Subject: [PATCH] csi-snapshot: fetch data-pool info from rbd_get_data_pool_id
 api call

---
 internal/rbd/controllerserver.go          |  2 ++
 internal/rbd/rbd_util.go                  | 23 +++++++++++++++++++++++
 vendor/github.com/ceph/go-ceph/rbd/rbd.go |  8 ++++++++
 3 files changed, 33 insertions(+)

diff --git a/internal/rbd/controllerserver.go b/internal/rbd/controllerserver.go
index 4f07ce5f0..8e7d7b301 100644
--- a/internal/rbd/controllerserver.go
+++ b/internal/rbd/controllerserver.go
@@ -1328,6 +1328,8 @@ func (cs *ControllerServer) doSnapshotClone(
    f := []string{librbd.FeatureNameLayering, librbd.FeatureNameDeepFlatten}
    cloneRbd.ImageFeatureSet = librbd.FeatureSetFromNames(f)

+   cloneRbd.DataPool = parentVol.DataPool
+
    err := cloneRbd.Connect(cr)
    if err != nil {
        return cloneRbd, err
diff --git a/internal/rbd/rbd_util.go b/internal/rbd/rbd_util.go
index 31b330b4d..ad3e07ce1 100644
--- a/internal/rbd/rbd_util.go
+++ b/internal/rbd/rbd_util.go
@@ -561,6 +561,17 @@ func (ri *rbdImage) isInUse() (bool, error) {
    return len(watchers) > defaultWatchers, nil
 }

+func (ri *rbdImage) GetDataPoolId() (int64, error) {
+   image, err := ri.open()
+   if err != nil {
+       return -1, err
+   }
+   defer image.Close()
+
+   id, err := image.GetDataPoolId()
+   return id, err
+}
+
 // checkValidImageFeatures check presence of imageFeatures parameter. It returns false when
 // there imageFeatures is present and empty.
 func checkValidImageFeatures(imageFeatures string, ok bool) bool {
@@ -1158,6 +1169,18 @@ func generateVolumeFromVolumeID(
    }
    err = rbdVol.getImageInfo()

+   dataPoolId, getDataPoolErr := rbdVol.GetDataPoolId()
+   if dataPoolId != util.InvalidPoolID && getDataPoolErr == nil {
+       dataPoolName, getDataPoolErr := util.GetPoolName(rbdVol.Monitors, cr, dataPoolId)
+       if getDataPoolErr == nil {
+           rbdVol.DataPool = dataPoolName
+       } else {
+           log.ErrorLog(ctx, "failed to get data pool name: %w", getDataPoolErr)
+       }
+   } else if getDataPoolErr != nil {
+       log.ErrorLog(ctx, "failed to get data pool id: %w", getDataPoolErr)
+   }
+
    return rbdVol, err
 }

diff --git a/vendor/github.com/ceph/go-ceph/rbd/rbd.go b/vendor/github.com/ceph/go-ceph/rbd/rbd.go
index b05b14397..2dea6c02f 100644
--- a/vendor/github.com/ceph/go-ceph/rbd/rbd.go
+++ b/vendor/github.com/ceph/go-ceph/rbd/rbd.go
@@ -1008,6 +1008,14 @@ func (image *Image) SetSnapshot(snapname string) error {
    return getError(C.rbd_snap_set(image.image, cSnapName))
 }

+func (image *Image) GetDataPoolId() (int64, error) {
+   if err := image.validate(imageIsOpen); err != nil {
+       return -1, err
+   }
+
+   return int64(C.rbd_get_data_pool_id(image.image)), nil
+}
+
 // GetTrashList returns a slice of TrashInfo structs, containing information about all RBD images
 // currently residing in the trash.
 func GetTrashList(ioctx *rados.IOContext) ([]TrashInfo, error) {
-- 
2.46.0
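A quick way to sanity-check the change is to compare the data pool id reported by the parent image and by a freshly created clone. A minimal sketch, assuming the Image.GetDataPoolId wrapper added to the vendored go-ceph above (function and image names here are illustrative):

package snapverify

import (
	"github.com/ceph/go-ceph/rados"
	librbd "github.com/ceph/go-ceph/rbd"
)

// sameDataPool reports whether parentName and cloneName use the same data
// pool. It relies on the Image.GetDataPoolId wrapper from the patch above.
func sameDataPool(ioctx *rados.IOContext, parentName, cloneName string) (bool, error) {
	parent, err := librbd.OpenImageReadOnly(ioctx, parentName, librbd.NoSnapshot)
	if err != nil {
		return false, err
	}
	defer parent.Close()

	clone, err := librbd.OpenImageReadOnly(ioctx, cloneName, librbd.NoSnapshot)
	if err != nil {
		return false, err
	}
	defer clone.Close()

	parentID, err := parent.GetDataPoolId()
	if err != nil {
		return false, err
	}
	cloneID, err := clone.GetDataPoolId()
	if err != nil {
		return false, err
	}

	return parentID == cloneID, nil
}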

Champ-Goblem commented 3 weeks ago

@iPraveenParihar any update on this?

iPraveenParihar commented 2 weeks ago

@Champ-Goblem, thanks for opening the issue.

Since DataPool is not set when doing snapshot/clone operations (it should probably be set in generateVolumeFromVolumeID()?), the images will not end up in the required data pool.

Can we use GetString(RbdImageOptionDataPool) to get the data pool value, since we set it when creating an image? https://github.com/ceph/ceph-csi/blob/e19c28927f27383b28596370a3ae7e1585d29bee/internal/rbd/rbd_util.go#L1553-L1559
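Roughly, that approach would amount to reading the value back from the ImageOptions built when creating or cloning an image. A minimal sketch under that assumption (librbd image options are not persisted on the image itself, so this only helps where the options object is still in hand; function names are illustrative, the pool name mirrors the one in the report):

package cloneopts

import (
	librbd "github.com/ceph/go-ceph/rbd"
)

// dataPoolFromOptions reads the data pool back out of an ImageOptions
// instance. This only works on an options object we built ourselves (for
// example while constructing the clone), not on an image opened later.
func dataPoolFromOptions(opts *librbd.ImageOptions) (string, error) {
	return opts.GetString(librbd.RbdImageOptionDataPool)
}

// buildCloneOptions sets the data pool while building clone options and
// then reads it back with GetString.
func buildCloneOptions() (*librbd.ImageOptions, string, error) {
	opts := librbd.NewRbdImageOptions()

	if err := opts.SetString(librbd.RbdImageOptionDataPool, "ec-blockpool-us-east-1b"); err != nil {
		opts.Destroy()
		return nil, "", err
	}

	dataPool, err := dataPoolFromOptions(opts)
	if err != nil {
		opts.Destroy()
		return nil, "", err
	}

	return opts, dataPool, nil
}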

@nixpanic WDYT?