clyso / chorus

S3 multi-provider data lifecycle management
Apache License 2.0

Migration to different bucket name #27

Open hit-rich opened 3 months ago

hit-rich commented 3 months ago

We have a Ceph cluster where some of our S3/RGW buckets are in a pool that we need to move them out of. We've tested Chorus and it works great for migrating to a different gateway, but in this case we need to migrate the data from one bucket to another on the same gateway. As far as I can tell this isn't possible with Chorus at the moment: the replication commands take a single bucket name, which is used for both the source and the destination. If it is possible to make this work, I think it would be a useful feature for people in a situation like ours, where the content has to be migrated to a different bucket name.

arttor commented 3 months ago

You are correct. Unfortunately, it is not possible to do this with Chorus today.

I think the ability to set a custom name for the destination bucket in the replication policy would add flexibility to the Chorus API. However, I also think that it would be hard to support using the same backend for both the source and destination. If we want to support this, we must ensure that it works properly across all components. I see some difficulties in integrating this feature into the Chorus proxy component. For example, the proxy would have to block all requests to follower buckets and filter out follower buckets from the ListBuckets API. While it is possible to do so, I am not sure that it is a very popular use case, and it may not be worth adding more complexity to the project for it.
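To make the proxy difficulty concrete, here is a minimal Go sketch of the two checks described above: hiding follower buckets from ListBuckets and rejecting client requests that target them. The `Bucket` type, the `followerIndex` map, and the `filterListBuckets`/`allowRequest` helpers are hypothetical and do not come from the Chorus codebase; a real proxy would also have to keep the follower set consistent with the active replication policies.

```go
package main

import "fmt"

// Bucket is a minimal stand-in for one entry in a ListBuckets response.
type Bucket struct {
	Name string
}

// followerIndex marks bucket names on a shared backend that are replication
// destinations ("followers"). Hypothetical: not a Chorus data structure.
type followerIndex map[string]bool

// filterListBuckets drops follower buckets so clients only see source buckets.
func filterListBuckets(all []Bucket, followers followerIndex) []Bucket {
	visible := make([]Bucket, 0, len(all))
	for _, b := range all {
		if !followers[b.Name] {
			visible = append(visible, b)
		}
	}
	return visible
}

// allowRequest rejects any client request that targets a follower bucket,
// so that only the replication worker writes to it.
func allowRequest(bucket string, followers followerIndex) error {
	if followers[bucket] {
		return fmt.Errorf("bucket %q is a replication destination and is not accessible through the proxy", bucket)
	}
	return nil
}

func main() {
	followers := followerIndex{"photos-new": true}
	all := []Bucket{{Name: "photos"}, {Name: "photos-new"}, {Name: "logs"}}

	// Only "photos" and "logs" are listed to the client.
	for _, b := range filterListBuckets(all, followers) {
		fmt.Println(b.Name)
	}

	// Direct access to the follower bucket is refused.
	if err := allowRequest("photos-new", followers); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Every request path that names a bucket would need a check like `allowRequest`, which is where the added complexity in the proxy comes from.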

Could you provide more details about your use case? If the buckets are on the same gateway, server-side copy could be used, and data transfer should be fast. In that case, it may be a good idea to use rclone directly, because it supports server-side copy for the RGW backend: https://rclone.org/docs/#server-side-copy.

hit-rich commented 2 months ago

Thanks for the info. rclone server-side copy seems to solve one of my problems, so that's great. I was not aware of this functionality when I tested rclone previously, so I appreciate the pointer.

I would still be really keen to see support for a different destination bucket name, if possible, as it would help with the part of my problem that remains. We have a multisite setup with two zones and have some buckets we need to copy back from the secondary site to the primary. We are unable to use the built-in sync due to various factors. Because the bucket already exists at both sites, we need to copy it to a different bucket name.

arttor commented 2 months ago

Thanks for the update. We gave this feature a second thought, and we think it makes sense to invest time in refactoring to allow sync to a custom bucket name in the same or a different storage. We can probably release something next month, but I am not sure.
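One way to picture the requested change is a replication policy that carries an optional destination bucket name. The sketch below is hypothetical: `ReplicationPolicy`, its fields, `Destination`, and `Validate` are illustrative names, not the Chorus API. It shows the check that becomes necessary once source and destination may share a storage: a policy whose effective destination is the same bucket on the same storage would replicate a bucket onto itself and has to be rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// ReplicationPolicy is a hypothetical policy shape in which the destination
// bucket name may differ from the source. Not the actual Chorus API.
type ReplicationPolicy struct {
	User        string
	FromStorage string
	FromBucket  string
	ToStorage   string
	ToBucket    string // optional; empty means "same name as FromBucket"
}

// Destination returns the effective destination bucket name.
func (p ReplicationPolicy) Destination() string {
	if p.ToBucket == "" {
		return p.FromBucket
	}
	return p.ToBucket
}

// Validate rejects policies that would replicate a bucket onto itself,
// which is only possible when source and destination share a storage.
func (p ReplicationPolicy) Validate() error {
	if p.FromBucket == "" {
		return errors.New("source bucket is required")
	}
	if p.FromStorage == p.ToStorage && p.FromBucket == p.Destination() {
		return fmt.Errorf("cannot replicate %s/%s onto itself", p.FromStorage, p.FromBucket)
	}
	return nil
}

func main() {
	// Same gateway, different bucket: the case described in this issue.
	move := ReplicationPolicy{User: "admin", FromStorage: "ceph", FromBucket: "old-pool-bucket",
		ToStorage: "ceph", ToBucket: "new-pool-bucket"}
	fmt.Println(move.Validate()) // <nil>

	// Same gateway and no custom name: a bucket would be replicated onto itself.
	loop := ReplicationPolicy{User: "admin", FromStorage: "ceph", FromBucket: "photos", ToStorage: "ceph"}
	fmt.Println(loop.Validate())
}
```

Leaving `ToBucket` empty keeps today's behaviour of reusing the source name, so existing cross-storage policies would stay valid under this sketch.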

hit-rich commented 2 months ago

Awesome, thanks for your response. I look forward to testing it when it comes through.

ddeniskin commented 2 months ago

@arttor Hi Artem, I'm also highly interested in using different bucket names for source and destination. Hope it will be possible for you to work on this.

arttor commented 2 months ago

Here is the relevant feature branch, but it is far from complete: https://github.com/clyso/chorus/compare/main...same-backend-replication. I think we will have something working in August. We will also have to develop a migration process to the new version, because it will be hard to make it fully compatible with the existing one.