git-hulk opened 1 year ago
@git-hulk Let me try to implement this feature.
@torwig Thanks a lot. For this issue, I am not sure if it's good to compress the db into a single object and then upload it.
@git-hulk Thank you for your tip. I'm going to think about the whole process and propose something like a "high-level design" and "possible implementation(s)" before actually starting the implementation, so we can discuss all the key points.
🆒 Thanks
Hi @torwig, are you still working on this issue? If not, @git-hulk, could I take it up?
@chrisxu333 Currently, I can't dedicate my time to this issue. If you wish to work on it, @git-hulk will reassign it to you.
Initializing S3/GCS etc. would be a bit tricky; maybe the opendal C SDK would help: https://github.com/apache/incubator-opendal . It would also be OK for testing on a local machine. Other C++ tools are also welcome. Since S3 credential config is a bit tricky, I think we'd better use a third-party library at first.
Also, the dependencies would get a bit complex if we use an object-storage SDK, so we'd better make clear what the config would look like. You can try to investigate how other systems do that:
To be honest, I haven't thought clearly about whether this feature should be put inside Kvrocks. Perhaps implementing a new dedicated tool for the backup, as ClickHouse does, is a good idea.
🤔 ClickHouse can read from remote S3, so I think it's able to upload or back up to S3.
However, TiKV only supports a separate `br` tool here (see: https://tikv.org/docs/6.5/concepts/explore-tikv-features/backup-restore-cn/ ). Maybe we can consider using the same approach. It also wouldn't add any size amplification to our binary, and it hides the risk of an immature implementation.
@mapleFU Thanks for your great references!
if it's good to compress the db into a single object and then upload it.
Why not?
Create, then compress the backup, and then upload the single file
Encryption of the backup file(s) will be nice too. Right now we are planning to mount the PVC volume in our Kubernetes cluster as a cronjob, make an encrypted archive and upload it to S3.
But yes, the fact that the backup is first generated on the same volume can be problematic (lack of space etc).
Kvrocks allows changing the backup dir via `config set backup-dir`. And it's now using the rocksdb checkpoint as the backup, which uses hard links when copying files. Perhaps you can remove the backup after syncing it to S3?
Hi, I'm Xuanwo from the OpenDAL community. I've been watching the development of kvrocks for some time and find this issue interesting.
As you may know, OpenDAL offers a unified data access layer, empowering users to seamlessly and efficiently retrieve data from diverse storage services. I feel like opendal will be a good fit for kvrocks to implement backup to/from storage services like s3/gcs/azblob/...
Since the kvrocks code base is mainly C++, there are two ways to integrate with opendal:
Sorry for not reading the thread carefully. I found @mapleFU already mentioned opendal.
@Xuanwo Here I think performance is not the critical concern, and we may not need to enable any advanced threading features. I think opendal as a backend of a RocksDB Env would be a good way to solve both this issue and backup to HDFS.
opendal as a backend of RocksDB Env
It looks like a good idea. I don't have much understanding of RocksDB Env, so I don't know if it's possible with a simple wrapper.
My friend @leiysky told me that the rocksdb Env requires append support, which is not widely supported by object storage services (at least S3 doesn't). And even for services that do support append, appending many small chunks might not perform well. This could be an issue.
Note: OpenDAL itself does support append, but S3 doesn't.
After some discussion: maybe designing some new syntax and using another thread/process to upload a backup from the local file system to HDFS/S3 is also an option. This avoids the complex logic of interacting with rocksdb::Env, and it could be done separately.
Search before asking
Motivation
Most users demand a backup of the DB dir, but we can only back up to the local file system, which may cause trouble if we haven't reserved enough disk space. It would be better if we could put the backup on cloud storage like S3/GCS/...
Solution
No response
Are you willing to submit a PR?