facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

Add support for taking snapshot of a column family and creating column family from a given CF snapshot. #3469

Open sachja opened 6 years ago

sachja commented 6 years ago

When building a distributed key-value store on top of RocksDB, it is useful to be able to migrate a column family from one node to another. For this, it would help to have APIs to:
a. Generate a snapshot of a column family. This could be implemented by flushing the memtable and returning handles to all of the column family's SST files, together with their metadata.
b. Create a column family from a snapshot. The caller transfers the snapshot by copying the files and the metadata, then calls this API to create a column family whose state is identical to the sender's.
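To make the request concrete, here is a minimal sketch of the metadata such a snapshot might carry. Every name in it (`SstFileInfo`, `ColumnFamilySnapshot`, `IsWellFormed`, `TotalBytes`) is hypothetical and is not an existing RocksDB API. It also assumes keys are ordered bytewise, which may not match the comparator configured on a real column family.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one SST file in a column family snapshot.
struct SstFileInfo {
  std::string file_name;            // e.g. "000123.sst"
  int level = 0;                    // LSM level the file occupied on the sender
  std::uint64_t size_bytes = 0;     // bytes the caller has to copy
  std::uint64_t smallest_seqno = 0; // lowest sequence number in the file
  std::uint64_t largest_seqno = 0;  // highest sequence number in the file
  std::string smallest_key;         // first key in the file
  std::string largest_key;          // last key in the file
};

// Hypothetical snapshot: what the receiver needs in addition to the files.
struct ColumnFamilySnapshot {
  std::string source_cf_name;
  std::vector<SstFileInfo> files;
};

// Basic sanity check a receiver could run before creating the column family.
// Assumes bytewise key ordering (an assumption of this sketch).
inline bool IsWellFormed(const ColumnFamilySnapshot& snap) {
  for (const SstFileInfo& f : snap.files) {
    if (f.level < 0) return false;
    if (f.smallest_seqno > f.largest_seqno) return false;
    if (f.smallest_key > f.largest_key) return false;
  }
  return true;
}

// Total number of bytes the caller would transfer for this snapshot.
inline std::uint64_t TotalBytes(const ColumnFamilySnapshot& snap) {
  std::uint64_t total = 0;
  for (const SstFileInfo& f : snap.files) total += f.size_bytes;
  return total;
}
```

The sender would fill in one `SstFileInfo` per file after flushing the memtable, and the receiver would validate the structure before linking the copied files into a new column family.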

Please let us know whether something like this already exists, whether an existing API can be extended to cover it, or whether anyone is planning to work on it.

siying commented 6 years ago

We don't have an exact API for that. It would be interesting to have, but we have no near-term plan to add this feature. If you are interested, feel free to contribute the code and we'll be happy to review and merge it.

siying commented 6 years ago

If you don't have further comments, I'm going to close the issue.

sachja commented 6 years ago

Hi, we have started working on support for this. One place where we are stuck is that the column family name and ID may not be the same on the sender and the receiver. It looks like each SST file stores the column family ID in its table properties, and that ID is checked during ingest and possibly during lookup. Why does the column family ID need to be embedded in the SST file at all, given that (I assume) the manifest file already records which SST files belong to each column family? Are there only a few places where the column family info in the SST file is checked, so that the check could be disabled with an option?

Thanks for your help.

sachja commented 6 years ago

@vpallipadi @snaeni

vpallipadi commented 6 years ago

@siying Can you please comment on this change? Specifically, we ignore the column family ID of the SST file during this import, and the Version sequence number is updated if the imported sequence number is higher.

The idea is to import SST files into a column family on an active DB (one that may have other active column families), preserving the levels and sequence numbers from the source CF.
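The import rule described above can be sketched as follows. This is a toy model, not the code from the linked commit: `ImportedFile`, `TargetState`, and `ImportFiles` are hypothetical names, and the real change operates on RocksDB's internal version and manifest structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of an SST file arriving from the source node.
struct ImportedFile {
  std::uint32_t embedded_cf_id;  // CF ID stored in the file's table properties
  int level;                     // level the file occupied in the source CF
  std::uint64_t largest_seqno;   // highest sequence number inside the file
};

// Hypothetical, simplified state of the receiving DB and target CF.
struct TargetState {
  std::uint32_t target_cf_id;                     // may differ from the source ID
  std::uint64_t last_sequence;                    // DB-wide last sequence number
  std::vector<std::vector<ImportedFile>> levels;  // files of the target CF, per level
};

inline void ImportFiles(TargetState& target,
                        const std::vector<ImportedFile>& files) {
  for (const ImportedFile& f : files) {
    assert(f.level >= 0);
    const std::size_t level = static_cast<std::size_t>(f.level);
    if (level >= target.levels.size()) target.levels.resize(level + 1);
    // The embedded CF ID is deliberately not compared with target_cf_id:
    // the file is accepted even when the two IDs differ.
    target.levels[level].push_back(f);  // keep the source level
    // Advance the sequence number only if the imported one is higher.
    target.last_sequence = std::max(target.last_sequence, f.largest_seqno);
  }
}
```

Under this model, importing a file with `largest_seqno` 250 into a DB whose last sequence number is 100 raises it to 250, while importing an older file leaves it unchanged.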

We have been testing this change internally for a couple of weeks, and this part seems to be working fine. We still have issues on the source side, where we prepare the SST files for import: (1) with the sequence of DisableFileDeletions, copying the SST files over, and EnableFileDeletions, as in #3609, and (2) a potential race between DisableFileDeletions and background compactions.
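For reference, the source-side sequence in (1) can be wrapped in a scope guard so that file deletions are always re-enabled, even if copying fails partway. The sketch below uses a hypothetical `FakeDb` stand-in rather than a real `rocksdb::DB`. The real methods return a status that a caller should check, and this sketch does not attempt to address the compaction race mentioned in (2).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the DB; it only counts disable/enable calls.
struct FakeDb {
  int deletions_disabled = 0;           // nesting depth of disable calls
  std::vector<std::string> live_files;  // SST files currently in use
  void DisableFileDeletions() { ++deletions_disabled; }
  void EnableFileDeletions() {
    assert(deletions_disabled > 0);
    --deletions_disabled;
  }
};

// Scope guard: disables deletions on construction, re-enables on destruction.
template <typename Db>
class FileDeletionGuard {
 public:
  explicit FileDeletionGuard(Db& db) : db_(db) { db_.DisableFileDeletions(); }
  ~FileDeletionGuard() { db_.EnableFileDeletions(); }
  FileDeletionGuard(const FileDeletionGuard&) = delete;
  FileDeletionGuard& operator=(const FileDeletionGuard&) = delete;

 private:
  Db& db_;
};

// Copies every live file while deletions are disabled; returns the count.
// copy_one is a caller-supplied callable taking the file name.
template <typename Db, typename CopyFn>
std::size_t CopyLiveFiles(Db& db, CopyFn copy_one) {
  FileDeletionGuard<Db> guard(db);
  // Capture the file list only after deletions have been disabled.
  const std::vector<std::string> files = db.live_files;
  std::size_t copied = 0;
  for (const std::string& name : files) {
    copy_one(name);
    ++copied;
  }
  return copied;  // guard re-enables deletions when it goes out of scope
}
```

A caller would pass a callable that copies or hard-links each named file to the export directory.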

vpallipadi commented 6 years ago

@siying Any comments on this change? https://github.com/vpallipadi/rocksdb/commit/50b517feea55f458a74a831a7687b0ec524ca202

Let me know if I need to copy anyone else to get initial feedback on the approach. Thanks.

siying commented 6 years ago

I'll take a look.