dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

Periodical Snapshotting #161

Closed: thevaizman closed this issue 2 years ago

thevaizman commented 2 years ago

Would love to see auto-snapshotting of the RDB file in DF. Currently, I can use SAVE/BGSAVE, but there is no option to configure automatic snapshotting (e.g., SAVE 60 1000 in Redis). I would love to replace Redis with DF, but I don't want to take the risk of data loss or the overhead of running an external job that periodically dumps the data to disk.

Is there a plan to support this feature in the future?

romange commented 2 years ago

Yes, we can support it, though we are unlikely to use the same spec as Redis.

BTW, DF saves timestamped files by default, though it's possible to override this and use a single snapshot file, as with Redis. Which would you choose? Timestamped files require some sort of externally configured garbage collection; otherwise, you will eventually run out of disk space. In addition, if you run your Redis/DF in the cloud, you might want to upload your RDB snapshots to cloud storage.
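
For the timestamped-file case, the "garbage collection configured externally" could be as small as a retention job that keeps only the newest N snapshots. The sketch below is a hypothetical external helper, not part of Dragonfly: SelectStaleSnapshots and PruneSnapshots are made-up names, and it assumes the snapshot files share a filename prefix and that the embedded timestamps sort lexicographically in chronological order (true for zero-padded, ISO-8601-like timestamps).

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the snapshot filenames that should be deleted
// so that only the `keep` newest files starting with `prefix` remain.
// Assumes the timestamp in each name sorts lexicographically by time.
std::vector<std::string> SelectStaleSnapshots(const std::vector<std::string>& names,
                                              const std::string& prefix,
                                              std::size_t keep) {
  std::vector<std::string> matching;
  for (const auto& name : names) {
    if (name.rfind(prefix, 0) == 0) matching.push_back(name);  // starts with prefix
  }
  if (matching.size() <= keep) return {};
  std::sort(matching.begin(), matching.end());  // oldest first
  matching.resize(matching.size() - keep);      // drop the `keep` newest
  return matching;
}

// Hypothetical helper: deletes stale snapshots from `dir` and returns how many
// files were removed. Per-file errors are ignored so one failure doesn't stop the job.
std::size_t PruneSnapshots(const fs::path& dir, const std::string& prefix,
                           std::size_t keep) {
  std::vector<std::string> names;
  for (const auto& entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file()) names.push_back(entry.path().filename().string());
  }
  std::size_t removed = 0;
  for (const auto& name : SelectStaleSnapshots(names, prefix, keep)) {
    std::error_code ec;
    if (fs::remove(dir / name, ec)) ++removed;
  }
  return removed;
}
```

Keeping the selection rule separate from the filesystem calls makes the retention policy easy to unit-test; the job itself would be run by cron or a sidecar, outside DF.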

thevaizman commented 2 years ago

Thanks for your response!

I actually DON'T like the Redis spec for periodic snapshotting; I just used it as an example to explain the feature 😄 I would prefer a runtime arg that I can pass to DF to indicate when to dump to disk (e.g., --snapshot-interval=60 would dump every 60 seconds).

In terms of naming the snapshots, I personally choose to override the timestamps, since I only want the most recent dump, so I run my instance of DF with --dbfilename=dump.rdb. You do make a valid point about keeping multiple snapshots in cloud storage to enable point-in-time restores, but I'm taking a wild guess here that DF isn't going to support uploading to the cloud any time soon (and rightly so 😉), so the user will have to spin up some external component to automate this. Therefore, I don't see much point in DF adding timestamps; just let the user handle the naming externally.

romange commented 2 years ago

I think the simplest and most versatile approach would be to adopt a glob-based spec. For example, "10:00" would match 10 AM, while "*:00" would match every hour on the hour. A periodic configuration does not fit the use case where one wants to snapshot during low-load hours.
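
A rough sketch of such a glob spec, assuming times are formatted as zero-padded 24-hour HH:MM strings and '*' matches any run of characters (as in "*:00"). Supporting '?' for a single character is a conventional glob extra that the thread does not mention. GlobMatch and IsValidSpec are hypothetical names, and the validity rule (only digits, ':', '*' and '?', and at least one real HH:MM time must match) is an assumption about what "fits the glob spec" could mean.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Returns true if `text` matches glob `pattern`, where '*' matches any run of
// characters (including none) and '?' matches exactly one character.
// Iterative matcher that backtracks to the most recent '*'.
bool GlobMatch(const std::string& pattern, const std::string& text) {
  std::size_t p = 0, t = 0;
  std::size_t star = std::string::npos, match = 0;
  while (t < text.size()) {
    if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t])) {
      ++p;
      ++t;
    } else if (p < pattern.size() && pattern[p] == '*') {
      star = p++;  // remember the star; first try matching zero characters
      match = t;
    } else if (star != std::string::npos) {
      p = star + 1;  // let the last star absorb one more character
      t = ++match;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match ""
  return p == pattern.size();
}

// Hypothetical startup check: reject unexpected characters and specs that can
// never fire (e.g. "25:00") by trying every minute of the day.
bool IsValidSpec(const std::string& spec) {
  if (spec.empty()) return false;
  for (char c : spec) {
    bool ok = (c >= '0' && c <= '9') || c == ':' || c == '*' || c == '?';
    if (!ok) return false;
  }
  char hhmm[8];
  for (int h = 0; h < 24; ++h) {
    for (int m = 0; m < 60; ++m) {
      std::snprintf(hhmm, sizeof(hhmm), "%02d:%02d", h, m);
      if (GlobMatch(spec, hhmm)) return true;
    }
  }
  return false;
}
```

Under these assumptions, "*:00" fires at the top of every hour, "10:00" fires once a day at 10 AM, and a typo such as "25:00" is rejected up front instead of silently never firing.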

thevaizman commented 2 years ago

Sounds reasonable to me, although you could argue that during high-load hours there's a bigger risk of losing data due to crashes/failures, which is why I tend to like Redis's periodic configuration. But as long as we can use something like *:00, I believe this is sufficient for most use cases.

romange commented 2 years ago

👍🏼

So, the task is:

  1. Introduce a flag save_schedule (or similar) in server_family.cc.
  2. If the flag is not empty, parse it on startup and check that it fits the glob spec for matching HH:MM in 24-hour time. We probably should not crash on an incorrect value; instead, log an error and ignore the flag.
  3. If everything is OK, start a fiber that sleeps in a loop, waking every 20s (20s is fine-grained enough to catch every minute even if the timer drifts).
  4. Once the fiber wakes, it should check the current time and match it against the spec. If it matches, call the DoSave() function.
  5. DoSave requires a transaction object. You can create one in the calling fiber; see the Reload(...) function in debugcmd.cc for an example.
  6. Unfortunately, I do not see an easy way to test this in unit tests. However, I introduced a pytest framework under tests/pytest, and we should add a test there that checks this behavior. This item probably depends on #199.
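
Steps 3 and 4 can be sketched in standard C++ as follows. This is not Dragonfly code: Dragonfly would run the loop in a fiber and call its own DoSave() with a transaction, whereas this sketch uses a blocking loop meant for its own thread and a do_save callback standing in for DoSave(). SaveScheduler, RunSaveLoop and the matcher parameter are hypothetical names. One detail the list leaves implicit: with a 20s wake-up period, a matching minute is observed up to three times, so the sketch evaluates the spec only when the wall-clock minute changes, which avoids taking several snapshots within the same minute.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <ctime>
#include <functional>
#include <string>
#include <thread>
#include <utility>

// Decides, on each wake-up, whether a snapshot is due.
class SaveScheduler {
 public:
  // Hypothetical matcher: returns true if an "HH:MM" string fits the spec.
  using Matcher = std::function<bool(const std::string& hhmm)>;

  explicit SaveScheduler(Matcher matches) : matches_(std::move(matches)) {}

  bool ShouldSave(int hour, int minute) {
    const int minute_of_day = hour * 60 + minute;
    // Repeated wake-ups within the same minute never trigger a second save.
    if (minute_of_day == last_seen_minute_) return false;
    last_seen_minute_ = minute_of_day;
    char hhmm[8];
    std::snprintf(hhmm, sizeof(hhmm), "%02d:%02d", hour, minute);
    return matches_(hhmm);
  }

 private:
  Matcher matches_;
  int last_seen_minute_ = -1;
};

// Stand-in for the fiber: wakes every `period`, reads local time and calls
// `do_save` (a placeholder for DoSave()) when the schedule matches.
void RunSaveLoop(SaveScheduler& scheduler, const std::function<void()>& do_save,
                 const std::atomic<bool>& stop,
                 std::chrono::seconds period = std::chrono::seconds(20)) {
  while (!stop.load()) {
    const std::time_t now = std::time(nullptr);
    std::tm local{};
    localtime_r(&now, &local);  // POSIX, thread-safe variant of std::localtime
    if (scheduler.ShouldSave(local.tm_hour, local.tm_min)) do_save();
    std::this_thread::sleep_for(period);
  }
}
```

Because the loop sleeps for the full period, a stop request may take up to 20s to be noticed; a real implementation would likely prefer an interruptible wait.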

Nike682631 commented 2 years ago

@romange Is this issue available to take up?

romange commented 2 years ago

This one requires deep knowledge of the DragonflyDB architecture to do correctly. Let's start with other issues for now.

Nike682631 commented 2 years ago

Sure

kaiserdan commented 1 year ago

How would you snapshot every 15 minutes using this format?

romange commented 1 year ago

@kaiserdan, we recently introduced a new flag. See https://github.com/dragonflydb/dragonfly/pull/1599 and https://github.com/dragonflydb/dragonfly/issues/1590

We will document it soon; see https://github.com/dragonflydb/documentation/issues/129
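
On the 15-minute question: an HH:MM glob can pin down fixed minute digits (as in "*:00") but has no notion of a step, so "every 15 minutes" does not fit it naturally. The new flag from the PR linked above defines its own syntax, which is not reproduced here. Purely to illustrate what a step-based field looks like, the sketch below assumes a cron-style minute field in which "*/15" means "every minute divisible by 15". MinuteFieldMatches and ParseSmallNumber are hypothetical helpers, not Dragonfly's parser.

```cpp
#include <string>

// Hypothetical helper: parses a 1-2 digit non-negative number; rejects empty
// strings and any non-digit characters.
bool ParseSmallNumber(const std::string& s, int& out) {
  if (s.empty() || s.size() > 2) return false;
  out = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    out = out * 10 + (c - '0');
  }
  return true;
}

// Hypothetical cron-style minute-field matcher (illustration only):
//   "*"    matches every minute
//   "*/N"  matches minutes divisible by N (N > 0)
//   "M"    matches exactly minute M
// Anything else is treated as "no match" rather than an error.
bool MinuteFieldMatches(const std::string& field, int minute) {
  if (field == "*") return true;
  int n = 0;
  if (field.rfind("*/", 0) == 0) {
    return ParseSmallNumber(field.substr(2), n) && n > 0 && minute % n == 0;
  }
  return ParseSmallNumber(field, n) && n == minute;
}
```

With "*/15", minutes 0, 15, 30 and 45 match, so a scheduler loop like the one described in the task list would snapshot four times an hour.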