apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

Snapshot recovery with offloaded data #4942

Open vicaya opened 5 years ago

vicaya commented 5 years ago

Is your feature request related to a problem? Please describe.

  1. Back up an existing cluster to the cloud and restore it from there.
  2. Much cheaper disaster recovery (DR) without geo-replication.
  3. Quickly bootstrap Pulsar clusters from the cloud for dev/test.

Describe the solution you'd like

  1. create-offload-snapshot: manually create a recoverable snapshot in offload storage.
  2. set-offload-snapshot-interval: create recoverable points in offload storage at a configured interval.
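Neither command exists in Pulsar; they are only proposed here. To illustrate what "recoverable points at an interval" could mean, here is a minimal Python sketch. It assumes snapshots are taken every `interval_s` seconds and that a restore uses the newest snapshot taken at or before the requested time. `snapshot_times` and `pick_restore_point` are hypothetical names, not a Pulsar API.

```python
from __future__ import annotations

from bisect import bisect_right


def snapshot_times(start_s: int, end_s: int, interval_s: int) -> list[int]:
    """Hypothetical schedule: one snapshot every interval_s seconds in [start_s, end_s]."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return list(range(start_s, end_s + 1, interval_s))


def pick_restore_point(snapshots: list[int], target_s: int) -> int | None:
    """Return the newest snapshot taken at or before target_s, or None if there is none."""
    idx = bisect_right(snapshots, target_s)  # snapshots must be sorted ascending
    return snapshots[idx - 1] if idx > 0 else None


snaps = snapshot_times(0, 86_400, 3_600)  # hourly points over one day
print(pick_restore_point(snaps, 10_000))  # 7200
```

Under this assumed scheme, a restore loses at most roughly one interval of data. That is the trade-off the request accepts in exchange for avoiding geo-replication.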

Describe alternatives you've considered

Geo-replication works for DR but is too expensive.

Additional context

Snapshots on the BookKeeper bookies are not needed if leaving them out makes the implementation simpler.

feeblefakie commented 4 years ago

I think this is a must-have feature for stateful systems, so why is it regarded as a nice-to-have feature? What would users do to recover from a disk failure on a bookie node?

sijie commented 4 years ago

What would users do to recover from a disk failure on a bookie node?

BookKeeper itself is already a replicated system. It will handle bookie failures and automatically re-replicate data to other available bookies.
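To make the replication argument concrete, here is a minimal Python model of how a BookKeeper ledger stripes entries across bookies. It assumes BookKeeper's default round-robin placement, in which entry i is stored on bookies (i + k) mod E for k in 0..Qw-1. E is the ensemble size and Qw is the write quorum; in Pulsar, their defaults come from `managedLedgerDefaultEnsembleSize` and `managedLedgerDefaultWriteQuorum`. The model only counts surviving replicas. Real auto-recovery also copies the entries and updates ledger metadata to point at replacement bookies.

```python
def write_set(entry_id: int, ensemble: list[str], write_quorum: int) -> list[str]:
    """Bookies that store entry_id under round-robin striping."""
    e = len(ensemble)
    return [ensemble[(entry_id + k) % e] for k in range(write_quorum)]


def replica_status(num_entries, ensemble, write_quorum, failed):
    """Classify entries after the bookies in `failed` are lost.

    Returns (under_replicated, lost): under-replicated entries still have at
    least one surviving copy to re-replicate from; lost entries have none.
    """
    under, lost = [], []
    for entry_id in range(num_entries):
        alive = [b for b in write_set(entry_id, ensemble, write_quorum) if b not in failed]
        if not alive:
            lost.append(entry_id)    # every copy is gone
        elif len(alive) < write_quorum:
            under.append(entry_id)   # recoverable by copying a surviving replica
    return under, lost


ensemble = ["bk1", "bk2", "bk3"]
print(replica_status(6, ensemble, write_quorum=2, failed={"bk2"}))  # ([0, 1, 3, 4], [])
```

A single failed bookie leaves every entry recoverable. Losing as many bookies as the write quorum can lose entries whose whole write set was on them. The backup request targets exactly that kind of scenario, such as losing the whole cluster.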

Providing a snapshot of the running state of a distributed system is also a very challenging task, as it requires a fair amount of work to keep both metadata and data consistent while the snapshot is being taken.
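A toy model shows one side of the consistency problem. It is not Pulsar's actual storage layout: here, metadata records each ledger's last confirmed entry while the data keeps growing. Suppose the snapshot copies the data first and the metadata afterwards. A write that lands in between leaves metadata that references an entry missing from the copy. Freezing the metadata first and then copying only the entries it references keeps the two in agreement, at least for append-only data. All names here are hypothetical.

```python
def snapshot_metadata_first(metadata, data):
    """Freeze metadata, then copy only the entries it references."""
    meta_snap = dict(metadata)  # ledger_id -> last confirmed entry id
    data_snap = {lid: list(data[lid][: last + 1]) for lid, last in meta_snap.items()}
    return meta_snap, data_snap


def is_consistent(meta_snap, data_snap):
    """Every ledger in the metadata must have all entries up to its last confirmed id."""
    return all(len(data_snap.get(lid, [])) == last + 1 for lid, last in meta_snap.items())


metadata = {"ledger-1": 2}
data = {"ledger-1": ["e0", "e1", "e2"]}

# Data-first ordering: a write is confirmed between the two copies.
data_copy = {lid: list(entries) for lid, entries in data.items()}
data["ledger-1"].append("e3")
metadata["ledger-1"] = 3
print(is_consistent(dict(metadata), data_copy))  # False

print(is_consistent(*snapshot_metadata_first(metadata, data)))  # True
```

A real system also has to deal with ledgers being trimmed or deleted during the copy, which this model ignores.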

yannick commented 2 years ago

Maybe a much easier solution would already cover quite a few cases (mainly development, and hard disaster recovery where a bit of missing data is acceptable): since data can be written to tiered storage, it would be nice to have an easy way to bootstrap a cluster from that data. For example, for development you would just copy a few days' worth of data over from S3 and bootstrap.
Is this already possible?
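The thread leaves the question open. The following Python sketch covers only the selection step of such a bootstrap, picking the offloaded segments that overlap a "last N days" window. It assumes a hypothetical listing in which each offloaded segment carries an object key and its first and last publish times. `OffloadedSegment` and `segments_in_window` are illustrative names, not Pulsar's offload format or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class OffloadedSegment:
    """Hypothetical record for one offloaded ledger object in S3."""
    key: str
    first_publish: datetime
    last_publish: datetime


def segments_in_window(segments, now, days):
    """Return segments whose publish-time range overlaps [now - days, now]."""
    start = now - timedelta(days=days)
    return [s for s in segments if s.last_publish >= start and s.first_publish <= now]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
segs = [
    OffloadedSegment("ledger-1", now - timedelta(days=9), now - timedelta(days=8)),
    OffloadedSegment("ledger-2", now - timedelta(days=4), now - timedelta(days=2)),
    OffloadedSegment("ledger-3", now - timedelta(days=1), now),
]
print([s.key for s in segments_in_window(segs, now, days=3)])  # ['ledger-2', 'ledger-3']
```

A real bootstrap would also need the topic and ledger metadata that points at these objects. That metadata is the hard part raised earlier in the thread.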