eligosource / eventsourced

A library for building reliable, scalable and distributed event-sourced applications in Scala
Apache License 2.0

Reusable snapshotting infrastructure based on Hadoop FileSystem abstraction #108

Closed: krasserm closed this issue 11 years ago

krasserm commented 11 years ago

This should be used by all journals.

krasserm commented 11 years ago

From a conversation:

  • Let all journals use the same snapshot storage API ... This is currently not the case: for example, LevelDB uses java.io.File whereas HBase uses org.apache.hadoop.fs.FileSystem.
  • The Hadoop FileSystem abstraction has implementations for local filesystem, HDFS, S3, FTP etc.
  • As a first step, we let all journals use the Hadoop FileSystem abstraction and choose a reasonable default implementation for each journal, for example RawLocalFileSystem for LevelDB, NativeS3FileSystem for DynamoDB, etc. Applications can override these defaults if needed. This way we could implement snapshotting for all journals very quickly: I just need to move the Hadoop FileSystem snapshot support from the HBase journal to a common place, and other journals can reuse it by providing different root URIs.
  • In a second step, we could introduce an extension to local snapshot storage ... the local storage could serve as a cache, and a background process/thread could do the synchronization with a backup location (HDFS, S3, FTP, etc.).
  • The above strategy would make local caching optional and keep it an implementation detail of local snapshot storage (except that applications need to configure which backup storage they want).
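The first step (one shared snapshot API, a default root URI per journal, overridable by the application) could look roughly like the sketch below. It is a minimal, hypothetical illustration, not eventsourced's actual API: `JournalKind`, `SnapshotStorageSettings`, `snapshotUri` and the host/bucket names are placeholders. It only resolves URIs; in a real implementation the resolved URI would be handed to Hadoop (`FileSystem.get(uri, conf)` selects the concrete FileSystem from the URI scheme), which is left out here to keep the example self-contained.

```scala
import java.net.URI

// Hypothetical journal identifiers (placeholders, not eventsourced classes).
sealed trait JournalKind
case object LevelDB extends JournalKind
case object HBase extends JournalKind
case object DynamoDB extends JournalKind

// Hypothetical settings: a default snapshot root URI per journal that an
// application may override. With Hadoop, the scheme (file, hdfs, s3n, ...)
// would determine which FileSystem implementation is used.
final case class SnapshotStorageSettings(overrides: Map[JournalKind, URI] = Map.empty) {
  // Placeholder defaults; hosts, ports and bucket names are made up.
  private val defaults: Map[JournalKind, URI] = Map(
    LevelDB  -> new URI("file:///var/lib/app/snapshots"),
    HBase    -> new URI("hdfs://namenode:8020/snapshots"),
    DynamoDB -> new URI("s3n://my-bucket/snapshots")
  )

  // Application overrides win over the per-journal default.
  def rootUri(kind: JournalKind): URI = overrides.getOrElse(kind, defaults(kind))

  // Location of one snapshot, resolved relative to the journal's root URI.
  def snapshotUri(kind: JournalKind, processorId: Int, sequenceNr: Long): URI = {
    val root = rootUri(kind)
    // Ensure a trailing slash so resolve() appends instead of replacing the last segment.
    val base = if (root.getPath.endsWith("/")) root else new URI(root.toString + "/")
    base.resolve(s"snapshot-$processorId-$sequenceNr")
  }
}
```

Because the journals only differ in their root URI, moving the snapshot code to a common place amounts to sharing the resolution and I/O logic and passing a different root per journal.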
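The second step (local storage as a cache, with a background thread synchronizing to a backup location) can be sketched as follows. All names here (`SnapshotStore`, `InMemoryStore`, `CachingSnapshotStore`, `intervalMs`) are hypothetical, and in-memory stores stand in for the local and backup FileSystems so the example runs on its own (Scala 2.13+ for `scala.jdk.CollectionConverters`).

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import scala.jdk.CollectionConverters._

// Hypothetical minimal snapshot store interface.
trait SnapshotStore {
  def save(name: String, bytes: Array[Byte]): Unit
  def load(name: String): Option[Array[Byte]]
  def names: Set[String]
}

// In-memory stand-in for a real store (local disk, HDFS, S3, FTP, ...).
final class InMemoryStore extends SnapshotStore {
  private val data = new ConcurrentHashMap[String, Array[Byte]]()
  def save(name: String, bytes: Array[Byte]): Unit = data.put(name, bytes.clone())
  def load(name: String): Option[Array[Byte]] = Option(data.get(name)).map(_.clone())
  def names: Set[String] = data.keySet.asScala.toSet
}

// Writes go to local storage only; a background thread periodically copies
// snapshots the backup does not have yet. Reads fall back to the backup on a
// local miss and repopulate the local cache.
final class CachingSnapshotStore(local: SnapshotStore, backup: SnapshotStore, intervalMs: Long)
    extends SnapshotStore {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleWithFixedDelay(() => sync(), intervalMs, intervalMs, TimeUnit.MILLISECONDS)

  def save(name: String, bytes: Array[Byte]): Unit = local.save(name, bytes)

  def load(name: String): Option[Array[Byte]] =
    local.load(name).orElse {
      val fromBackup = backup.load(name)
      fromBackup.foreach(local.save(name, _)) // refill the local cache
      fromBackup
    }

  def names: Set[String] = local.names ++ backup.names

  // Copy every locally cached snapshot that is missing from the backup.
  def sync(): Unit =
    (local.names -- backup.names).foreach(n => local.load(n).foreach(backup.save(n, _)))

  // Stop the background thread, then do a final synchronization.
  def close(): Unit = {
    scheduler.shutdown()
    scheduler.awaitTermination(5, TimeUnit.SECONDS)
    sync()
  }
}
```

Applications would then only choose the backup store; whether a local cache sits in front of it stays hidden behind the `SnapshotStore` interface.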