Closed whjiang closed 8 years ago
Perhaps the different types of storage could share a common API, or aspects of collections, so that AppStorage doesn't need to change every time a new type of storage is added. Also, at least for JarStore, see #1204, where the lack of streaming causes the Master to exit.
Should AppStorage contain write, read methods?
Hi @kkasravi, I agree that sharing a common API is one goal. This is why the checkpoint store and offset store in the current codebase are unified.
The read/write methods belong to KVStore, not AppStorage. AppStorage is a quite general concept: it includes everything an application needs for storage (except connectors).
+1 for the ideas. The difficulty will be making the storage implementation pluggable, so that users can choose the storage system they want, such as Cassandra, HDFS, or HBase.
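To make the pluggability question concrete, here is a minimal sketch in plain Scala. It is not Gearpump's actual API: the `KVStore` trait shape, `InMemoryKVStore`, and `KVStoreRegistry` are all hypothetical names chosen for illustration. The idea is that callers depend only on a small read/write contract, and a backend is selected by a configured name.

```scala
import scala.collection.mutable

// Hypothetical minimal key-value contract; not Gearpump's actual API.
trait KVStore {
  def write(key: String, value: Array[Byte]): Unit
  def read(key: String): Option[Array[Byte]]
}

// In-memory backend, standing in for an HDFS, HBase, or Cassandra backend.
final class InMemoryKVStore extends KVStore {
  private val data = mutable.Map.empty[String, Array[Byte]]

  // Copy on write and on read so callers cannot mutate stored bytes.
  override def write(key: String, value: Array[Byte]): Unit =
    data.update(key, value.clone())

  override def read(key: String): Option[Array[Byte]] =
    data.get(key).map(_.clone())
}

// Hypothetical registry: backends are looked up by a configured name,
// so adding a backend does not change the code that uses KVStore.
object KVStoreRegistry {
  private val factories =
    mutable.Map[String, () => KVStore]("memory" -> (() => new InMemoryKVStore))

  def register(name: String)(factory: () => KVStore): Unit =
    factories.update(name, factory)

  def create(name: String): KVStore = {
    val factory = factories.getOrElse(
      name,
      throw new IllegalArgumentException(s"Unknown KV store backend: $name"))
    factory()
  }
}
```

A real backend would be registered the same way, for example `KVStoreRegistry.register("hdfs")(...)`, with the name read from the application configuration.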
Maybe I am missing something, but why not just build on Akka persistence? It now supports HBase/Cassandra/... plug-ins.
@netcomm
I am still trying to grasp the benefits and limitations of Akka persistence.
Here are some considerations for selecting a persistence layer:
I need to investigate Akka persistence further.
Hi @clockfly
. As a streaming platform, we want the classpath to be as short as possible. We don't want the user app to fail if it depends on a different version of HBase or Cassandra. Using Akka persistence will create a classpath dependency on an external storage system. ----- Classpath? Can you give more detail?
. Akka persistence requires sequential read/write, while in our case we need a KV store to store application data such as the clock value. ----- Yes, because the Akka persistence implementation is hosted on an actor, access inside the actor is sequential. But you can choose LevelDB or whatever more powerful persistence plug-in you like; if you look only at the persistence layer, it is not sequential.
. For the Master layer, we use Akka cluster to implement master HA, and will sync the data between masters. I don't know how data replication works in Akka persistence. ----- I see that Gearpump already uses Akka's CRDTs. Because the CRDTs live in memory, you can just save them using Akka persistence.
. The Akka persistence serialization framework is not very performant. ----- Sorry, without actual data I cannot comment. Maybe I am missing something; please correct me.
@netcomm
----- Classpath? Can you give more detail?
For example, if you want to run a job on Gearpump, the job has its own Java classpath and the Gearpump engine has its own classpath. The two may conflict because the app executor JVM contains both. So in Gearpump we shade most dependencies. But if we introduce a heavy library like Cassandra, it may be too difficult to do the classpath shading.
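For readers unfamiliar with shading, the fragment below sketches what a shade rule can look like. It assumes an sbt build with the sbt-assembly plugin enabled, which is an assumption for illustration only (the thread does not say how Gearpump shades its dependencies), and the package names are hypothetical.

```scala
// build.sbt fragment; assumes the sbt-assembly plugin is enabled.
// Package names are hypothetical examples, not Gearpump's real rules.
assembly / assemblyShadeRules := Seq(
  // Relocate the platform's copy of a library into a private namespace,
  // so a user job can bring its own version without a conflict.
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)
```

Older sbt versions spell the key as `assemblyShadeRules in assembly`. A rule like this is easy for a small library, but a storage client with many transitive dependencies needs many such rules, which is the difficulty described above.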
----- Yes, because the Akka persistence implementation is hosted on an actor, access inside the actor is sequential. But you can choose LevelDB or whatever more powerful persistence plug-in you like; if you look only at the persistence layer, it is not sequential.
I am not familiar with Akka persistence. Can you give us a doc link about the "non-sequential" API? Can you point to some examples?
I see that Gearpump already uses Akka's CRDTs. Because the CRDTs live in memory, you can just save them using Akka persistence.
It would be great if akka-persistence can do this.
. The Akka persistence serialization framework is not very performant.
----- Sorry, without actual data I cannot comment.
There are two use cases for us. The first is storing cluster data that does not require high performance. For this part, I think we may be able to use akka-persistence, if akka-persistence can be used as a generic KV store as you said.
The second use case requires high-performance writes: we need to store application-level checkpoint files. In the early days of Gearpump, we tested Akka serialization performance and it was not very good. That is why we wrote our own serialization implementation. If the akka-persistence layer is strongly bound to the Akka serialization framework, that may become a problem. If the binding is not that strong, we can probably bypass it.
We are interested in the "non-sequential" usage of Akka persistence. Please give us some pointers or examples, thanks.
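The tension discussed here (a sequential journal versus a KV store) can be illustrated without Akka. The sketch below is plain Scala and does not use the akka-persistence API; `ValueSet`, `Journal`, and `JournaledKV` are hypothetical names. It shows the event-sourcing pattern: writes are appended in order to a journal, while reads go to an in-memory map that is rebuilt by replaying the journal on recovery.

```scala
// Hypothetical event type; names are illustrative only.
final case class ValueSet(key: String, value: Long)

// An append-only journal: writes are strictly sequential.
final class Journal {
  private var events = Vector.empty[ValueSet]
  def append(event: ValueSet): Unit = events = events :+ event
  def replay: Vector[ValueSet] = events
}

// A key-value view whose state is derived from the journal.
final class JournaledKV(journal: Journal) {
  // Recovery: fold the persisted events into the latest value per key.
  private var state: Map[String, Long] =
    journal.replay.foldLeft(Map.empty[String, Long]) { (acc, e) =>
      acc.updated(e.key, e.value)
    }

  def put(key: String, value: Long): Unit = {
    journal.append(ValueSet(key, value)) // persist the event first
    state = state.updated(key, value)    // then update the in-memory view
  }

  // Reads are random-access lookups and never touch the journal.
  def get(key: String): Option[Long] = state.get(key)
}
```

With this shape, small cluster data such as a clock value can be read by key even though the underlying log is sequential. The cost is that recovery time grows with the journal length unless snapshots are added, so it suits the first use case above better than high-throughput checkpoint writes.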
@clockfly
In http://doc.akka.io/docs/akka/2.4.1/scala/persistence.html you may find some useful information, e.g. "The persistAsync method provides a tool for implementing high-throughput persistent actors. It will not stash incoming Commands while the Journal is still working on persisting and/or user code is executing event callbacks." If this is not what you want, please let me know; maybe I can help.
@netcomm, I opened #1822, are you interested in writing a demo project for akka-persistence?
OK, I will try it.
In general, a Gearpump application requires the following storage support:
The general idea is:
The draft of this storage looks like this (very preliminary and subject to change):