cloudnativecube / octopus


Extend the RocksDB engine to match HBase's functionality #136

Open mdianjun opened 3 years ago

mdianjun commented 3 years ago

(Continuously updated)

Single-node features:

Advanced features:

Distributed features:

mdianjun commented 3 years ago

Other resources:

mdianjun commented 3 years ago

Cassandra's characteristics:

https://cassandra.apache.org/_/cassandra-basics.html

godliness commented 3 years ago

Cassandra vs. HBase:

https://www.scnsoft.com/blog/cassandra-vs-hbase

Cassandra:

HBase:

What they have in common:


On having CK incorporate RocksDB-like systems such as Cassandra and HBase, I have the following thoughts:

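  1. For CK's existing EmbeddedRocksDB table engine, we could query these tables through a Distributed table, which would use the key ranges recorded in ZK to decide which shard to query. The problem is that CK's primary key is currently not globally ordered across shards, unlike in HBase, so at first every shard has to be queried.
  2. Implement a ReplicatedEmbeddedRocksDB variant of the EmbeddedRocksDB table engine to ensure high availability of the data.
  3. When a new shard is added, how do we rebalance the data? In HBase, the HMaster periodically checks whether regions are balanced across all RegionServers, but CK has no master. This is also closely tied to our later move to the cloud: once compute and storage are separated, if every shard holds all of the metadata, would rebalancing no longer be needed?
  4. Implement configurable Settings for EmbeddedRocksDB. At the moment the upstream engine only implements simple reads and writes; it doesn't even support ColumnFamily, let alone the rest of RocksDB's configuration. (Already added upstream: https://github.com/cloudnativecube/octopus/issues/136#issuecomment-893195102)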
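For comparison with item 3, here is a minimal sketch of count-based balancing in the spirit of the HMaster's periodic check: regions are moved from the most-loaded server to the least-loaded one until region counts differ by at most one. This is a simplified assumption; HBase's production balancers also weigh other costs such as data locality, and `balance_regions` is a hypothetical helper, not an HBase or ClickHouse API.

```python
def balance_regions(assignment: dict[str, str], servers: list[str]) -> list[tuple[str, str, str]]:
    """Plan region moves so that per-server region counts differ by at most one.

    assignment: region name -> server currently hosting it (hypothetical metadata).
    servers: every live server, including a newly added one with no regions yet.
    Returns (region, from_server, to_server) moves; the input dict is not modified.
    """
    current = dict(assignment)
    counts = {server: 0 for server in servers}
    for server in current.values():
        counts[server] += 1

    moves = []
    while True:
        busiest = max(servers, key=lambda s: counts[s])
        idlest = min(servers, key=lambda s: counts[s])
        if counts[busiest] - counts[idlest] <= 1:
            break  # already balanced by region count
        # Deterministic choice: move the lexicographically smallest region on the busiest server.
        region = min(r for r, s in current.items() if s == busiest)
        current[region] = idlest
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((region, busiest, idlest))
    return moves


if __name__ == "__main__":
    # s3 is a newly added server that holds no regions yet.
    layout = {"r1": "s1", "r2": "s1", "r3": "s1", "r4": "s1", "r5": "s2", "r6": "s2"}
    print(balance_regions(layout, ["s1", "s2", "s3"]))
    # [('r1', 's1', 's3'), ('r2', 's1', 's3')]
```

Balancing by region count alone keeps the sketch short; it assumes regions are roughly equal in size, which real systems cannot rely on.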

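CK currently has a masterless architecture. I think that to truly move to the cloud, every CK node will need to hold all of the metadata, unless we develop our own master for CK; but wouldn't that take us too far from the community? Another optimization would be to make CK's primary key globally ordered, which would further eliminate unnecessary requests from Distributed tables. CK plans to move ZooKeeper inside CK in the future; would that effectively mean the global metadata is stored locally?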
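Item 1 and the global-ordering idea above both come down to range-based routing. A minimal sketch, assuming a hypothetical range table (the kind of metadata item 1 proposes keeping in ZK) in which shard i owns keys from `SHARD_STARTS[i]` up to the next boundary: a point lookup then hits exactly one shard and a range scan only the shards it overlaps, whereas without a globally ordered key every shard has to be asked. The boundaries and shard names are illustrative, not ClickHouse configuration.

```python
import bisect

# Hypothetical range metadata: shard i owns keys in [SHARD_STARTS[i], SHARD_STARTS[i + 1]);
# the last shard owns everything from its start upward. SHARD_STARTS[0] == "" so every
# string key is covered.
SHARD_STARTS = ["", "g", "n", "t"]
SHARD_NAMES = ["shard0", "shard1", "shard2", "shard3"]


def shard_for_key(key: str) -> str:
    """Point lookup: with globally ordered keys, exactly one shard can hold `key`."""
    return SHARD_NAMES[bisect.bisect_right(SHARD_STARTS, key) - 1]


def shards_for_range(lo: str, hi: str) -> list[str]:
    """Range scan over [lo, hi): only shards whose ranges overlap it are queried."""
    if hi <= lo:
        return []
    first = bisect.bisect_right(SHARD_STARTS, lo) - 1  # shard containing lo
    last = bisect.bisect_left(SHARD_STARTS, hi) - 1    # last shard starting before hi
    return SHARD_NAMES[first:last + 1]


def shards_without_global_order() -> list[str]:
    """The situation described in item 1: keys are not ordered across shards, so fan out."""
    return list(SHARD_NAMES)


if __name__ == "__main__":
    print(shard_for_key("apple"))         # shard0
    print(shards_for_range("h", "p"))     # ['shard1', 'shard2']
    print(shards_without_global_order())  # ['shard0', 'shard1', 'shard2', 'shard3']
```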

godliness commented 3 years ago

The relationship between RocksDB and LevelDB, and their internal structure in detail:

https://daemondshu.github.io/2019/03/21/Programming/Data%20Structure/LevelDB_RocksDB/

RocksDB column families are physically isolated at the storage layer: one memtable plus multiple SST files make up a column family.
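To make that isolation concrete, here is a toy model (not the RocksDB API) in which each column family owns its memtable and its SST files, so flushing one family never touches another family's files. The class name and the tiny flush threshold are assumptions for illustration; real RocksDB column families also share a single write-ahead log, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnFamily:
    """Toy model of one RocksDB column family: its own memtable and its own SST files.

    memtable_limit is an illustrative stand-in for a write-buffer size.
    """
    name: str
    memtable_limit: int = 2
    memtable: dict = field(default_factory=dict)
    sst_files: list = field(default_factory=list)  # each "file" is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into an immutable, sorted "SST file" owned by this family only.
        if self.memtable:
            self.sst_files.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then SST files from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sst_files):
            for k, v in sst:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    db = {"default": ColumnFamily("default"), "meta": ColumnFamily("meta")}
    db["default"].put("a", 1)
    db["default"].put("b", 2)  # reaches the limit, so only "default" flushes
    db["meta"].put("a", "x")   # same key in a separate family, still in its memtable
    print(len(db["default"].sst_files), len(db["meta"].sst_files))  # 1 0
    print(db["default"].get("a"), db["meta"].get("a"))              # 1 x
```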

godliness commented 3 years ago

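ClickHouse's RocksDB table engine now supports changing RocksDB options, including ColumnFamilyOptions (for a single CF only), and adds a system.rocksdb system table that records the key metrics of the RocksDB table engine: https://github.com/ClickHouse/ClickHouse/pull/26821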

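However, so far I haven't found any support for multiple column families in ClickHouse's RocksDB engine, whereas Rocksandra does support them. We can study its code for reference, and benchmark against HBase while learning from the experience and intent behind Rocksandra's extensions to RocksDB.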

ClickHouse's RocksDB:

Rocksandra's RocksDB:

https://appinventiv.com/blog/hbase-vs-cassandra/

mdianjun commented 3 years ago

HBase vs. Cassandra read/write performance:

When Apache Cassandra performance is compared with Apache HBase performance, the comparison is drawn on read and write capability.

Write: Both HBase's and Cassandra's on-server write paths are fairly alike. There are some differences, though, which make Cassandra better, like the difference in names for the data structures and the fact that HBase does not write to the log and the cache simultaneously.

Read: If you are looking for consistent and fast reads, you should go with HBase. Since it writes to only one server, there is never a need to compare the various nodes' data versions. Even though Cassandra can handle over 129,000 reads per second, the reads are targeted and there is a high probability of them being inconsistent.

https://appinventiv.com/blog/hbase-vs-cassandra/
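The read comparison above turns on replica divergence: HBase serves each region from a single RegionServer, while a Cassandra read may consult replicas that disagree, with the client choosing a consistency level per request (for example ONE or QUORUM). A simplified model, assuming last-write-wins timestamps and a hypothetical worst case in which stale replicas answer first: a read that consults R of N replicas returns the newest value it sees, so it is only guaranteed to see the latest write when R + W > N (W being the number of replicas that acknowledged the write). `Replica` and `read_with_level` are illustrative names, not Cassandra's API.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One replica's copy of a single cell, with a last-write-wins timestamp."""
    value: str
    timestamp: int


def read_with_level(replicas: list[Replica], r: int) -> str:
    """Consult the first r replicas (worst case: stale ones answer first), keep the newest."""
    responses = replicas[:r]
    return max(responses, key=lambda rep: rep.timestamp).value


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """Every set of R replicas overlaps every set of W replicas exactly when R + W > N."""
    return r + w > n


if __name__ == "__main__":
    old, new = Replica("v1", 1), Replica("v2", 2)
    # N = 3; the write was acknowledged by W = 1 replica only.
    after_w1 = [old, old, new]
    print(read_with_level(after_w1, r=1), read_sees_latest_write(3, 1, 1))  # v1 False
    print(read_with_level(after_w1, r=3), read_sees_latest_write(3, 3, 1))  # v2 True
    # Quorum writes (W = 2) plus quorum reads (R = 2) always overlap.
    after_w2 = [old, new, new]
    print(read_with_level(after_w2, r=2), read_sees_latest_write(3, 2, 2))  # v2 True
```

The model shows why "inconsistent" reads are a matter of configuration rather than a fixed property: raising R or W trades latency for the overlap guarantee.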