Open mdianjun opened 3 years ago
Other resources:
Cassandra features:
Cassandra vs HBase:
https://www.scnsoft.com/blog/cassandra-vs-hbase
Cassandra:
HBase:
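Common ground:
In real projects, however, the main difference between Cassandra and HBase is this: Cassandra suits "always-on" web or mobile applications and projects with complex and/or real-time analytics. If the analytics results are not urgent (for example, data lake experiments or building machine learning models), HBase may be a good choice, especially if you have already invested in Hadoop infrastructure and skills.
I have the following thoughts on how ClickHouse (CK) could absorb ideas from RocksDB-like systems such as Cassandra and HBase:
CK is currently masterless. I think that for it to truly move to the cloud, every CK node must hold all of the metadata, unless we develop a master for CK ourselves; but would that take us too far from the community? Another optimization would be to make CK's primary key globally ordered, which could also cut the useless requests that distributed tables send. CK plans to move ZooKeeper inside CK in the future; would that effectively mean the global metadata is stored locally?
The relationship between RocksDB and LevelDB, and a detailed look at their internals: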
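To make the globally ordered primary key idea concrete, here is a minimal sketch of range-based shard pruning. It is not ClickHouse code: the shard layout, the `SHARD_LOWER_BOUNDS` list, and the `shards_for_range` helper are all hypothetical, and it assumes each shard owns one contiguous key range.

```python
from bisect import bisect_right

# Hypothetical layout: shard i owns keys in [SHARD_LOWER_BOUNDS[i], SHARD_LOWER_BOUNDS[i + 1]).
# The last shard is open-ended; keys below the first bound are assigned to shard 0.
SHARD_LOWER_BOUNDS = [0, 1000, 2000, 3000]


def shards_for_range(lo, hi, lower_bounds=SHARD_LOWER_BOUNDS):
    """Return the indices of shards that can hold keys in the closed range [lo, hi]."""
    if lo > hi:
        return []
    # bisect_right - 1 gives the last shard whose lower bound is <= the key.
    first = max(bisect_right(lower_bounds, lo) - 1, 0)
    last = max(bisect_right(lower_bounds, hi) - 1, 0)
    return list(range(first, last + 1))
```

With this layout, a query on keys 1500 to 1600 would only need to contact shard 1 instead of fanning out to all four shards, which is the kind of useless request the distributed table could skip.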
https://daemondshu.github.io/2019/03/21/Programming/Data%20Structure/LevelDB_RocksDB/
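RocksDB column families are physically isolated at the storage layer: one memtable plus multiple SST files make up one column family.
The ClickHouse RocksDB table engine now supports changing the RocksDB library configuration, including ColumnFamilyOptions (single CF only). It also adds the system.rocksdb system table, which records the necessary system metrics for the RocksDB table engine: https://github.com/ClickHouse/ClickHouse/pull/26821
So far, however, I have not found support for multiple column families in ClickHouse's RocksDB engine. Rocksandra does have it, so its code is worth studying, and it is worth comparing against HBase and learning from the experience and intent behind Rocksandra's extensions to RocksDB.
RocksDB in ClickHouse:
RocksDB in Rocksandra:
HBase/Cassandra read and write performance comparison: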
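The isolation described above can be pictured with a toy model. This is only a sketch and not the RocksDB API: `ColumnFamily`, `ToyDB`, and the `memtable_limit` flush threshold are hypothetical names, and the SST "files" are just in-memory dicts.

```python
class ColumnFamily:
    """Toy column family: one memtable plus its own list of SST 'files'."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sst_files = []  # oldest first; each entry is an immutable sorted snapshot
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a new sorted SST and start a fresh memtable.
        if self.memtable:
            self.sst_files.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then SSTs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sst_files):
            if key in sst:
                return sst[key]
        return None


class ToyDB:
    """Toy database that keeps every column family's data separate."""

    def __init__(self):
        self.column_families = {"default": ColumnFamily()}

    def create_column_family(self, name):
        self.column_families[name] = ColumnFamily()

    def put(self, cf, key, value):
        self.column_families[cf].put(key, value)

    def get(self, cf, key):
        return self.column_families[cf].get(key)
```

Writing the same key to two column families keeps two independent values, and a flush in one family leaves the other family's memtable and SST list untouched.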
When Apache Cassandra and Apache HBase are compared on performance, the comparison is made on read and write capability.
Write: The on-server write paths of HBase and Cassandra are fairly alike. There are some differences, though, that make Cassandra better, such as the different names for the data structures and the fact that HBase does not write to the log and the cache simultaneously.
Read: If you are looking for consistent and fast reads, you should go with HBase. Because it writes to only one server, there is never a need to compare data versions across nodes. Even though Cassandra can handle over 129,000 reads per second, the reads are targeted and there is a high probability of them being inconsistent.
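The read-consistency point in the quote can be illustrated with a small sketch. It is a toy model, not Cassandra or HBase code: replicas are plain dicts mapping a key to a `(value, timestamp)` pair, and `read_with_reconciliation` and its `replicas_to_ask` parameter are hypothetical names standing in for a read that contacts only some of the replicas and keeps the newest version.

```python
def read_with_reconciliation(replicas, key, replicas_to_ask):
    """Ask the first `replicas_to_ask` replicas and return the newest value seen."""
    responses = [r[key] for r in replicas[:replicas_to_ask] if key in r]
    if not responses:
        return None
    # Compare versions across nodes: the highest timestamp wins.
    value, _timestamp = max(responses, key=lambda vt: vt[1])
    return value


# Replica 0 has not yet received the latest write.
replicas = [
    {"k": ("old", 1)},
    {"k": ("new", 2)},
    {"k": ("new", 2)},
]
stale = read_with_reconciliation(replicas, "k", replicas_to_ask=1)
fresh = read_with_reconciliation(replicas, "k", replicas_to_ask=2)
```

Here `stale` comes from a single lagging replica, while `fresh` compares two replicas and picks the later write. A store where one server owns each key never needs this comparison step.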
(To be updated continuously)
Standalone features:
Advanced features:
Distributed features: