cloudnativecube / octopus


Integrating ClickHouse with HBase #131

Open mdianjun opened 3 years ago

mdianjun commented 3 years ago

ClickHouse ships with a RocksDB table engine. Could it replace HBase in HBase's use cases?

https://github.com/facebook/rocksdb/issues/8521

RocksDB references:

Related issue: https://github.com/cloudnativecube/octopus/issues/136

godliness commented 3 years ago

HBase internals: https://cloud.tencent.com/developer/news/758459

godliness commented 3 years ago

Altinity's notes on RocksDB support, explaining why it is faster than MergeTree in some scenarios, with benchmark results: https://kb.altinity.com/engines/altinity-kb-embeddedrocksdb-and-dictionary

godliness commented 3 years ago

Differences between RocksDB and HBase:

https://www.quora.com/How-does-RocksDB-compare-with-HBase

https://juejin.cn/post/6844903549931880461

Introduction to RocksDB column families:

https://github.com/facebook/rocksdb/wiki/Column-Families#implementation

mdianjun commented 3 years ago

How does Rocksandra support wide tables on top of RocksDB?

https://github.com/Instagram/cassandra/tree/rocks_3.0/src/java/org/apache/cassandra/rocksdb

godliness commented 3 years ago

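RocksDB is a key-value storage engine. Its interface is just put, get, delete, and so on for keys and values; it has no concept of tables or columns. To build a so-called wide table on RocksDB, we need to wrap its interface in a higher layer. ClickHouse's embedded RocksDB engine already integrates these interfaces, but it is not yet complete; for example, it does not bring in the concept of column families. Next, we can borrow from open-source databases such as HBase and Rocksandra to make targeted optimizations to ClickHouse's embedded RocksDB engine.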
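
To make "wrapping the key-value interface" concrete, here is a minimal sketch of one common approach: encode table, row key, and column name into a single ordered key, so that a prefix scan returns a whole row. `OrderedKV` is a hypothetical in-memory stand-in for RocksDB, and `WideTable`, `SEP`, and the key layout are assumptions for illustration; this is not how ClickHouse, HBase, or Rocksandra actually lay out their keys.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

SEP = b"\x00"  # assumed separator; names and row keys must not contain it


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered KV store such as RocksDB."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # kept sorted for prefix scans
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan_prefix(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Seek to the first key >= prefix, then walk while the prefix matches.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


class WideTable:
    """Maps (row, column) cells onto flat keys: table SEP row SEP column."""

    def __init__(self, kv: OrderedKV, name: str) -> None:
        self._kv = kv
        self._prefix = name.encode() + SEP

    def _row_prefix(self, row: str) -> bytes:
        return self._prefix + row.encode() + SEP

    def put_cell(self, row: str, column: str, value: bytes) -> None:
        self._kv.put(self._row_prefix(row) + column.encode(), value)

    def get_cell(self, row: str, column: str) -> Optional[bytes]:
        return self._kv.get(self._row_prefix(row) + column.encode())

    def get_row(self, row: str) -> Dict[str, bytes]:
        prefix = self._row_prefix(row)
        return {k[len(prefix):].decode(): v
                for k, v in self._kv.scan_prefix(prefix)}

    def delete_row(self, row: str) -> None:
        # Materialize the keys first so we do not mutate while scanning.
        for key in [k for k, _ in self._kv.scan_prefix(self._row_prefix(row))]:
            self._kv.delete(key)
```

Because each cell is its own key, a row can hold any number of columns without a fixed schema, which is the property a wide table needs. A real layer would also need type encoding and, as noted above, possibly column families.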

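ClickHouse's embedded RocksDB engine is already faster than MergeTree in some scenarios (binary search vs. sparse index): https://kb.altinity.com/engines/altinity-kb-embeddedrocksdb-and-dictionary That said, there is still room to optimize it and improve its performance further.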
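
One way to read the "binary search vs. sparse index" remark is as a difference in how many rows a point lookup touches. The toy model below is an assumption-laden sketch, not a benchmark: `dense_lookup` and `sparse_lookup` are hypothetical helpers over a sorted list of integer keys, and real engines add block caches, compression, and bloom filters that this ignores. The granule size of 8192 matches ClickHouse's default `index_granularity`.

```python
import bisect
from typing import List, Tuple


def dense_lookup(keys: List[int], target: int) -> Tuple[bool, int]:
    """Binary search over every key. Returns (found, keys_examined)."""
    lo, hi, examined = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return (lo < len(keys) and keys[lo] == target), examined


def sparse_lookup(keys: List[int], target: int,
                  granule: int = 8192) -> Tuple[bool, int]:
    """Sparse index: one mark per granule, then read the whole granule.

    Returns (found, rows_read).
    """
    marks = keys[::granule]                      # first key of each granule
    g = bisect.bisect_right(marks, target) - 1   # granule that may hold target
    if g < 0:
        return False, 0                          # target is below the first mark
    rows = keys[g * granule:(g + 1) * granule]
    return target in rows, len(rows)
```

With 50,000 sorted keys, the dense search examines at most 16 keys, while the sparse lookup reads a full granule of up to 8192 rows even for a single matching key. That gap is cheap for analytical scans but noticeable for point lookups.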

godliness commented 3 years ago

Rocksandra:

https://instagram-engineering.com/open-sourcing-a-10x-reduction-in-apache-cassandra-tail-latency-d64f86b43589

https://github.com/Instagram/cassandra

Instagram modified stock Cassandra to integrate RocksDB into it. The motivation was that Cassandra's GC pauses were causing high latency on their production reads and writes:

Apache Cassandra is a distributed database with it’s own LSM tree-based storage engine written in Java. We found that the components in the storage engine, like memtable, compaction, read/write path, etc., created a lot of objects in the Java heap and generated a lot of overhead to JVM.

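So they replaced Cassandra's own LSM storage engine with RocksDB, which improved performance considerably.

They made three main changes:

  1. Cassandra's native storage engine is tightly coupled to the rest of the system, so bringing in RocksDB required modifying Cassandra itself: they added a storage API so that engines can be plugged in easily.
  2. Schema support: RocksDB is a pure key-value storage engine with no notion of tables or column types, so RocksDB has to be hooked up to Cassandra's rich schema.
  3. Data streaming between Cassandra nodes (the re-balance when nodes are added or removed) had to be reworked for RocksDB.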
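
The first and third changes can be sketched roughly as follows. This is an illustrative Python sketch only: Cassandra is written in Java, and `StorageEngine`, `InMemoryEngine`, `ENGINES`, `open_engine`, and `rebalance` are hypothetical names, not Cassandra's or Rocksandra's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple, Type


class StorageEngine(ABC):
    """Hypothetical pluggable engine interface (not Cassandra's real API)."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def export_range(self, start: bytes,
                     end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""


class InMemoryEngine(StorageEngine):
    """Toy engine; a RocksDB-backed engine would implement the same methods."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def export_range(self, start: bytes,
                     end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        for key in sorted(self._data):
            if start <= key < end:
                yield key, self._data[key]


# Engines are selected by name, so the upper layers never name a concrete class.
ENGINES: Dict[str, Type[StorageEngine]] = {"memory": InMemoryEngine}


def open_engine(name: str) -> StorageEngine:
    return ENGINES[name]()


def rebalance(source: StorageEngine, target: StorageEngine,
              start: bytes, end: bytes) -> int:
    """Copy one key range to another node's engine; returns pairs copied."""
    moved = 0
    for key, value in source.export_range(start, end):
        target.put(key, value)
        moved += 1
    return moved
```

The point of the interface is that streaming code such as `rebalance` depends only on `StorageEngine`, so swapping the engine does not require touching it.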

godliness commented 3 years ago

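Comparing HBase with Cassandra (Rocksandra): outside China, Cassandra is far better known than HBase. HBase rose with Hadoop, and now that Spark has taken over from Hadoop, HBase has declined along with it. Do we really want to pursue integrating HBase with ClickHouse?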

https://zhuanlan.zhihu.com/p/344859872#:~:text=HBase%E6%9B%B4%E5%A4%9A%E7%9A%84%E6%98%AF,%E7%BA%A7%EF%BC%8C%E6%98%AF%E6%AF%94%E8%BE%83%E9%95%BF%E7%9A%84%E3%80%82