Knight-Wu / articles


hbase #5

Open Knight-Wu opened 6 years ago


[Figure: region architecture]

[Figure: HDFS file mapping (hdfs文件映射.jpg)]

HBase features

Row key design

HBase commands

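# HBase shell: scan hbase:meta to get the region info of one table.
# Meta row keys start with "<table>,", so the trailing comma avoids matching other tables that share the prefix.
scan 'hbase:meta',{FILTER=>"PrefixFilter('table,')"}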

# HBase shell: combine several filters in a FilterList (returns only the first key of each row, without values)
f_keyonly = org.apache.hadoop.hbase.filter.KeyOnlyFilter.new();
f_firstkey = org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new();
flist = org.apache.hadoop.hbase.filter.FilterList.new([f_keyonly, f_firstkey]);
scan 'mytable', {STARTROW => 'myStart', ENDROW => 'myEnd', FILTER =>  flist }

# OS shell: count rows with the RowCounter MapReduce job
hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>

Usage: RowCounter [options] 
    <tablename> [          
        --starttime=[start] 
        --endtime=[end] 
        [--range=[startKey],[endKey]] 
        [<column1> <column2>...]
    ]

# OS shell: restart a RegionServer
sh hbase-daemon.sh restart regionserver

HBase architecture

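A RegionServer (RS) hosts multiple regions. A table's data is spread across multiple regions, and each row lands in the region whose row key range contains it. Within a region, each column family (CF) corresponds to one Store. A Store has one MemStore and multiple StoreFiles, and each StoreFile is made up of multiple HDFS blocks.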
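The containment hierarchy described above can be sketched as a small in-memory model. This is only an illustration: the class and field names (`RegionServer`, `Region`, `Store`, `put`) are hypothetical and are not the HBase client or server API.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """One Store per column family: a single MemStore plus many StoreFiles."""
    column_family: str
    memstore: dict = field(default_factory=dict)      # buffered cells, not yet on HDFS
    store_files: list = field(default_factory=list)   # flushed files (each spans HDFS blocks)


@dataclass
class Region:
    """A contiguous row key range [start_key, end_key) of one table."""
    table: str
    start_key: str
    end_key: str
    stores: dict = field(default_factory=dict)        # column family name -> Store

    def put(self, row, family, qualifier, value):
        # A write goes to the MemStore of the Store that owns the column family.
        store = self.stores.setdefault(family, Store(family))
        store.memstore[(row, qualifier)] = value


@dataclass
class RegionServer:
    """A RegionServer hosts many regions, possibly from different tables."""
    name: str
    regions: list = field(default_factory=list)


region = Region(table="t", start_key="", end_key="m")
region.put("r1", "cf1", "a", 1)
region.put("r1", "cf2", "b", 2)
rs = RegionServer(name="rs1", regions=[region])
```

Writing one row to two column families creates two Stores, and therefore two MemStores, inside the same region.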

HBase read path

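  1. Look up in ZooKeeper which RS hosts the hbase:meta table. The znode can be inspected with the ZooKeeper command get /<znode parent>/meta-region-server, where the parent is set by zookeeper.znode.parent and defaults to /hbase.
  2. Look up the region that contains the row in the meta table. The client caches this meta information. The table contents can be listed with scan 'hbase:meta'. (See: structure of the meta table.)
  3. When region information changes, for example after a split, the meta table is updated.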
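The lookup in step 2 amounts to finding the region whose start key is the greatest one not larger than the row key. Below is a minimal sketch of such a client-side cache, assuming regions are identified only by their start key; `MetaCache`, `locate`, and `apply_split` are hypothetical names for illustration, not the HBase client API.

```python
import bisect


class MetaCache:
    """Client-side cache of region locations, kept sorted by region start key."""

    def __init__(self, regions):
        # regions: iterable of (start_key, server); the first region starts at "".
        ordered = sorted(regions)
        self._starts = [start for start, _ in ordered]
        self._servers = [server for _, server in ordered]

    def locate(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        idx = bisect.bisect_right(self._starts, row_key) - 1
        return self._starts[idx], self._servers[idx]

    def apply_split(self, split_key, server):
        # After a split, a new region starting at split_key appears in meta.
        idx = bisect.bisect_left(self._starts, split_key)
        self._starts.insert(idx, split_key)
        self._servers.insert(idx, server)


cache = MetaCache([("", "rs1"), ("g", "rs2"), ("p", "rs3")])
before = cache.locate("m")        # served by the region starting at "g"
cache.apply_split("k", "rs4")     # that region splits at "k"
after = cache.locate("m")         # now served by the new region starting at "k"
```

Keeping the start keys sorted makes each lookup a binary search, which is why a cached copy of meta is enough to route most requests without another round trip.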

HBase write path

HBase write path (NetEase, 范欣欣)

Bulk-writing Hive data into HBase

Issues

Memstore Flush

Adapted from: link

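HBase triggers a flush under several conditions. Note that the smallest flush unit is the HRegion, not an individual MemStore. If an HRegion contains many MemStores, every flush is therefore expensive, which is why it is recommended to keep the number of column families as small as possible when designing a table.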
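The cost of region-level flushing can be seen in a small sketch. It assumes a simplified trigger (the combined size of all MemStores in the region reaching a threshold) and models each MemStore as a plain dict; `flush_region` and `flush_size` are hypothetical names, not HBase internals.

```python
def flush_region(memstores, flush_size):
    """Flush every MemStore of a region once their combined size reaches flush_size.

    memstores: dict mapping column family -> dict of buffered cells (key -> bytes).
    Returns a list of (column_family, cell_count), one entry per StoreFile written.
    """
    total = sum(len(value) for cells in memstores.values() for value in cells.values())
    if total < flush_size:
        return []
    written = []
    for family, cells in memstores.items():
        if cells:
            # The whole region is flushed, so even a nearly empty MemStore
            # produces its own (tiny) StoreFile.
            written.append((family, len(cells)))
            cells.clear()
    return written


memstores = {
    "cf_big": {"r1": b"x" * 100},
    "cf_small": {"r1": b"y" * 5},
}
files = flush_region(memstores, flush_size=64)
```

Here `cf_big` alone crosses the threshold, yet `cf_small` is flushed as well and ends up as a five-byte file. With many column families this produces many small StoreFiles, which later have to be compacted.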

Compaction

If the conditions for a major compaction are not met, the compaction is necessarily a minor one. HBase has two main minor compaction policies: RatioBasedCompactionPolicy and ExploringCompactionPolicy.

2. A major compaction merges all the StoreFiles of an HStore in a region, and the end result is a single merged file.

There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent StoreFiles and rewrite them as one. Minors do not drop deletes or expired cells, only major compactions do this. Sometimes a minor compaction will pick up all the StoreFiles in the Store and in this case it actually promotes itself to being a major compaction.

After a major compaction runs there will be a single StoreFile per Store, and this will help performance usually. Caution: major compactions rewrite all of the Stores data and on a loaded system, this may not be tenable; major compactions will usually have to be done manually on large systems.
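To make the ratio idea behind minor compaction concrete, here is a simplified sketch of ratio-based file selection. It is not the actual RatioBasedCompactionPolicy code: the function name and the parameters `ratio`, `min_files`, and `max_files` are assumptions for illustration, and the real policy handles more cases (minimum compact size, off-peak ratios, and so on).

```python
def select_minor(sizes, ratio=1.2, min_files=3, max_files=10):
    """Pick StoreFiles for a minor compaction.

    sizes: StoreFile sizes ordered from oldest to newest.
    Returns the indices of the selected files, or [] if nothing qualifies.
    """
    n = len(sizes)
    start = 0
    while n - start >= min_files:
        # Sum of the newer files that would be compacted together with sizes[start].
        newer = sum(sizes[start + 1:start + max_files])
        if sizes[start] <= ratio * newer:
            break        # this file is small enough relative to the newer ones
        start += 1       # too large: skip it and try the next, newer file
    if n - start < min_files:
        return []
    return list(range(start, min(n, start + max_files)))


picked = select_minor([1000, 100, 50, 40, 30])
all_files = select_minor([100, 100, 100])
```

In the first call the large, old file is skipped and only the four small files are rewritten. In the second call every file is selected, which corresponds to the case quoted above where a minor compaction picks up all StoreFiles in the Store.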

region split

Disabling auto split

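When creating the table, set the split policy to ConstantSizeRegionSplitPolicy and specify a maximum region size of 100 GB. A region then splits only when the combined size of the StoreFiles in one of its Stores exceeds that limit. To speed up a bulk import, pre-split the table first, then disable automatic splitting once the import has finished.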
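A minimal sketch of the two ideas above: the size check and pre-splitting. It assumes the per-Store size check that ConstantSizeRegionSplitPolicy performs by default in HBase 1.x, and row keys that begin with two hex digits. `should_split` and `hex_split_points` are hypothetical helpers, not HBase APIs.

```python
GB = 1024 ** 3


def should_split(store_sizes, max_file_size=100 * GB):
    """store_sizes: dict mapping column family -> total bytes of that Store's StoreFiles.

    Assumption: the region splits as soon as any single Store exceeds the limit.
    """
    return any(size > max_file_size for size in store_sizes.values())


def hex_split_points(num_regions):
    """Evenly spaced two-hex-digit boundaries for pre-splitting into num_regions regions."""
    return [format(i * 256 // num_regions, "02x") for i in range(1, num_regions)]


needs_split = should_split({"cf1": 101 * GB, "cf2": 1 * GB})
no_split = should_split({"cf1": 60 * GB, "cf2": 60 * GB})
points = hex_split_points(4)
```

With a 100 GB limit, the second region is not split even though it holds 120 GB in total, because no single Store is over the limit. The split points `40`, `80`, and `c0` divide the key space into four equal ranges, so an import can write to four regions in parallel from the start.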

Manually triggering major compaction

hbase-majoralltable.sh

Role of ZooKeeper in HBase

dataType

hbase important configuration