
Huawei-Hadoop / hindex

Secondary Index for HBase

Apache License 2.0


Putting data is very slow #35

Status: Open. rexwong opened this issue 10 years ago.

rexwong commented 10 years ago

Hi, I'm a Chinese developer, and I use your project for secondary indexing. However, putting data is much slower than with native HBase.

Test environment:

- Data size: more than 100 million
- Cluster: 1 HMaster, 4 RegionServers
- Cluster settings:
  - heap size: 12g
  - memstore:
    - flushSize: 256m
    - lowerLimit: 0.38
    - upperLimit: 0.4
- Index table settings: one index, with one column in this index
- Pre-created 128 regions
- Data is put with 8 reducers, each putting rows in batches of 1000

Test results:

- with index: 5 hrs 14 mins 13 sec
- without index: 1 hr 12 mins 23 sec
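The two timings above can be turned into a slowdown factor and a rough throughput figure. This small Java sketch only does that arithmetic. It assumes the "more than 100 million" data size counts rows and uses 100,000,000 as a lower bound, so the rows-per-second figures are minimums rather than measured values.

```java
// Arithmetic sketch comparing the two load times reported in this issue.
// Assumption: "more than 100 million" means rows; 100,000,000 is used as a lower bound.
final class LoadTimeComparison {
    // Converts an h/m/s duration to total seconds.
    static long toSeconds(int hours, int minutes, int seconds) {
        return hours * 3600L + minutes * 60L + seconds;
    }

    public static void main(String[] args) {
        long withIndex = toSeconds(5, 14, 13);    // 18853 s
        long withoutIndex = toSeconds(1, 12, 23); // 4343 s
        double slowdown = (double) withIndex / withoutIndex;
        long rows = 100_000_000L; // lower bound on the row count

        System.out.printf("with index:    %d s, >= %.0f rows/s%n", withIndex, (double) rows / withIndex);
        System.out.printf("without index: %d s, >= %.0f rows/s%n", withoutIndex, (double) rows / withoutIndex);
        System.out.printf("slowdown: %.2fx%n", slowdown);
    }
}
```

With these inputs the indexed load takes 18853 s versus 4343 s without the index, roughly 4.3 times longer.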

Thanks for your help.

chrajeshbabu commented 10 years ago

Are you using bulk load (importtsv) to load the data? What is the size of each record?

Regarding "put data use 8 reduces, and put 1000 rows of each": do you mean 8 reducers in the bulk load?

anoopsjohn commented 10 years ago

I think bulk load is being used. There is room for improvement in the bulk load case. Don't we already have an open issue for that, Rajesh?

chrajeshbabu commented 10 years ago

Yes, Anoop, we have some improvements planned. Once performance testing is done, we can commit them here. But we were not expecting this much degradation. Could there be some other problem related to MapReduce?

rexwong commented 10 years ago

Thanks very much.

The user table has one column family and 5 column qualifiers; each row is less than 1 KB. The index table has one index, with one column qualifier in that index.

I just use Hadoop MapReduce to put data with HBase's Put in the reducer. When I put data into HBase, the index table's memstore is flushed to disk very frequently. Does it lock both the index region and the user region?
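The frequent flushes can be reasoned about with the memstore settings from the test environment. This is a rough estimate, not a diagnosis, and it rests on assumptions the thread does not confirm: that 12g is the RegionServer heap, that upperLimit and lowerLimit are the global memstore fractions of that heap (`hbase.regionserver.global.memstore.upperLimit` / `lowerLimit` in HBase 0.94), that each of the 128 user regions has one colocated index region, and that writes spread evenly across the regions on each server. `MemstoreHeadroom` and `perRegionHeadroomMb` are hypothetical names.

```java
// Rough, hypothetical estimate of average memstore headroom per region on one RegionServer.
// Assumptions (not confirmed in this issue): 12 GB is the RegionServer heap, the limits are
// fractions of that heap, each user region has one colocated index region, writes are even.
final class MemstoreHeadroom {
    // Average memstore size per region when the global upper limit is reached.
    static double perRegionHeadroomMb(double heapMb, double upperLimit, int regionsPerServer) {
        return heapMb * upperLimit / regionsPerServer;
    }

    public static void main(String[] args) {
        double heapMb = 12 * 1024;  // heapsize: 12g
        double upperLimit = 0.4;    // global memstore upper limit (fraction of heap)
        double lowerLimit = 0.38;   // global memstore lower limit (fraction of heap)
        double flushSizeMb = 256;   // per-region flushSize
        int userRegions = 128;      // pre-created regions
        int indexRegions = 128;     // assumption: one index region per user region
        int regionServers = 4;

        int regionsPerServer = (userRegions + indexRegions) / regionServers; // 64
        double headroomMb = perRegionHeadroomMb(heapMb, upperLimit, regionsPerServer);

        System.out.printf("global upper limit: %.1f MB, lower limit: %.1f MB%n",
                heapMb * upperLimit, heapMb * lowerLimit);
        System.out.printf("average memstore per region at upper limit: %.1f MB (flushSize %.0f MB)%n",
                headroomMb, flushSizeMb);
        if (headroomMb < flushSizeMb) {
            System.out.println("global limit is reached before regions hit flushSize; expect forced flushes");
        }
    }
}
```

Under these assumptions each region averages about 77 MB of memstore when the global limit is hit, well below the 256 MB flushSize, which would be consistent with many small forced flushes. Adding the index regions doubles the region count per server and so halves this average headroom compared with the unindexed table.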

I also found a bulk load package in hindex. Would it be better?

rexwong commented 10 years ago

@anoopsjohn thanks for answering.

By bulk load, you mean the bulk load package in hindex, right?

anoopsjohn commented 10 years ago

Yes, I was assuming you were using the tool in the bulk load package in HIndex. Aren't you? What do your mappers and reducers look like?

rexwong commented 10 years ago

The mappers read data from another Hadoop cluster, and the reducers use HBase's native Put to insert it. I'll try the tool in the bulk load package.
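The reducer-side write path described in this thread (native Puts sent 1000 rows at a time) can be sketched as a simple batching writer. This is a hypothetical, dependency-free illustration: `RowSink` and `BatchingWriter` are invented names, and `RowSink.write` stands in for whatever HBase client call actually sends the batch of Puts; it is not an HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the client call that sends one batch of Puts (not an HBase API).
interface RowSink {
    void write(List<String> batch);
}

// Buffers rows and hands them to the sink in fixed-size batches.
final class BatchingWriter implements AutoCloseable {
    private final RowSink sink;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchingWriter(RowSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(String row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            sink.write(new ArrayList<>(buffer)); // pass a copy so the buffer can be reused
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // send the final partial batch, like a reducer's cleanup step
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (BatchingWriter writer = new BatchingWriter(batch -> batchSizes.add(batch.size()), 1000)) {
            for (int i = 0; i < 2500; i++) {
                writer.add("row-" + i);
            }
        }
        System.out.println(batchSizes); // [1000, 1000, 500]
    }
}
```

Every batch written this way still goes through the RegionServer write path and memstore (and, with hindex, presumably the matching index-table writes too), whereas an HBase bulk load writes HFiles and moves them into the regions without passing through the memstore. That difference is one reason the bulk load route may be worth trying for a load of this size.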