Closed xuxc closed 10 years ago
Just like before, if you add a filter that includes the indexed column, hindex will pick it up. There is no need to add any code in the client.
No client changes are needed to make use of the index. Internally we have a filter evaluator that checks whether to use the index or not.
Now I have used 3 filters to filter data, and all three columns have indexes, but for 2 million rows it takes almost 40 s. I'd like to know whether the indexes worked. @chrajeshbabu thank you.
Can you be a bit more clear about your table schema and query? How many rows of data do you have in total?
The table has one CF, "info", with 17 columns under it, and I created an index on every column when creating the table.
When I use filters like this:
List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("style_No"), CompareOp.EQUAL, Bytes.toBytes("4674"));
filters.add(filter1);
Filter filter2 = new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("country_No"), CompareOp.EQUAL, Bytes.toBytes("3871"));
filters.add(filter2);
FilterList filterList1 = new FilterList(filters);
Scan scan = new Scan();
scan.setFilter(filterList1);
hbase-site.xml is right:
It takes almost 40 seconds to get the result for 2 million rows. I'm afraid the indexes don't work....
So is the total data scanned 2 million rows, with fewer rows satisfying the condition? Or is the actual fetched data 2 million rows? Just trying to understand the data size. How big is the cluster? How many regions in total? I assume you are using the default HFile block size, i.e. 64KB. You can try reducing that.
The cluster has just 3 nodes. Does hindex's 2nd filter select data from the result set obtained after the 1st filter, or does every filter do a full scan on the index table?
None of the filters does a full scan on the index table. Your indexed columns are of String type only and you use equals conditions, so for the index table scan we create a start and stop row. As this query covers 2 indexes, we will have 2 index scanners which retrieve data (at the server side) simultaneously, and we use AND to find the matching data row keys. A single index on both of these columns would be better anyway. Just saying. Do you have any idea how long the above query takes when no index is used?
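The AND step described above can be pictured as intersecting the row keys returned by the two index scans. This is only a minimal sketch in plain Java, assuming each scan yields its matching data row keys in ascending order; the class name, method name and sample row keys are made up for illustration and are not hindex code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexIntersectionSketch {

    // Merge-style intersection of two ascending lists of row keys.
    // Only keys present in both lists satisfy "col1 = ? AND col2 = ?".
    static List<String> intersectSorted(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                out.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // key only in the first scan, skip it
            } else {
                j++; // key only in the second scan, skip it
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row keys matched by each index scan.
        List<String> styleMatches = Arrays.asList("row01", "row04", "row07", "row09");
        List<String> countryMatches = Arrays.asList("row02", "row04", "row09", "row12");
        System.out.println(intersectSorted(styleMatches, countryMatches)); // prints [row04, row09]
    }
}
```

If each condition alone matches many rows, both lists are long even when the intersection is tiny, which is why a single index covering both columns is suggested as the better option.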
I am sorry to put forward such an unprofessional question ..>.<, what I want to say is that:
So your total row count is 2 million. Can you tell me how many rows satisfy the above condition (col1=? AND col2=?)? Also, do you have any idea how long it takes when you don't declare any column for indexing (a normal full table scan)?
At most 10 rows satisfy the above condition, and it takes 40 seconds to get the results. All columns have indexes.
DESCRIPTION: 'qx', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
ENABLED: true
DESCRIPTION: 'qx_idx', {METHOD => 'table_att', MAX_FILESIZE => '9223372036854775807', CONFIG => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}, {NAME => 'd', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
ENABLED: true
Only 10 rows and taking 40 secs seems too much! How many regions in total are on these 3 nodes? I doubt whether the index is getting used at all...
Is your situation like this: many rows satisfy the col1 condition alone, many rows satisfy the col2 condition alone, and both together match maybe 10 at most?
Yeah, I think so. Maybe many rows satisfy one of the 2 conditions, but both together match maybe 10 at most. And there are 5 regions on the 3 nodes. So I want to know if the index works~
use filters:
HTablePool pool = new HTablePool(configuration, 1000);
List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("style_No"), CompareOp.EQUAL, Bytes.toBytes("4674"));
filters.add(filter1);
Filter filter2 = new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("country_No"), CompareOp.EQUAL, Bytes.toBytes("3871"));
filters.add(filter2);
FilterList filterList1 = new FilterList(filters);
Scan scan = new Scan();
scan.setFilter(filterList1);
//ResultScanner rs = table.getScanner(scan);// hindex Filter
ResultScanner rs = pool.getTable(tableName).getScanner(scan);
{....code...}
May I have your email so I can send you some pictures?
@hy2014 @chrajeshbabu @anoopsjohn
anoop.hbase@gmail.com
I got the point:
the index name is "contry_No", but when I loaded data into HBase, the column name was "country_No"...
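The mismatch above can be illustrated with a tiny plain-Java sketch. It assumes, as a simplification, that an index is only considered when the filter's "family:qualifier" matches an indexed column name exactly; the class and method names are hypothetical and this is not how hindex's filter evaluator is actually written.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class IndexNameMatchSketch {

    // Returns the filter columns for which an index with exactly the same name exists.
    // Columns that are missing from the result would have to be checked without an index.
    static Set<String> columnsWithIndex(Set<String> indexedColumns, String... filterColumns) {
        Set<String> usable = new LinkedHashSet<>();
        for (String column : filterColumns) {
            if (indexedColumns.contains(column)) {
                usable.add(column);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // The index was declared with the misspelled qualifier "contry_No".
        Set<String> indexed = new LinkedHashSet<>(Arrays.asList("info:style_No", "info:contry_No"));
        // The data and the filters use "country_No", so only style_No finds an index.
        System.out.println(columnsWithIndex(indexed, "info:style_No", "info:country_No"));
        // prints [info:style_No]
    }
}
```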
BTW, I found an interesting thing. I created the filter:
new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("contry_No"), CompareOp.EQUAL, Bytes.toBytes("8600"));
and the actual column is "info:country", but after a full scan of the table it still gets the results correctly!! Thank you, @anoopsjohn
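One possible explanation for the observation above: by default, HBase's SingleColumnValueFilter lets a row through when the named column is not present on that row, unless setFilterIfMissing(true) is called. The sketch below is a plain-Java model of that rule, not the HBase class itself; the class name, method name and sample row are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MissingColumnFilterSketch {

    // Models an equality filter on one column of a row.
    // When the column is absent, the row passes unless filterIfMissing is true.
    static boolean rowPasses(Map<String, String> row, String column,
                             String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        // Hypothetical row that has no "info:contry_No" column.
        Map<String, String> row = new HashMap<>();
        row.put("info:country", "8600");

        // Default behaviour: the misspelled column is missing, so the row is not filtered out.
        System.out.println(rowPasses(row, "info:contry_No", "8600", false)); // prints true
        // With filterIfMissing enabled, the same row is dropped.
        System.out.println(rowPasses(row, "info:contry_No", "8600", true));  // prints false
    }
}
```

Under that assumption, a filter on a misspelled column would not restrict the scan at all, so the rows returned would be decided by the other filters only.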
So after correcting the name, how long does the query take when the index is used? I hope it will be much, much lower than 40 sec.
Within 1 sec. hindex is so fast!! I get the ResultScanner rs in Dao.java and return it, and I want to show the data in rs on another page. But in Action.java, rs is not null, yet there is no Result r in it. Can a ResultScanner not be returned? Code as follows:
Dao.java-------
{
................
rs=pool.getTable(tableName).getScanner(scan);
return rs;
}
Action.java--------
{
QueryHbaseService qhsi=new QueryHbaseServiceImpl();
rs=qhsi.queryHbase(condMap); //call for Dao.java and return Rs
for (Result r : rs) { // the code inside this for loop is never executed
/* this point is never reached */
for (KeyValue keyValue : r.raw()) { ... }
}
}
There should not be a problem using the ResultScanner in one class or in a class it is passed to. Not sure what problem you are facing. Can you check the logs on the client and RS side?
Hindex is perfect for indexing data in HBase. Further tests will be done. Thank you very much!
I have deployed Hadoop and hindex successfully, created a table and inserted data, and the index table also exists. So how do I scan for a specific Qualifier which has an index? With something like: get 'test','rowkey','Family:Qualifier','value' ?