forcedotcom / phoenix

BSD 3-Clause "New" or "Revised" License

Why isn't my query doing a RANGE SCAN? #614

Closed dsalychev closed 10 years ago

dsalychev commented 10 years ago

This is my DDL:

CREATE TABLE IF NOT EXISTS myTable (
    d DATE NOT NULL, 
    o BIGINT NOT NULL, 
    l UNSIGNED_TINYINT NOT NULL, 
    s UNSIGNED_TINYINT NOT NULL, 
    r UNSIGNED_TINYINT NOT NULL, 
    c BINARY(16) NOT NULL, 
    t BINARY(16) NOT NULL, 
    m BINARY(16) NOT NULL, 
    k BINARY(16) NOT NULL, 
    e VARCHAR NOT NULL, 
    i VARCHAR NULL, 
    q BINARY(16) NULL, 
    CONSTRAINT pk PRIMARY KEY (d DESC, o DESC, l, s, r, c, t, m, k)) IMMUTABLE_ROWS=true

This is my query:

explain select *  from myTable where c=1 and t=3;

And this is my result:

+------------+
|    PLAN    |
+------------+
| CLIENT PARALLEL 1-WAY FULL SCAN OVER myTable  |
|     SERVER FILTER BY (C = [B@4ad9da94 AND T = [B@249d50f0) |
+------------+

This is my expectation of a result:

+------------+
|    PLAN    |
+------------+
| DEGENERATE SCAN OVER <myTable_pk_index> |
+------------+

When I try to select something, I get a "FULL SCAN" instead of a "DEGENERATE SCAN". Is this my mistake, or a bug?

jtaylor-sfdc commented 10 years ago

Are you expecting DEGENERATE because you're comparing an INTEGER to a BINARY(16)? We allow any type to be coerced to BINARY, and we'll autofill with null bytes if the value is shorter than the fixed length. If the value being compared were a fixed-width type that's bigger than a BINARY(16), it would be degenerate.
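A rough Python model of that coercion (hypothetical helper names, not Phoenix's actual encoding; Phoenix may pad on a different side):

```python
FIXED_LEN = 16  # width of BINARY(16)

def coerce_to_binary(value: bytes, fixed_len: int = FIXED_LEN) -> bytes:
    """Pad a shorter value with null bytes up to the fixed BINARY width.
    A value wider than the fixed length cannot match any BINARY(16) key,
    which is what would make such a comparison degenerate."""
    if len(value) > fixed_len:
        raise ValueError(f"wider than BINARY({fixed_len}): degenerate comparison")
    return value + b"\x00" * (fixed_len - len(value))

# An INTEGER literal like 1 encodes to fewer than 16 bytes, so it is
# null-padded rather than rejected -- hence no DEGENERATE SCAN here.
encoded = (1).to_bytes(4, "big")
padded = coerce_to_binary(encoded)
assert len(padded) == 16 and padded.startswith(encoded)
```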

dsalychev commented 10 years ago

Sorry, let me try to explain my point more clearly.

Is a "FULL SCAN" the expected scan when I am searching over primary-key fields like "c" and "t"? Where is the "pk" index of the "myTable" table?

jtaylor-sfdc commented 10 years ago

DEGENERATE SCAN means that a query can't possibly return any rows. If we can determine that at compile time, then we don't bother to even run the scan.

FULL SCAN means that all rows of the table will be scanned over (potentially with a filter applied if you have a WHERE clause).

RANGE SCAN means that only a subset of the rows in your table will be scanned over. This occurs if you use one or more leading columns from your primary key constraint. Note that you can add a secondary index on your "c" and "t" columns and that would cause a range scan to be done (over the index table).

SKIP SCAN means that either a subset or all rows in your table will be scanned over; however, it will skip large groups of rows depending on the conditions in your filter. See this blog for more detail. We don't do a SKIP SCAN if you have no filter on the leading primary key columns, but you can force a SKIP SCAN by using the /*+ SKIP_SCAN */ hint. Under some conditions, namely when the cardinality of your leading primary key columns is low, it will be more efficient than a FULL SCAN.
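The SKIP SCAN idea can be sketched in Python (a toy model of sorted HBase row keys, not Phoenix code): when the leading key has low cardinality, a handful of seeks replaces a read of every row.

```python
from bisect import bisect_left

# Toy table: sorted composite row keys (d, c), standing in for HBase.
rows = sorted((d, c) for d in range(3) for c in range(100))

def full_scan(table, c_wanted):
    """FULL SCAN: read every row, apply the filter server-side."""
    return [key for key in table if key[1] == c_wanted]

def skip_scan(table, leading_values, c_wanted):
    """SKIP SCAN: seek straight to (d, c_wanted) for each known value
    of the leading key d, skipping all the rows in between."""
    hits = []
    for d in leading_values:
        i = bisect_left(table, (d, c_wanted))
        if i < len(table) and table[i] == (d, c_wanted):
            hits.append(table[i])
    return hits

# 3 seeks instead of 300 row reads, same answer:
assert full_scan(rows, 42) == skip_scan(rows, range(3), 42)
```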

dsalychev commented 10 years ago

Can you explain what "leading" columns are? For example, which columns are leading in CONSTRAINT pk PRIMARY KEY (d DESC, o DESC, l, s, r, c, t, m, k)?

Is it "d"? Is it "d" and "o"?

dsalychev commented 10 years ago

Is it possible to manually select the leading columns from the pk constraint?

jtaylor-sfdc commented 10 years ago

Yes, that's correct: the leading columns are the columns in the primary key constraint, in the order they were declared (starting with the first one), that are used in a WHERE clause filter.
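A minimal sketch of that rule (hypothetical helper; Phoenix's real WHERE-clause analysis is more involved, e.g. it also handles range conditions): the scan can only be narrowed by the longest fully-filtered prefix of the primary key.

```python
# Primary key columns of myTable, in declaration order.
PK = ["d", "o", "l", "s", "r", "c", "t", "m", "k"]

def leading_prefix(filtered_columns):
    """Longest prefix of the PK whose columns all appear in the filter.
    An empty prefix means the WHERE clause cannot drive a RANGE SCAN."""
    prefix = []
    for col in PK:
        if col not in filtered_columns:
            break
        prefix.append(col)
    return prefix

assert leading_prefix({"d", "o"}) == ["d", "o"]  # RANGE SCAN driven by d, o
assert leading_prefix({"c", "t"}) == []          # c, t are not leading: FULL SCAN
```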

James

dsalychev commented 10 years ago

I am not sure this is the right place for the question, but how complex is "incremental index maintenance"?

Is it O(n)? Is it O(log(n))?

For example, if we have a sorted list and add a value, we do not need to iterate over the whole list (avoiding O(n) complexity); we can use binary search to achieve O(log(n)). Is that Phoenix's way?

jtaylor-sfdc commented 10 years ago

Just think of the index as another HBase table with a different row key. Your performance penalty is that two writes will be performed (one for the data table and one for the index table), whereas only one write (the one to the data table) would have been performed before. This is the cost of maintaining the index.

If you're ok with this cost (as perhaps with your use case you're reading more than writing), then it's a good tradeoff. At read/query time, we'll use the index table instead of the data table. What would normally be a full table scan over all your data will turn into a range scan or point lookup instead.
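A toy Python model of that trade-off (plain dicts standing in for HBase tables; hypothetical names, not Phoenix's actual write path). Since HBase keeps both tables sorted by row key, a lookup is a seek rather than an O(n) scan, and each logical write costs two physical writes:

```python
data_table = {}   # primary key -> row
index_table = {}  # (c, t) -> set of primary keys pointing back at the data

def upsert(pk, row):
    """One logical write, two physical writes: the data table and the
    index table are both updated -- the cost of maintaining the index."""
    data_table[pk] = row
    index_table.setdefault((row["c"], row["t"]), set()).add(pk)

def select_by_c_t(c, t):
    """With the index, a query on (c, t) is a point lookup instead of
    a full scan of the data table."""
    return [data_table[pk] for pk in index_table.get((c, t), ())]

upsert(("2014-01-01", 7), {"c": b"\x01", "t": b"\x03", "e": "x"})
assert select_by_c_t(b"\x01", b"\x03") == [{"c": b"\x01", "t": b"\x03", "e": "x"}]
```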