Closed: dsalychev closed this issue 10 years ago
Are you expecting DEGENERATE because you're comparing an INTEGER to a BINARY(16)? We allow any type to be coerced to BINARY and we'll autofill with a null byte if it's shorter than the fixed length. If the value being compared was a fixed width type that's bigger than a BINARY(16), it would be degenerate.
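As a rough sketch of that coercion rule (the table and column names below are hypothetical, not from the question):

```sql
-- Hypothetical schema: "id" is a fixed-width BINARY(16) row key.
-- CREATE TABLE t1 (id BINARY(16) NOT NULL PRIMARY KEY, v VARCHAR);

-- An INTEGER comparison value is coerced to BINARY(16); since it is
-- shorter than 16 bytes it is padded with null bytes, so this is legal:
SELECT * FROM t1 WHERE id = 42;

-- By contrast, comparing the key to a fixed-width value wider than
-- 16 bytes (e.g. a CHAR(32)) could never match, so that scan would
-- be DEGENERATE.
```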
Sorry, I will try to explain my point more clearly.
Is a "FULL SCAN" the expected scan type when I am searching over primary-key fields, like "c" and "t"? Where is the "pk" index of the "myTable" table?
DEGENERATE SCAN means that a query can't possibly return any rows. If we can determine that at compile time, then we don't bother to even run the scan.
FULL SCAN means that all rows of the table will be scanned over (potentially with a filter applied if you have a WHERE clause)
RANGE SCAN means that only a subset of the rows in your table will be scanned over. This occurs if you use one or more leading columns from your primary key constraint. Note that you can add a secondary index on your "c" and "t" columns and that would cause a range scan to be done (over the index table).
SKIP SCAN means that either a subset or all rows in your table will be scanned over; however, it will skip large groups of rows depending on the conditions in your filter. See this blog for more detail. We don't do a SKIP SCAN if you have no filter on the leading primary key columns, but you can force a SKIP SCAN by using the /*+ SKIP_SCAN */ hint. Under some conditions, namely when the cardinality of your leading primary key columns is low, it will be more efficient than a FULL SCAN.
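For example, a hypothetical query forcing the hint (the table and column names are made up):

```sql
-- Without the hint, a filter on a non-leading pk column would
-- normally fall back to a FULL SCAN with the filter applied.
SELECT /*+ SKIP_SCAN */ * FROM myTable WHERE c = 'foo';
```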
Can you explain what "leading" columns are? For example, which columns will be leading in CONSTRAINT pk PRIMARY KEY (d DESC, o DESC, l, s, r, c, t, m, k)?
Is it "d"? Is it "d" and "o"?
Is it possible to manually select a leading column from the pk constraint?
Yes, that's correct - the leading columns are the columns of the primary key constraint, in the order they were declared, starting with the first one, that are used in a WHERE clause filter.
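To sketch this with the primary key from the question (the predicate values are hypothetical):

```sql
-- CONSTRAINT pk PRIMARY KEY (d DESC, o DESC, l, s, r, c, t, m, k)

SELECT * FROM myTable WHERE d = 1;            -- leading column d      -> RANGE SCAN
SELECT * FROM myTable WHERE d = 1 AND o = 2;  -- leading prefix d, o   -> narrower RANGE SCAN
SELECT * FROM myTable WHERE c = 1 AND t = 2;  -- c, t are not leading  -> FULL SCAN
```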
James
I am not sure this is the right place for the question, but how complex is the "incremental index maintenance"?
Is it O(n)? Is it O(log(n))?
For example, if we have a sorted list and try to add a value, we do not need to iterate over the whole list (avoiding O(n) complexity); we can use binary search to achieve O(log(n)). Is this Phoenix's approach?
Just think of the index as another HBase table with a different row key. Your performance penalty is that two writes (one for the data table and one for the index table) will be performed, whereas only one write (the one to the data table) would have been performed before. This is the cost of maintaining the index.
If you're ok with this cost (as perhaps with your use case you're reading more than writing), then it's a good tradeoff. At read/query time, we'll use the index table instead of the data table. What would normally be a full table scan over all your data will turn into a range scan or point lookup instead.
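A hypothetical sketch of that tradeoff (the index name and columns are made up):

```sql
-- One extra write per upsert goes to the index table, but a query
-- filtering on c/t becomes a range scan over the index table instead
-- of a full scan over the data table.
CREATE INDEX idx_ct ON myTable (c, t);

-- Only c and t are referenced, so the index covers the query and
-- Phoenix can answer it from the index table alone.
SELECT c, t FROM myTable WHERE c = 'foo';
```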
This is my DDL:
This is my query:
And this is my result:
This is my expectation of a result:
When I try to select something, I get a "FULL SCAN" instead of a "DEGENERATE SCAN". Is this my mistake, or a bug?