br1ghtyang / asterixdb

Automatically exported from code.google.com/p/asterixdb
0 stars 0 forks source link

Index nested loop join hint not being picked for left outer join #446

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
PS - This could be a very soon after beta feature request

What steps will reproduce the problem?

Start asterix one CC and NC using managix, and run these statements from Web UI.

Shouldn't this query be treated as a left outer join that performs indexed 
nested loop join, since I give a hint to optimizer to use the secondary B-Tree 
index defined on the name field of orders dataset ?

Instead I see the system handles this as a hybrid hash join.

//Query to get customer-id's of Alaskan customers

drop dataverse test if exists;
create dataverse test;
use dataverse test;

create type TestType  as open {
id:int32,
name:string,
state:string
}

create dataset customers(TestType) primary key id;
create dataset orders(TestType) primary key id;

create index idx01 on orders(name);

insert into dataset customers({"id":65,"name":"John Doe","state":"Alaska"});
insert into dataset customers({"id":31,"name":"John","state":"TX"});
insert into dataset customers({"id":10,"name":"Smith","state":"CA"});
insert into dataset customers({"id":75,"name":"Roger","state":"MA"});

insert into dataset orders({"id":34,"name":"Henry","state":"OR"});
insert into dataset orders({"id":35,"name":"Kyle","state":"AZ"});
insert into dataset orders({"id":45,"name":"Kevin","state":"WA"});
insert into dataset orders({"id":65,"name":"John Doe","state":"Alaska"});

for $l in dataset customers
return
{          "cust-id":$l.id,
           "state":for $m in dataset orders
            where $l.name /* +indexnl */ = $m.name
            return $m.state
}

Here is the optimized query plan.

distribute result [%0->$$3]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange 
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$3])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$3] <- [function-call: asterix:closed-record-constructor, Args:[AString: {cust-id}, %0->$$17, AString: {state}, %0->$$10]]
      -- ASSIGN  |PARTITIONED|
        project ([$$17, $$10])
        -- STREAM_PROJECT  |PARTITIONED|
          exchange 
          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
            group by ([$$11 := %0->$$14]) decor ([$$17 := %0->$$14]) {
                      aggregate [$$10] <- [function-call: asterix:listify, Args:[%0->$$9]]
                      -- AGGREGATE  |LOCAL|
                        select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$13]])
                        -- STREAM_SELECT  |LOCAL|
                          nested tuple source
                          -- NESTED_TUPLE_SOURCE  |LOCAL|
                   }
            -- PRE_CLUSTERED_GROUP_BY[$$14]  |PARTITIONED|
              exchange 
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                order (ASC, %0->$$14) 
                -- STABLE_SORT [$$14(ASC)]  |PARTITIONED|
                  exchange 
                  -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
                    project ([$$9, $$13, $$14])
                    -- STREAM_PROJECT  |PARTITIONED|
                      exchange 
                      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                        left outer join (function-call: algebricks:eq, Args:[%0->$$15, %0->$$16])
                        -- HYBRID_HASH_JOIN [$$15][$$16]  |PARTITIONED|
                          exchange 
                          -- HASH_PARTITION_EXCHANGE [$$15]  |PARTITIONED|
                            project ([$$14, $$15])
                            -- STREAM_PROJECT  |PARTITIONED|
                              assign [$$15] <- [function-call: asterix:field-access-by-index, Args:[%0->$$0, AInt32: {1}]]
                              -- ASSIGN  |PARTITIONED|
                                exchange 
                                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                  data-scan []<-[$$14, $$0] <- test:customers
                                  -- DATASOURCE_SCAN  |PARTITIONED|
                                    exchange 
                                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                      empty-tuple-source
                                      -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
                          exchange 
                          -- HASH_PARTITION_EXCHANGE [$$16]  |PARTITIONED|
                            project ([$$16, $$9, $$13])
                            -- STREAM_PROJECT  |PARTITIONED|
                              assign [$$13, $$9, $$16] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {2}], function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]]
                              -- ASSIGN  |PARTITIONED|
                                project ([$$1])
                                -- STREAM_PROJECT  |PARTITIONED|
                                  exchange 
                                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                    data-scan []<-[$$12, $$1] <- test:orders
                                    -- DATASOURCE_SCAN  |PARTITIONED|
                                      exchange 
                                      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                        empty-tuple-source
                                        -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

Original issue reported on code.google.com by khfaraaz82 on 15 May 2013 at 6:31

GoogleCodeExporter commented 8 years ago

Original comment by khfaraaz82 on 17 May 2013 at 9:35