Left Outer Join does not return results in specified order

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?

Create asterix instance using managix one CC & NC. From Web UI run the 
following statements.

drop dataverse test if exists;
create dataverse test;
use dataverse test;

create type Ord as open{
id:int32,
shipToName:string
}

create type Cust as open{
id:int32,
name:string
}

create dataset customers(Cust) primary key id;
create dataset orders(Ord) primary key id;

insert into dataset customers({"id":197,"name":"Carey"});
insert into dataset customers({"id":617,"name":"Susan"});
insert into dataset customers({"id":161,"name":"Yaqub"});
insert into dataset customers({"id":117,"name":"Smith"});

insert into dataset orders({"id":917,"shipToName":"John"});
insert into dataset orders({"id":617,"shipToName":"Susan"});
insert into dataset orders({"id":127,"shipToName":"Kevin"});
insert into dataset orders({"id":117,"shipToName":"Robert"});

for $l in dataset customers
return {
              "cust":$l,
              "orders": for $m in dataset orders                              
                               where $l.id=$m.id   
                               order by $l.id desc
                               return $m
              }

Results returned are : 

{ "cust": { "id": 117, "name": "Smith" }, "orders": [ { "id": 117, 
"shipToName": "Robert" } ] }
{ "cust": { "id": 161, "name": "Yaqub" }, "orders": [  ] }
{ "cust": { "id": 197, "name": "Carey" }, "orders": [  ] }
{ "cust": { "id": 617, "name": "Susan" }, "orders": [ { "id": 617, 
"shipToName": "Susan" } ] }

What is the expected output? What do you see instead?

Results should be returned in descending order based on id field of orders 
dataset. Right now system returns results in ascending order, even when we 
specify the order as descending.

Please use labels and text to provide additional information.

Asterix version:0.0.6-SNAPSHOT (as of 11th May 2013)

Optimized Query Plan:

distribute result [%0->$$3]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange 
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$3])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$3] <- [function-call: asterix:open-record-constructor, Args:[AString: {cust}, %0->$$0, AString: {orders}, %0->$$9]]
      -- ASSIGN  |PARTITIONED|
        project ([$$0, $$9])
        -- STREAM_PROJECT  |PARTITIONED|
          exchange 
          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
            group by ([$$10 := %0->$$13]) decor ([%0->$$0]) {
                      aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$1]]
                      -- AGGREGATE  |LOCAL|
                        select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$12]])
                        -- STREAM_SELECT  |LOCAL|
                          nested tuple source
                          -- NESTED_TUPLE_SOURCE  |LOCAL|
                   }
-- PRE_CLUSTERED_GROUP_BY[$$13]  |PARTITIONED|
              exchange 
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                order (ASC, %0->$$13) (DESC, %0->$$13) 
                -- STABLE_SORT [$$13(ASC), $$13(DESC)]  |PARTITIONED|
                  exchange 
                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                    project ([$$0, $$1, $$12, $$13])
                    -- STREAM_PROJECT  |PARTITIONED|
                      exchange 
                      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                        left outer join (function-call: algebricks:eq, Args:[%0->$$13, %0->$$11])
                        -- HYBRID_HASH_JOIN [$$13][$$11]  |PARTITIONED|
                          exchange 
                          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                            data-scan []<-[$$13, $$0] <- test:customers
                            -- DATASOURCE_SCAN  |PARTITIONED|
                              exchange 
                              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                empty-tuple-source
                                -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
exchange 
                          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                            assign [$$12] <- [TRUE]
                            -- ASSIGN  |PARTITIONED|
                              exchange 
                              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                data-scan []<-[$$11, $$1] <- test:orders
                                -- DATASOURCE_SCAN  |PARTITIONED|
                                  exchange 
                                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

Original issue reported on code.google.com by khfaraaz82 on 11 May 2013 at 7:10

GoogleCodeExporter commented 8 years ago

That's because you don't have an outer order-by clause.
The inner order by clause won't guarantee the final result will be ordered.

Original comment by salsuba...@gmail.com on 11 May 2013 at 7:14

GoogleCodeExporter commented 8 years ago

This query looks correct.
Because the order-by only specifies the order inside the bag (1 bag per $l 
binding).
In the results, the right branch has only one or zero items in the bag, so it 
is still ordered.
The plan also looks correct.

However, there might be some bug. Can you try that one?
for $l in dataset customers
return {
              "cust":$l,
              "orders": for $m in dataset orders                              
                               where $l.name=$m.shipToName  
                               order by $m.shipToName desc
                               return $m
              }

See what's the plan looks like.

Original comment by buyingyi@gmail.com on 11 May 2013 at 7:29

GoogleCodeExporter commented 8 years ago

Thanks Yingyi, I tried your query;

drop dataverse test if exists;
create dataverse test;
use dataverse test;

create type Ord as open{
id:int32,
shipToName:string
}

create type Cust as open{
id:int32,
name:string
}

create dataset customers(Cust) primary key id;
create dataset orders(Ord) primary key id;

insert into dataset customers({"id":197,"name":"Carey"});
insert into dataset customers({"id":617,"name":"Susan"});
insert into dataset customers({"id":161,"name":"Yaqub"});
insert into dataset customers({"id":117,"name":"Robert"});

insert into dataset orders({"id":917,"shipToName":"John"});
insert into dataset orders({"id":617,"shipToName":"Susan"});
insert into dataset orders({"id":127,"shipToName":"Kevin"});
insert into dataset orders({"id":117,"shipToName":"Robert"});

for $l in dataset customers
return {
              "cust":$l,
              "orders": for $m in dataset orders                              
                               where $l.name=$m.shipToName  
                               order by $m.shipToName desc
                               return $m
              }

Results returned :

{ "cust": { "id": 117, "name": "Robert" }, "orders": [ { "id": 117, 
"shipToName": "Robert" } ] }
{ "cust": { "id": 161, "name": "Yaqub" }, "orders": [  ] }
{ "cust": { "id": 197, "name": "Carey" }, "orders": [  ] }
{ "cust": { "id": 617, "name": "Susan" }, "orders": [ { "id": 617, 
"shipToName": "Susan" } ] }

Optimized query plan :

distribute result [%0->$$3]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange 
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$3])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$3] <- [function-call: asterix:open-record-constructor, Args:[AString: {cust}, %0->$$0, AString: {orders}, %0->$$9]]
      -- ASSIGN  |PARTITIONED|
        project ([$$0, $$9])
        -- STREAM_PROJECT  |PARTITIONED|
          exchange 
          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
            group by ([$$10 := %0->$$13]) decor ([%0->$$0]) {
                      aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$1]]
                      -- AGGREGATE  |LOCAL|
                        select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$12]])
                        -- STREAM_SELECT  |LOCAL|
                          nested tuple source
                          -- NESTED_TUPLE_SOURCE  |LOCAL|
                   }
            -- PRE_CLUSTERED_GROUP_BY[$$13]  |PARTITIONED|
              exchange 
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                order (ASC, %0->$$13) (DESC, %0->$$15) 
                -- STABLE_SORT [$$13(ASC), $$15(DESC)]  |PARTITIONED|
                  exchange 
                  -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
                    project ([$$0, $$1, $$12, $$13, $$15])
                    -- STREAM_PROJECT  |PARTITIONED|
                      exchange 
                      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                        left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$15])
                        -- HYBRID_HASH_JOIN [$$14][$$15]  |PARTITIONED|
                          exchange 
                          -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
                            assign [$$14] <- [function-call: asterix:field-access-by-index, Args:[%0->$$0, AInt32: {1}]]
                            -- ASSIGN  |PARTITIONED|
                              exchange 
                              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                data-scan []<-[$$13, $$0] <- test:customers
                                -- DATASOURCE_SCAN  |PARTITIONED|
                                  exchange 
                                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
                          exchange 
                          -- HASH_PARTITION_EXCHANGE [$$15]  |PARTITIONED|
                            assign [$$12, $$15] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {1}]]
                            -- ASSIGN  |PARTITIONED|
                              project ([$$1])
                              -- STREAM_PROJECT  |PARTITIONED|
                                exchange 
                                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                  data-scan []<-[$$11, $$1] <- test:orders
                                  -- DATASOURCE_SCAN  |PARTITIONED|
                                    exchange 
                                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                      empty-tuple-source
                                      -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

Original comment by khfaraaz82 on 11 May 2013 at 7:38

GoogleCodeExporter commented 8 years ago

All right,  it is correct :-)

Original comment by buyingyi@gmail.com on 11 May 2013 at 7:48

GoogleCodeExporter commented 8 years ago

Original comment by buyingyi@gmail.com on 11 May 2013 at 6:47

Changed state: Invalid

GoogleCodeExporter commented 8 years ago

Side note:  It would have been useful to try more than one item in a collection 
being ordered.

Original comment by dtab...@gmail.com on 12 May 2013 at 2:36

br1ghtyang / asterixdb

Left Outer Join does not return results in specified order #440