br1ghtyang / asterixdb

Automatically exported from code.google.com/p/asterixdb

Group by returns incorrect results #601

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
... or I'm doing something wrong. The following query

---
use dataverse azure;

for $t in dataset Tweets
group by $id := $t.user_id with $t
return {"id": $id, "count": count($t)}

---

produces the following results (truncated):

---
{ "id": 8, "count": 1i64 }
{ "id": 51, "count": 1i64 }
{ "id": 52, "count": 1i64 }
...
{ "id": 52, "count": 1i64 }
{ "id": 51, "count": 1i64 }
{ "id": 52, "count": 1i64 }
...
{ "id": 8, "count": 1i64 }
...

---
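To illustrate why output like this points at the group-by strategy rather than the data, here is a minimal Python sketch (not AsterixDB code) of a streaming, pre-clustered group-by. It only tracks the current key and closes a group whenever the key changes, so it is correct only if equal keys arrive next to each other. The ids are made up to mimic the truncated output above.

```python
from itertools import groupby


def preclustered_count(keys):
    # Streaming group-by: emits (key, count) each time the key changes.
    # Correct only when equal keys are adjacent (i.e. the input is clustered).
    return [(k, sum(1 for _ in run)) for k, run in groupby(keys)]


ids = [8, 51, 52, 51, 52, 8]  # hypothetical user_id values, unsorted
print(preclustered_count(ids))          # [(8, 1), (51, 1), (52, 1), (51, 1), (52, 1), (8, 1)]
print(preclustered_count(sorted(ids)))  # [(8, 2), (51, 2), (52, 2)]
```

On unsorted input every id shows up repeatedly with a count of 1, which matches the symptom in the report; sorting first restores one row per id.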

Also, not sure who should own this.

Original issue reported on code.google.com by zheilb...@gmail.com on 5 Aug 2013 at 4:34

GoogleCodeExporter commented 8 years ago
INFO: Optimized Plan:
distribute result [%0->$$6]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange 
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$6])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$6] <- [function-call: asterix:closed-record-constructor, Args:[AString: {id}, %0->$$1, AString: {count}, %0->$$11]]
      -- ASSIGN  |PARTITIONED|
        exchange 
        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
          group by ([$$1 := %0->$$13]) decor ([]) {
                    aggregate [$$11] <- [function-call: asterix:agg-sum, Args:[%0->$$12]]
                    -- AGGREGATE  |LOCAL|
                      nested tuple source
                      -- NESTED_TUPLE_SOURCE  |LOCAL|
                 }
          -- PRE_CLUSTERED_GROUP_BY[$$13]  |PARTITIONED|
            exchange 
            -- HASH_PARTITION_MERGE_EXCHANGE MERGE:[$$13(ASC)] HASH:[$$13]  |PARTITIONED|
              group by ([$$13 := %0->$$10]) decor ([]) {
                        aggregate [$$12] <- [function-call: asterix:agg-count, Args:[AInt64: {1}]]
                        -- AGGREGATE  |LOCAL|
                          nested tuple source
                          -- NESTED_TUPLE_SOURCE  |LOCAL|
                     }
              -- PRE_CLUSTERED_GROUP_BY[$$10]  |PARTITIONED|
                exchange 
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  project ([$$10])
                  -- STREAM_PROJECT  |PARTITIONED|
                    exchange 
                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                      data-scan []<-[$$9, $$10, $$2] <- azure:Tweets
                      -- DATASOURCE_SCAN  |PARTITIONED|
                        exchange 
                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                          empty-tuple-source
                          -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

Aug 04, 2013 9:57:26 PM edu.uci.ics.asterix.aql.translator.AqlTranslator handleQuery
INFO: {
 "connectors": [
  {
   "connector": {
    "display-name": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor[CDID:0]",
    "id": "CDID:0",
    "java-class": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor"
   },
   "in-operator-id": "ODID:5",
   "in-operator-port": 0,
   "out-operator-id": "ODID:0",
   "out-operator-port": 0
  },
  {
   "connector": {
    "display-name": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor[CDID:1]",
    "id": "CDID:1",
    "java-class": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor"
   },
   "in-operator-id": "ODID:0",
   "in-operator-port": 0,
   "out-operator-id": "ODID:6",
   "out-operator-port": 0
  },
  {
   "connector": {
    "display-name": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor[CDID:2]",
    "id": "CDID:2",
    "java-class": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor"
   },
   "in-operator-id": "ODID:6",
   "in-operator-port": 0,
   "out-operator-id": "ODID:1",
   "out-operator-port": 0
  },
  {
   "connector": {
    "display-name": "edu.uci.ics.hyracks.dataflow.std.connectors.MToNPartitioningMergingConnectorDescriptor[CDID:3]",
    "id": "CDID:3",
    "java-class": "edu.uci.ics.hyracks.dataflow.std.connectors.MToNPartitioningMergingConnectorDescriptor"
   },
   "in-operator-id": "ODID:1",
   "in-operator-port": 0,
   "out-operator-id": "ODID:2",
   "out-operator-port": 0
  },
  {
   "connector": {
    "display-name": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor[CDID:4]",
    "id": "CDID:4",
    "java-class": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor"
   },
   "in-operator-id": "ODID:2",
   "in-operator-port": 0,
   "out-operator-id": "ODID:4",
   "out-operator-port": 0
  },
  {
   "connector": {
    "display-name": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor[CDID:5]",
    "id": "CDID:5",
    "java-class": "edu.uci.ics.hyracks.dataflow.std.connectors.OneToOneConnectorDescriptor"
   },
   "in-operator-id": "ODID:4",
   "in-operator-port": 0,
   "out-operator-id": "ODID:3",
   "out-operator-port": 0
  }
 ],
 "operators": [
  {
   "display-name": "edu.uci.ics.hyracks.storage.am.btree.dataflow.BTreeSearchOperatorDescriptor[ODID:0]",
   "id": "ODID:0",
   "in-arity": 1,
   "java-class": "edu.uci.ics.hyracks.storage.am.btree.dataflow.BTreeSearchOperatorDescriptor",
   "out-arity": 1
  },
  {
   "display-name": "edu.uci.ics.hyracks.dataflow.std.group.preclustered.PreclusteredGroupOperatorDescriptor[ODID:1]",
   "id": "ODID:1",
   "in-arity": 1,
   "java-class": "edu.uci.ics.hyracks.dataflow.std.group.preclustered.PreclusteredGroupOperatorDescriptor",
   "out-arity": 1
  },
  {
   "display-name": "edu.uci.ics.hyracks.dataflow.std.group.preclustered.PreclusteredGroupOperatorDescriptor[ODID:2]",
   "id": "ODID:2",
   "in-arity": 1,
   "java-class": "edu.uci.ics.hyracks.dataflow.std.group.preclustered.PreclusteredGroupOperatorDescriptor",
   "out-arity": 1
  },
  {
   "display-name": "edu.uci.ics.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor[ODID:3]",
   "id": "ODID:3",
   "in-arity": 1,
   "java-class": "edu.uci.ics.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor",
   "out-arity": 0
  },
  {
   "display-name": "edu.uci.ics.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor[ODID:4]",
   "id": "ODID:4",
   "in-arity": 1,
   "java-class": "edu.uci.ics.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor",
   "micro-operators": [
    "assign [2] := [edu.uci.ics.hyracks.algebricks.core.algebra.expressions.LogicalExpressionJobGenToExpressionRuntimeProviderAdapter$ScalarEvaluatorFactoryAdapter@186540e1]",
    "stream-project [2]"
   ],
   "out-arity": 1
  },
  {
   "display-name": "edu.uci.ics.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor[ODID:5]",
   "id": "ODID:5",
   "in-arity": 0,
   "java-class": "edu.uci.ics.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor",
   "micro-operators": ["ets"],
   "out-arity": 1
  },
  {
   "display-name": "edu.uci.ics.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor[ODID:6]",
   "id": "ODID:6",
   "in-arity": 1,
   "java-class": "edu.uci.ics.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor",
   "micro-operators": ["stream-project [1]"],
   "out-arity": 1
  }
 ]
}
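The optimized plan above is a two-phase aggregation: a local PRE_CLUSTERED_GROUP_BY computing agg-count, a HASH_PARTITION_MERGE_EXCHANGE that hashes on the key and merges streams in ascending key order, and a global PRE_CLUSTERED_GROUP_BY computing agg-sum. Below is a simplified single-process Python model of that shape, offered as an illustration only (it is not the Hyracks implementation; the partition contents and the number of targets are made up). Because both group-bys and the merging exchange assume key-ordered input, leaving out the sort before the local group-by lets duplicate keys survive all the way to the result.

```python
import heapq
from itertools import groupby


def local_count(partition):
    # Local phase: count each run of equal keys (plays the role of agg-count).
    return [(k, sum(1 for _ in run)) for k, run in groupby(partition)]


def hash_partition_merge(local_outputs, n_targets):
    # Exchange: route each (key, partial_count) to a target by hash(key), then
    # merge the streams arriving at each target on the key. heapq.merge only
    # yields a sorted stream if every incoming stream is already sorted.
    targets = [[] for _ in range(n_targets)]
    for stream in local_outputs:
        chunks = [[] for _ in range(n_targets)]
        for key, partial in stream:
            chunks[hash(key) % n_targets].append((key, partial))
        for t, chunk in enumerate(chunks):
            targets[t].append(chunk)
    return [list(heapq.merge(*chunks, key=lambda kv: kv[0])) for chunks in targets]


def global_sum(merged):
    # Global phase: sum partial counts per run of equal keys (plays the role of agg-sum).
    return [(k, sum(p for _, p in run)) for k, run in groupby(merged, key=lambda kv: kv[0])]


def two_phase_count(partitions, n_targets=2, sort_first=False):
    # sort_first=True models the sort that should precede the local group-by.
    inputs = [sorted(p) if sort_first else list(p) for p in partitions]
    results = []
    for merged in hash_partition_merge([local_count(p) for p in inputs], n_targets):
        results.extend(global_sum(merged))
    return sorted(results)


partitions = [[8, 51, 52, 8], [52, 51, 8]]  # hypothetical user_id values on two partitions
print(two_phase_count(partitions, sort_first=True))  # [(8, 3), (51, 2), (52, 2)]
print(two_phase_count(partitions))  # [(8, 1), (8, 1), (8, 1), (51, 2), (52, 1), (52, 1)]
```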

Original comment by zheilb...@gmail.com on 5 Aug 2013 at 4:57

GoogleCodeExporter commented 8 years ago
Why is the sort operator missing?
Sattam, does it relate to your recent change?

Original comment by buyingyi@gmail.com on 5 Aug 2013 at 5:00

GoogleCodeExporter commented 8 years ago
If that's the case, then my guess is that we don't have adequate tests for 
group by.

Original comment by zheilb...@gmail.com on 5 Aug 2013 at 5:02

GoogleCodeExporter commented 8 years ago
I don't see how this issue relates to the change I made.
My change was to remove the sort when there is an insert operator. It is a 
local change inside the insert physical operator. That code path won't be 
visited by this query.

Original comment by salsuba...@gmail.com on 5 Aug 2013 at 5:04

GoogleCodeExporter commented 8 years ago
The tiny social suite runs successfully for me, producing a plan with a sort + 
preclustered gby.

Can someone familiar with the rewrite rules point me to the right place to 
look? The closest rule I could find was 
PushNestedOrderByUnderPreSortedGroupByRule. Is that it?

Original comment by zheilb...@gmail.com on 5 Aug 2013 at 5:29

GoogleCodeExporter commented 8 years ago
Looking further... EnforceStructuralPropertiesRule looks more like the 
place where a sort would be introduced.
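As a rough mental model of what "introducing a sort" means (a toy sketch, not the actual logic of EnforceStructuralPropertiesRule): each operator states the order it needs from its input, each child states the order it delivers, and a sort is inserted wherever the delivered order does not satisfy the requirement. The `Op` class, the helper names, and the `tweet_id` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Op:
    name: str
    delivered: Tuple[str, ...] = ()  # order this operator's output is known to have
    required: Tuple[str, ...] = ()   # order this operator needs from its input
    child: Optional["Op"] = None


def satisfies(delivered, required):
    # An order on (k1, ..., kn) also orders each prefix (k1, ..., ki),
    # so the requirement is met only if it is a prefix of the delivered order.
    return tuple(delivered[:len(required)]) == tuple(required)


def enforce_order(op):
    # Bottom-up: insert a sort below any operator whose input order is insufficient.
    if op.child is None:
        return op
    child = enforce_order(op.child)
    if op.required and not satisfies(child.delivered, op.required):
        child = Op("sort", delivered=op.required, child=child)
    op.child = child
    return op


def names(op):
    out = []
    while op is not None:
        out.append(op.name)
        op = op.child
    return out


plan = Op("pre-clustered-group-by", required=("user_id",),
          child=Op("scan", delivered=("tweet_id", "user_id")))
print(names(enforce_order(plan)))  # ['pre-clustered-group-by', 'sort', 'scan']
```

In this model, the buggy plan would correspond to `satisfies` wrongly returning True, so no sort gets inserted.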

Original comment by zheilb...@gmail.com on 5 Aug 2013 at 5:33

GoogleCodeExporter commented 8 years ago
The temporary workaround produces the correct results (forcing an external gby):
---
use dataverse azure;

for $t in dataset Tweets
/* + hash*/
group by $id := $t.user_id with $t
return {"id": $id, "count": count($t)}
---

So it looks like the problem is limited to the preclustered gby, and it doesn't 
happen every time: for instance, tiny social q9 produces a sort before the 
preclustered gby.
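A short Python sketch of why the hash-based strategy requested by the hint sidesteps the problem: it keeps one accumulator per distinct key in a hash table, so the arrival order of the keys does not matter. This is a toy in-memory version; an external group-by can additionally spill to disk when the table does not fit in memory, which is ignored here. The ids are hypothetical.

```python
def hash_count(keys):
    # One accumulator per distinct key, kept in a hash table (dict), so the
    # result does not depend on the order in which keys arrive.
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return counts


ids = [8, 51, 52, 51, 52, 8]  # hypothetical user_id values, deliberately unsorted
print(hash_count(ids))  # {8: 2, 51: 2, 52: 2}
print(hash_count(reversed(ids)) == hash_count(ids))  # True
```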

Original comment by zheilb...@gmail.com on 5 Aug 2013 at 6:31

GoogleCodeExporter commented 8 years ago

Original comment by buyingyi@gmail.com on 16 Aug 2013 at 7:29

GoogleCodeExporter commented 8 years ago
Can you verify whether this bug is fixed by your order by fix or not?

Original comment by zheilb...@gmail.com on 4 Oct 2013 at 10:23

GoogleCodeExporter commented 8 years ago
Zack, my fix in the running aggregation branch does not fix this issue. I'm cc'ing 
Yingyi, since he mentioned a generic fix for this (supporting order constraints on 
multiple keys) during our discussion, to check whether that feature is ready.
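Why order constraints over multiple keys matter, as a general illustration: input sorted on a composite key (k1, k2) is clustered on k1, but in general not on k2 alone, so it cannot feed a pre-clustered group-by on k2 without a sort. The thread does not show the Tweets schema; the data-scan in the plan produces two variables ($$10 being the grouping key) ahead of the record, which may hint at a composite primary key, but that is only a guess. The rows below are hypothetical (tweet_id, user_id) pairs.

```python
def is_clustered(values):
    # True if all equal values are adjacent, i.e. each value forms exactly one run.
    seen = set()
    previous = object()  # sentinel that never equals a real value
    for v in values:
        if v != previous:
            if v in seen:
                return False  # value reappears after its run ended
            seen.add(v)
            previous = v
    return True


rows = sorted([(2, 52), (1, 51), (3, 51), (1, 8), (2, 8)])  # sorted on (tweet_id, user_id)
print(is_clustered([r[0] for r in rows]))  # leading key tweet_id: True
print(is_clustered([r[1] for r in rows]))  # second key user_id: False
```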

Original comment by jarod...@gmail.com on 2 Nov 2013 at 3:57

GoogleCodeExporter commented 8 years ago
Yingyi, can I reassign this to you then?

Original comment by zheilb...@gmail.com on 6 Nov 2013 at 9:14

GoogleCodeExporter commented 8 years ago
Sure.

Original comment by buyingyi@gmail.com on 6 Nov 2013 at 9:18

GoogleCodeExporter commented 8 years ago
Thanks!

Original comment by zheilb...@gmail.com on 6 Nov 2013 at 9:38

GoogleCodeExporter commented 8 years ago
Yingyi, what is the status on this?

Original comment by zheilb...@gmail.com on 6 Dec 2013 at 8:51

GoogleCodeExporter commented 8 years ago

Original comment by zheilb...@gmail.com on 6 Dec 2013 at 10:31

GoogleCodeExporter commented 8 years ago
Fixed in yingyi/asterix_test

Original comment by buyingyi@gmail.com on 12 Oct 2014 at 7:43

GoogleCodeExporter commented 8 years ago

Original comment by buyingyi@gmail.com on 13 Oct 2014 at 7:55