DataBrewery / cubes-mongo

[MAINTAINER WANTED] Cubes mongo backend
MIT License
7 stars 5 forks source link

mongodb backend uses wrong pipeline order, sums aggregate field name instead of measure (physical) field name #2

Open Stiivi opened 8 years ago

Stiivi commented 8 years ago

Moved from Databrewery/cubes#336 by @miraculixx


Based on the tutorial I converted the tutorial dataset to MongoDB, using the cube with the mongo backend. After fixing the mongo backend it almost works, except that the MongoBrowser creates a pipeline that returns wrong results:

from cubes import Workspace
workspace = Workspace(config="/home/patrick/projects/cubestest/slicer-mongo.ini")
result = browser.aggregate(drilldown=["year"])
for record in result:
    print record
=>
{u'record_count': 31, u'amount_sum': 0, u'year': 2009}
{u'record_count': 31, u'amount_sum': 0, u'year': 2010}

As you can see the amount_sum fields are 0, which is wrong. Looking at the pipeline generated reveals why:

[{"$match": {"year": {"$exists": 1}}}, 
 {"$project":        <<< wrong order, project pipeline goes after group
        {"record_count": "$record_count", 
        "amount_sum": "$amount_sum", "year": 1}
 }, 
 {"$group": {"record_count": {"$sum": 1}, 
         "_id": {"year": {"$year": "$year"}}, 
         "amount_sum": {"$sum": "$amount_sum"}}},    <<< should be "$sum:" : "$amount"
  {"$sort": {"_id.year": 1}}
]

when the two problems are fixed manually and the aggregation directly run against mongodb the results are correct:

db.ibrd.aggregate([{ "$match": { "year": { "$exists": 1 } } }, 
{ "$group": { "record_count": { "$sum": 1 }, "_id": { "year": { "$year": "$year" } }, "amount": { "$sum": "$amount"} } }, 
{ "$project": { "record_count": "$record_count", "amount": "$amount", "year": 1 } }, 
{ "$sort": { "_id.year": 1 } }])
=>
/* 1 */
{
    "_id" : {
        "year" : 2009
    },
    "record_count" : 31,
    "amount" : 550840
},
/* 2 */
{
    "_id" : {
        "year" : 2010
    },
    "record_count" : 31,
    "amount" : 566020
}

The problem probably occurs within the mapping code. Any hints?

Data ready to import into mongodb: https://gist.github.com/miraculixx/5ffc89241478e8b09883 MongoDB: 3.2 cubes: release-1.0.1 with fix for mongodb applied as per above

#slices.ini
[store]
type: mongo
url: mongodb://localhost:27017/
database: cubes
collection: ibrd
[models]
main: tutorial_model.json
robin900 commented 8 years ago

I believe the fix should happen at https://github.com/DataBrewery/cubes-mongo/blob/master/cubes-mongo/browser.py#L406

Instead of

escape_level(agg.ref())

you want

escape_level(agg.measure.ref())

I can't make these changes myself and test; I have too much other work to do at the moment.

Stiivi commented 8 years ago

Thanks @robin900. I added the change. Requires testing. CC: @miraculixx

voldesh commented 8 years ago

"unicode type has no attribute ref" I got this error when updated the statement to agg.measure.ref()