Aggregate queries in MongoDB don't respect the order of the projected columns

leorochael commented 6 years ago

Issue Summary

When writing an aggregate query against a MongoDB datasource, the order of the columns cannot be chosen and is effectively random, even with a $project step in the aggregate pipeline specifying the order exactly.

Changes in the $project keyword ordering causes stable but unpredictable changes in the ordering of the columns. It seems that the $project keyword value is being round-tripped through a Python dictionary and losing it's original ordering.

Steps to Reproduce

Create an aggregate query against a MongoDB collection
Add a $project step renaming some columns
See that the order of the columns returned by redash doesn't match the order of the columns in the $project keyword.

Technical details:

Redash Version: 4.0.0+b3948
Browser/OS: Firefox/Linux
How did you install Redash: Docker

leorochael commented 6 years ago

This issue has been reported in the forum, but has not been answered.

arikfr commented 6 years ago

I'm not sure if it's applicable in a aggregate query, but if you specify fields value in your query, we use the order there for the columns order.

leorochael commented 6 years ago

@arikfr,

I can confirm that when specifying a fields dictionary/object in an aggregate query, the order of the output columns match the sorting order of the values assigned to the column names in the fields dictionary.

So this is a functional workaround, even though it requires mentioning the field names twice.

Would be nice if it was fixed though. If the order being lost is caused by a round-trip of the json of the query itself into a Python dict, then this StackOverflow question discusses some solutions:

Upgrade to Python 3.6 or later, where the dicts are ordered by default.
Pass object_pairs_hook=collections.OrderedDict as argument to json.loads() or json.JSONDecoder() (compatible all the way back to Python 2.7).

getredash / redash