getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
http://redash.io/
BSD 2-Clause "Simplified" License
26.36k stars 4.37k forks source link

Aggregate queries in MongoDB don't respect the order of the projected columns #2598

Open leorochael opened 6 years ago

leorochael commented 6 years ago

Issue Summary

When writing an aggregate query against a MongoDB datasource, the order of the columns cannot be chosen and is effectively random, even with a $project step in the aggregate pipeline specifying the order exactly.

Changes in the $project keyword ordering causes stable but unpredictable changes in the ordering of the columns. It seems that the $project keyword value is being round-tripped through a Python dictionary and losing it's original ordering.

Steps to Reproduce

  1. Create an aggregate query against a MongoDB collection
  2. Add a $project step renaming some columns
  3. See that the order of the columns returned by redash doesn't match the order of the columns in the $project keyword.

Technical details:

leorochael commented 6 years ago

This issue has been reported in the forum, but has not been answered.

arikfr commented 6 years ago

I'm not sure if it's applicable in a aggregate query, but if you specify fields value in your query, we use the order there for the columns order.

leorochael commented 6 years ago

@arikfr,

I can confirm that when specifying a fields dictionary/object in an aggregate query, the order of the output columns match the sorting order of the values assigned to the column names in the fields dictionary.

So this is a functional workaround, even though it requires mentioning the field names twice.

Would be nice if it was fixed though. If the order being lost is caused by a round-trip of the json of the query itself into a Python dict, then this StackOverflow question discusses some solutions:

  1. Upgrade to Python 3.6 or later, where the dicts are ordered by default.

  2. Pass object_pairs_hook=collections.OrderedDict as argument to json.loads() or json.JSONDecoder() (compatible all the way back to Python 2.7).