apache / drill

Apache Drill is a distributed MPP query layer for self-describing data
https://drill.apache.org/
Apache License 2.0

"supportsAggregatePushdown": does not work correctly on sharded Mongo Cluster #2831

Open tfold opened 10 months ago

tfold commented 10 months ago

Drill 1.2.1
Mongo 4.4
MongoDB config: 2 shards/3 replicas
Mongos on port 27017

DrillBit:

{
  "type": "DB",
  "connection": "mongodb://mongo01:27017,mongo02:27017/?readPreference=secondaryPreferred",
  "pluginOptimizations": {
    "supportsProjectPushdown": true,
    "supportsFilterPushdown": true,
    "supportsAggregatePushdown": true,
    "supportsSortPushdown": true,
    "supportsUnionPushdown": true,
    "supportsLimitPushdown": true
  },
  "batchSize": 100,
  "enabled": true,
  "authMode": "SHARED_USER"
}

Running a count(*) through the Drill storage plugin above returns only the number of records on one shard. Setting "supportsAggregatePushdown" to false returns the correct count across both shards.
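For anyone who wants to apply the workaround programmatically, here is a minimal sketch in Python that flips only the "supportsAggregatePushdown" flag and leaves the rest of the plugin config untouched. The config is the one posted above (with the closing brace added so it parses). The helper name `with_aggregate_pushdown` is hypothetical and not part of Drill, and the sketch only builds the modified JSON; it does not upload it to a Drillbit.

```python
import copy
import json

# Plugin config as posted in this issue (closing brace added so it parses).
PLUGIN_CONFIG = json.loads("""
{
  "type": "DB",
  "connection": "mongodb://mongo01:27017,mongo02:27017/?readPreference=secondaryPreferred",
  "pluginOptimizations": {
    "supportsProjectPushdown": true,
    "supportsFilterPushdown": true,
    "supportsAggregatePushdown": true,
    "supportsSortPushdown": true,
    "supportsUnionPushdown": true,
    "supportsLimitPushdown": true
  },
  "batchSize": 100,
  "enabled": true,
  "authMode": "SHARED_USER"
}
""")


def with_aggregate_pushdown(config, enabled):
    """Return a copy of config with supportsAggregatePushdown set to `enabled`.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    updated = copy.deepcopy(config)
    updated["pluginOptimizations"]["supportsAggregatePushdown"] = enabled
    return updated


# Workaround from this report: disable only the aggregate pushdown.
workaround = with_aggregate_pushdown(PLUGIN_CONFIG, False)
print(json.dumps(workaround, indent=2))
```

The printed JSON can then be pasted into the storage plugin editor in place of the original config.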

The same query from the mongo shell reports the total across both shards.

Since we are querying mongos from Drill, the count should include records from both shards, but for some reason it only reports the count for the shard that it is talking to.
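To make the expected versus observed behavior concrete, here is a toy model in Python. It is not Drill or MongoDB code, and the shard names and document counts are made up for illustration. It only shows that a correct total is the sum of the per-shard partial counts, while the symptom described here matches returning a single shard's partial count.

```python
# Toy model: one collection split across two shards.
# Shard names and sizes are invented for illustration only.
SHARDS = {"shard01": 600, "shard02": 400}


def count_via_router(shards):
    """Expected behavior: the router merges partial counts from every shard."""
    return sum(shards.values())


def count_single_shard(shards, name):
    """Reported symptom: only one shard's partial count is returned."""
    return shards[name]


expected = count_via_router(SHARDS)
observed = count_single_shard(SHARDS, "shard01")
print(f"expected={expected} observed={observed} missing={expected - observed}")
# prints: expected=1000 observed=600 missing=400
```

With the real cluster, the same comparison would be the count from the mongo shell (expected) against the count returned by Drill with aggregate pushdown enabled (observed).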