crusaderky opened 3 years ago
To add some more context, @crusaderky is asking for the ability to configure a $project aggregation stage to limit the amount of data that Dask needs to load: https://www.mongodb.com/docs/manual/reference/operator/aggregation/project/
One workaround for now would be to create a view (https://www.mongodb.com/docs/manual/core/views/) that applies the projection and then call read_mongo on the view:
db.create_collection("collProjected", viewOn="coll", pipeline=[{"$project": {"a": 1, "b": 1}}])
res = read_mongo("db", "collProjected", ...)
Very frequently, a user will want to drop some of the keys in the MongoDB documents server-side, before they are loaded.
Please add a project: dict[str, Any] | None = None parameter to read_mongo. The $project stage must be applied after the two $match stages. Please include a test that would fail if the order were reversed.
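A minimal sketch of the requested ordering and test, assuming the two $match stages are a user-supplied match filter followed by a per-partition _id range. The helper build_pipeline and the toy evaluator run_pipeline are hypothetical and not part of dask_mongo. A real test would run against a MongoDB server; here a tiny in-memory evaluator (equality, $gte/$lt, inclusion projections only) stands in, so the test fails if $project is moved ahead of the matches, because the projection would strip the fields they filter on.

```python
from __future__ import annotations

from typing import Any

Doc = dict[str, Any]


def build_pipeline(match: Doc, id_min: Any, id_max: Any, project: Doc | None = None) -> list[Doc]:
    """Hypothetical helper: build one partition's aggregation pipeline.

    The two $match stages come first; $project is appended last so that it
    cannot strip fields (such as _id) that the $match stages filter on.
    """
    pipeline: list[Doc] = [
        {"$match": match},
        {"$match": {"_id": {"$gte": id_min, "$lt": id_max}}},
    ]
    if project is not None:
        pipeline.append({"$project": project})
    return pipeline


# Toy in-memory evaluator (assumption: stands in for a real MongoDB server).

def _matches(doc: Doc, cond: Doc) -> bool:
    """Support only equality and {"$gte": x, "$lt": y}; a missing field never matches."""
    for field, expected in cond.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(expected, dict):
            if "$gte" in expected and not value >= expected["$gte"]:
                return False
            if "$lt" in expected and not value < expected["$lt"]:
                return False
        elif value != expected:
            return False
    return True


def _project(doc: Doc, spec: Doc) -> Doc:
    """Inclusion projection; _id is kept unless the spec sets it to 0."""
    out = {k: doc[k] for k, v in spec.items() if v and k != "_id" and k in doc}
    if spec.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    return out


def run_pipeline(docs: list[Doc], pipeline: list[Doc]) -> list[Doc]:
    """Apply each single-operator stage in order."""
    for stage in pipeline:
        op, arg = next(iter(stage.items()))
        if op == "$match":
            docs = [d for d in docs if _matches(d, arg)]
        elif op == "$project":
            docs = [_project(d, arg) for d in docs]
        else:
            raise NotImplementedError(op)
    return docs


def test_project_applied_after_matches() -> None:
    docs = [{"_id": i, "a": i * 10, "b": "x", "kind": "odd" if i % 2 else "even"} for i in range(6)]
    pipeline = build_pipeline({"kind": "odd"}, id_min=0, id_max=4, project={"a": 1, "_id": 0})
    # Matches first: _id in [0, 4) and kind == "odd" leaves _id 1 and 3
    assert run_pipeline(docs, pipeline) == [{"a": 10}, {"a": 30}]
    # Reversed: $project drops _id and kind, so both $match stages reject everything
    reversed_pipeline = [pipeline[-1], *pipeline[:-1]]
    assert run_pipeline(docs, reversed_pipeline) == []
```

Swapping the toy evaluator for a real collection (for example via pymongo's Collection.aggregate) would turn this into an integration test with the same two assertions.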