coiled / dask-mongo

Add project parameter #7

Open crusaderky opened 3 years ago

crusaderky commented 3 years ago

Very frequently, a user will want to drop some of the keys from the MongoDB documents server-side, before they are loaded.

Please add a project: dict[str, Any] = None parameter to read_mongo. The $project stage must be applied after the two $match stages. Please include a test that would fail if the order were reversed.
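
To make the requested ordering concrete, here is a minimal sketch of how the aggregation pipeline could be assembled. The helper name build_pipeline and its arguments are hypothetical and not part of dask-mongo; it also assumes that "the two matches" are the user-supplied match followed by a per-partition match on an _id range.

```python
from __future__ import annotations

from typing import Any


def build_pipeline(
    match: dict[str, Any],
    id_range: dict[str, Any],
    project: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper, not dask-mongo API.

    Assumed stage order: user match, then partition match on _id,
    then the optional projection as the last stage.
    """
    pipeline: list[dict[str, Any]] = [
        {"$match": match},
        {"$match": {"_id": id_range}},
    ]
    if project:
        # Appended last so both matches still see every field.
        pipeline.append({"$project": project})
    return pipeline
```

The order matters because a projection that drops a field used by either match would leave that match with nothing to compare against. A test that matches on one key while projecting only a different key would catch a reversed order.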

joej commented 1 year ago

?

ShaneHarvey commented 1 year ago

To add some more context, @crusaderky is asking for the ability to configure the $project aggregation stage to limit the amount of data that dask needs to load: https://www.mongodb.com/docs/manual/reference/operator/aggregation/project/

One workaround for now would be to create a view (https://www.mongodb.com/docs/manual/core/views/) that applies the projection and then call read_mongo on the view:

db.create_collection("collProjected", viewOn="coll", pipeline=[{"$project": {"a": 1, "b": 1}}])
res = read_mongo("db", "collProjected", ...)
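
If the set of fields to keep varies, the view's pipeline can be built from a list of names. This is only a sketch: view_pipeline is a hypothetical helper, and the collection and view names in the comment are the placeholders from the snippet above. Note that $project keeps _id by default unless it is excluded explicitly.

```python
from __future__ import annotations

from typing import Any, Iterable


def view_pipeline(fields: Iterable[str]) -> list[dict[str, Any]]:
    """Hypothetical helper: build an inclusion-only $project pipeline for a view."""
    projection = {name: 1 for name in fields}
    return [{"$project": projection}]


# With a live server and a pymongo Database object named db, this could be used as:
# db.create_collection("collProjected", viewOn="coll", pipeline=view_pipeline(["a", "b"]))
```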