dlt-hub / verified-sources

Contribute to dlt verified sources 🔥
https://dlthub.com/docs/walkthroughs/add-a-verified-source
Apache License 2.0
72 stars, 50 forks

add filter to mongodb source #521

Closed korbash closed 2 months ago

korbash commented 4 months ago

Source name

mongodb

Describe the data you'd like to see

Add a filter parameter to the MongoDB source that will be applied before reading data from MongoDB. This feature is useful for creating incremental sources based on fields that could be null or may not exist (e.g., callback_date).
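
As an illustration of the kind of filter meant here, the following is a minimal sketch, not the source's actual API. It shows a MongoDB query document that keeps only documents where `callback_date` is present and non-null, plus a plain-Python equivalent so the effect can be checked without a database. The helper `has_usable_cursor` and the sample documents are hypothetical.

```python
from typing import Any, Dict, List

# MongoDB query document: {"$ne": None} excludes both null values
# and documents where the field is missing altogether.
callback_filter: Dict[str, Any] = {"callback_date": {"$ne": None}}


def has_usable_cursor(doc: Dict[str, Any], field: str = "callback_date") -> bool:
    """Plain-Python equivalent of callback_filter (illustration only)."""
    return doc.get(field) is not None


# Hypothetical sample documents: one usable, one null, one missing the field.
docs: List[Dict[str, Any]] = [
    {"_id": 1, "callback_date": "2024-01-05"},
    {"_id": 2, "callback_date": None},
    {"_id": 3},
]

selected = [d for d in docs if has_usable_cursor(d)]
print([d["_id"] for d in selected])
```

With a filter like this applied before reading, the incremental cursor would only ever see documents that actually carry a `callback_date` value.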

Are you a dlt user?

Yes, I'm already a dlt user.

Are you ready to contribute this extension?

Yes, I'm ready.

dlt destination

No response

Additional information

No response

rudolfix commented 4 months ago

@IlyaFaer @korbash I see two ways to implement this:

  1. We let the user pass a callback function that receives the Mongo expression (with all the filtering and sorting ops that we generate for incremental loading) and lets them add to or change it.
  2. We just let users pass a filter expression that we add internally. This is less powerful but way simpler.

WDYT?

korbash commented 4 months ago

I prefer the second approach. It's no less powerful for filtering; we just need to carefully handle cases where the user's filter touches the incremental.cursor_path field. Sorting should ideally be derived from incremental.last_value_func, so the user doesn't need to specify it.
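
A rough sketch of how the second approach could look, under stated assumptions: the function names `build_query` and `sort_direction` are hypothetical, and combining the two conditions with `$and` is just one possible way to handle a user filter that also references the cursor field. The sort values `1` and `-1` are MongoDB's ascending and descending directions.

```python
from typing import Any, Callable, Dict, Optional


def sort_direction(last_value_func: Callable[..., Any]) -> Optional[int]:
    """Derive the sort order from last_value_func: max -> ascending, min -> descending."""
    if last_value_func is max:
        return 1
    if last_value_func is min:
        return -1
    return None  # custom function: no obvious ordering, so no sort is imposed


def build_query(
    cursor_path: str,
    last_value: Any,
    last_value_func: Callable[..., Any] = max,
    user_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Combine the generated incremental condition with an optional user filter."""
    op = "$lte" if last_value_func is min else "$gte"
    incremental = {cursor_path: {op: last_value}}
    if not user_filter:
        return incremental
    # $and keeps both conditions intact even when the user filter
    # also mentions cursor_path, so neither silently overwrites the other.
    return {"$and": [incremental, user_filter]}


query = build_query(
    "callback_date",
    "2024-01-01",
    user_filter={"callback_date": {"$ne": None}, "status": "done"},
)
print(query, sort_direction(max))
```

Merging the two dictionaries with a plain `dict.update` would be simpler, but it would drop one of the conditions whenever both use the same key, which is exactly the cursor_path case mentioned above.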