dlt-hub / verified-sources

Contribute to dlt verified sources 🔥
https://dlthub.com/docs/walkthroughs/add-a-verified-source
Apache License 2.0

mongodb: support end_date, row_order, limit and arrow backend #438

Open · rudolfix opened 2 months ago

rudolfix commented 2 months ago

Source name

mongo

Describe the data you'd like to see

I'd like MongoDB features on par with the sql_database features:

  1. backfill with incremental (start/end ranges): we need to support end_value
  2. explicit ordering of results (support row_order)
  3. allow setting a limit on the source/resource level to load only the first N results. This will help with microbatching.
  4. pymongo now has Arrow support (https://github.com/mongodb-labs/mongo-arrow) and we should use it
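Requirements 1-3 roughly map onto a range filter, a sort and a limit passed to pymongo's Collection.find(). The sketch below is a hypothetical helper (build_find_args is not part of the verified source); it assumes dlt-style semantics of an inclusive start_value and an exclusive end_value, and "asc"/"desc" values for row_order. It only builds the keyword arguments, so it runs without a MongoDB server.

```python
from typing import Any, Dict, Optional

# Same values as pymongo.ASCENDING / pymongo.DESCENDING
ASCENDING, DESCENDING = 1, -1


def build_find_args(
    cursor_field: Optional[str] = None,
    start_value: Any = None,
    end_value: Any = None,
    row_order: Optional[str] = None,
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: turn an incremental range, an explicit order and a
    limit into keyword arguments for pymongo's Collection.find()."""
    filter_: Dict[str, Any] = {}
    if cursor_field is not None:
        bounds: Dict[str, Any] = {}
        if start_value is not None:
            bounds["$gte"] = start_value  # assumed: start of range is inclusive
        if end_value is not None:
            bounds["$lt"] = end_value  # assumed: end of range is exclusive
        if bounds:
            filter_[cursor_field] = bounds

    kwargs: Dict[str, Any] = {"filter": filter_}

    if row_order is not None:
        if cursor_field is None:
            raise ValueError("row_order requires a cursor_field to sort on")
        if row_order not in ("asc", "desc"):
            raise ValueError(f"row_order must be 'asc' or 'desc', got {row_order!r}")
        direction = ASCENDING if row_order == "asc" else DESCENDING
        kwargs["sort"] = [(cursor_field, direction)]

    if limit is not None:
        # pymongo treats limit=0 as "no limit", so require a positive value
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        kwargs["limit"] = limit

    return kwargs


if __name__ == "__main__":
    args = build_find_args("updated_at", start_value=10, end_value=20, row_order="asc", limit=100)
    print(args)
    # With a real collection this would be used as: collection.find(**args)
```

Keeping the sort on the same field as the incremental cursor means a limited query always returns the lowest (or highest) N cursor values in the range, which is what makes limit usable for microbatching.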

Are you a dlt user?

Yes, I'm already a dlt user.

Are you ready to contribute this extension?

Yes, I'm ready.

dlt destination

any

Additional information

Regarding limit: limit works well with ordered results. If results are not ordered (there is no incremental field and no declared order), issue a warning!
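The warning condition above could be checked before the query runs. This is a minimal sketch with a hypothetical function name (check_limit_ordering); it follows the issue's rule that results count as unordered only when there is neither an incremental field nor a declared row_order, and it uses Python's standard warnings module rather than any dlt logging API.

```python
import warnings
from typing import Optional


def check_limit_ordering(
    limit: Optional[int],
    incremental_field: Optional[str],
    row_order: Optional[str],
) -> bool:
    """Hypothetical check: return True (and emit a UserWarning) when a limit is
    applied to results that have no deterministic order."""
    if limit is None:
        return False  # no limit, so ordering does not affect which rows load
    if incremental_field is None and row_order is None:
        warnings.warn(
            f"limit={limit} is set but results are not ordered "
            "(no incremental field and no row_order); "
            "the returned documents may differ between runs",
            UserWarning,
        )
        return True
    return False
```

Returning a flag in addition to warning keeps the check easy to test and lets a caller decide whether to escalate it to an error.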

Please split this ticket into two PRs:

  1. requirements 1-3
  2. pyarrow backend as a separate PR