airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.97k stars, 4.1k forks

Source MongoDB: Promote to Beta #22664

Closed misteryeo closed 8 months ago

mickaelandrieu commented 1 year ago

Not really in beta to be honest... a lot of issues already.

The worst of them is that the source connector is unable to find ALL properties of really big collections, making it unreliable for big databases.

IMHO, at least one feature is missing: a setting for the number of items read to infer the properties (with an option to say "look into ALL the elements please").

misteryeo commented 1 year ago

Hi @mickaelandrieu! Yes, it's still an alpha connector. This issue was created for us to consolidate all open issues that we'll need to resolve in order to graduate this connector to beta.

So if, while using the alpha connector, you notice any bugs you'd like to see fixed or features you'd like to see supported, please file a separate GitHub issue so we can consider it as part of the process when we're ready to prioritize this connector!

mickaelandrieu commented 1 year ago

> please file a separate GitHub issue so we can consider that as part of the process when we're ready to prioritize this connector!

I don't want to be mean or anything, but these are already well-known issues (and a regression since 0.1.8).

misteryeo commented 1 year ago

Hi @mickaelandrieu, no worries at all. If you feel that the feedback and requests you'd like to share have already been captured by the rest of the community, that's totally fine. I just wanted to make sure nothing was missed as we scope out how we can improve the connector. Thank you for your patience and support!

tybernstein commented 1 year ago

I had a user request the option to connect to the MongoDB Source Connector via an SSH tunnel. I saw that https://github.com/airbytehq/airbyte-internal-issues/issues/965 covers that. Could that be considered as one of the features for the promotion to Beta?

misteryeo commented 1 year ago

We'll definitely consider it @TBernstein4!

tybernstein commented 1 year ago

I had a user encounter issue 9780. Just surfacing it here in case it can be considered.

farisSOUM commented 1 year ago

Reiterating what @mickaelandrieu said, we are using this connector with a collection of over 1 million documents.

mickaelandrieu commented 1 year ago

For me, you should bring back a parameter to set, at sync time, the number of documents to read in order to find every property: it's an expensive task, but that is expected for old collections anyway.
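To illustrate the request, here is a minimal sketch in plain Python, not the connector's actual code. The function name `discover_properties` and the parameter `max_documents` are hypothetical; the idea is only that a configurable scan size, with `None` meaning "read everything", decides whether late-appearing properties are found.

```python
from typing import Any, Dict, Iterable, Optional, Set


def discover_properties(
    documents: Iterable[Dict[str, Any]],
    max_documents: Optional[int] = 10_000,  # hypothetical setting; None = scan all
) -> Set[str]:
    """Return the union of top-level field names over the scanned documents."""
    properties: Set[str] = set()
    for index, document in enumerate(documents):
        # Stop once the configured number of documents has been read.
        if max_documents is not None and index >= max_documents:
            break
        properties.update(document.keys())
    return properties


# Toy collection: one field only shows up in the very last document.
collection = [{"_id": i, "name": "x"} for i in range(20)]
collection.append({"_id": 20, "legacy_flag": True})

print(sorted(discover_properties(collection, max_documents=10)))
# ['_id', 'name']
print(sorted(discover_properties(collection, max_documents=None)))
# ['_id', 'legacy_flag', 'name']
```

Scanning everything costs one full pass over the collection, which is the expensive trade-off described above.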

FarisSquared commented 1 year ago

Added a PR to change the schema discovery from $limit to $sample, following MongoDB's Compass client.

https://github.com/airbytehq/airbyte/pull/29511

mickaelandrieu commented 1 year ago

It's less than the current limit (10k => 1k), please reconsider your contribution :(

FarisSquared commented 1 year ago

> It's less than the current limit (10k => 1k), please reconsider your contribution :(

Hey there, no, we still use the same 10k limit, but we should change the stage used in the aggregation query from $limit to $sample.
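For readers following along, a rough sketch of the difference between the two stages, not the code from the PR. The helper names and the in-memory `emulate` function are hypothetical and only mimic the stages on a Python list; the stage documents themselves use MongoDB's `{"$limit": n}` and `{"$sample": {"size": n}}` forms.

```python
import random
from typing import Any, Dict, List

DISCOVERY_SIZE = 10_000  # the 10k figure mentioned in this thread


def limit_pipeline(size: int) -> List[Dict[str, Any]]:
    """Pipeline that keeps only the first `size` documents it receives."""
    return [{"$limit": size}]


def sample_pipeline(size: int) -> List[Dict[str, Any]]:
    """Pipeline that asks for `size` randomly selected documents."""
    return [{"$sample": {"size": size}}]


def emulate(
    pipeline: List[Dict[str, Any]],
    documents: List[Dict[str, Any]],
    rng: random.Random,
) -> List[Dict[str, Any]]:
    """Tiny in-memory stand-in for the two stages, for illustration only."""
    stage = pipeline[0]
    if "$limit" in stage:
        return documents[: stage["$limit"]]
    size = min(stage["$sample"]["size"], len(documents))
    return rng.sample(documents, size)


# With a real server and pymongo, either pipeline could be passed to
# collection.aggregate(...), e.g. collection.aggregate(sample_pipeline(DISCOVERY_SIZE)).
```

With the same size, $limit always inspects the same leading documents, while $sample draws from the whole collection, so fields that only appear in later documents have a chance of being seen.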

mickaelandrieu commented 1 year ago

Good idea!