airbytehq / airbyte


🐛 Source MongoDB v2: Failed to fetch schema MongoDB Atlas #8564

Open JeroniMan opened 2 years ago

JeroniMan commented 2 years ago

Environment

Current Behavior

I created a source (MongoDB) and a destination (BigQuery) and checked the connections; everything was fine at this step. Then I set up a new connection and started fetching the data schema from MongoDB. The operation ran for about 30 minutes and then failed.


Expected Behavior

Fetch data schema

Logs

error.txt

LOG
```
2021-12-06 22:58:32 ERROR () LineGobbler(voidCall):82 - Exception in thread "main" com.mongodb.MongoCommandException: Command failed with error 16872 (Location16872): 'Invalid $project :: caused by :: '$' by itself is not a valid FieldPath' on server *******-shard-00-02.9lnlp.mongodb.net:27017. The full response is {"operationTime": {"$timestamp": {"t": 1638831510, "i": 4}}, "ok": 0.0, "errmsg": "Invalid $project :: caused by :: '$' by itself is not a valid FieldPath", "code": 16872, "codeName": "Location16872", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1638831510, "i": 4}}, "signature": {"hash": {"$binary": {"base64": "4swprwaSLZ01X1OdvfHQwUnrdy8=", "subType": "00"}}, "keyId": 6998221739326963718}}}
```

Steps to Reproduce

  1. Create source and destination
  2. Check connection to source and destination
  3. Set up a new connection

Are you willing to submit a PR?

JeroniMan commented 2 years ago

I found a similar case, but I don't know how to solve it. Maybe it helps.

https://jira.mongodb.org/browse/SERVER-57854

alafanechere commented 2 years ago

Hi @JeroniMan, according to the Jira link you sent, this looks like a bug in the latest Mongo query engine. I'm afraid our current discover step relies on the high-level Java client for Mongo, and the workaround suggested in Jira is too low-level for our connector. This error seems to come from queries on documents with empty keys: "". I'd suggest you wait for the release of the fix mentioned in the Jira ticket, or try to delete the empty keys in your Mongo collection if possible.
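To check whether a collection contains such documents before deleting anything, something like the following may help. It is only a minimal sketch, not part of the connector: it assumes the documents have already been loaded as plain Python dicts (for example, as returned by a MongoDB driver), and the helper name `find_empty_key_paths` is hypothetical.

```python
from typing import Any, Iterator, Tuple


def find_empty_key_paths(value: Any, parts: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield the dotted path of every sub-document that contains an empty-string key.

    Hypothetical helper for illustration; "<root>" marks the top-level document.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            if key == "":
                # Report the parent that holds the empty key.
                yield ".".join(parts) or "<root>"
            # Keep walking; label the empty key so nested paths stay readable.
            yield from find_empty_key_paths(child, parts + (key or "<empty>",))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_empty_key_paths(item, parts + (str(index),))


# Example document with empty keys at three different levels.
doc = {"_id": 1, "": 123, "nested": {"": "x", "ok": 1}, "items": [{"": 2}, {"a": 1}]}
print(list(find_empty_key_paths(doc)))
```

Running it over each document of a suspect collection lists where the empty keys live, which should make it easier to decide whether removing them is feasible.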

JeroniMan commented 2 years ago

@alafanechere thank you. I tried creating the connection again, and now I have another problem:

airbyte-webapp | 2021/12/08 11:14:14 [error] 40#40: *6 upstream timed out (110: Operation timed out) while reading response header from upstream, client: **** server: localhost, request: "POST /api/v1/sources/discover_schema HTTP/1.1", upstream: "http://****:8001/api/v1/sources/discover_schema", host: "****", referrer: "http://****/source/new-connection"

How can I change the timeout? I think this issue occurs because of the large number of collections in the source.

alafanechere commented 2 years ago

> Now have other problem: i think that this issue occurs due to the large number of collections in the source.

Hi @JeroniMan, could you please upgrade your source MongoDB v2 connector to 0.1.9? We improved the schema discovery performance for our MongoDB connector with this PR: https://github.com/airbytehq/airbyte/pull/8491

About your original problem: are you aware of any action you or your team took to solve it?

JeroniMan commented 2 years ago

@alafanechere thank you. We have not resolved the original problem. We thought about removing the broken documents, but we are running MongoDB 4.4.10, not 5. Should we be affected by that?

marcosmarxm commented 2 years ago

Zendesk ticket #1645 has been linked to this issue.

marcosmarxm commented 2 years ago

Comment made from Zendesk by Sajarin on 2022-07-19 at 12:35:

Hi @fauh45, thanks for your post. It seems like this might be related to this open issue here: https://github.com/airbytehq/airbyte/issues/8564

marcosmarxm commented 2 years ago

Zendesk ticket #1574 has been linked to this issue.

marcosmarxm commented 2 years ago

Comment made from Zendesk by Nataly Merezhuk on 2022-07-19 at 15:39:

Sure! Here is the issue, I am also linking your post to it internally.
https://github.com/airbytehq/airbyte/issues/8564

marcosmarxm commented 2 years ago

Comment made from Zendesk by Marcos Marx on 2022-07-20 at 01:58:

Hmm, the issue you linked seems to have a different error message in the logs than mine, though (?). Also, I've used the newest MongoDB source version there. And the MongoDB database I connect to doesn't even have 1k documents yet.

[Discourse post]

JCWahoo commented 1 year ago

The error persists. When using the latest version of Airbyte, I get a new message as opposed to no error message:

Error: non-json response

shubham19may commented 1 year ago

Hey @alafanechere, we are also facing the same issue while detecting streams/catalogue, where the key is empty for some fields: { "" : 123 }

Can we have a fix that will turn an empty key into, let's say, an underscore _ ?
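To make the request concrete, here is a rough sketch of the renaming I have in mind. It is not how Airbyte currently behaves: the function name `replace_empty_keys` and the `placeholder` parameter are hypothetical, and it works on plain Python dicts only. It raises if the placeholder would clash with an existing key, since silently overwriting a field seems worse than failing.

```python
from typing import Any


def replace_empty_keys(value: Any, placeholder: str = "_") -> Any:
    """Return a copy of `value` where every empty-string key is renamed to `placeholder`.

    Hypothetical illustration of the requested behaviour, not connector code.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, child in value.items():
            new_key = placeholder if key == "" else key
            if new_key in cleaned:
                # e.g. the document has both "" and "_" at the same level
                raise ValueError(f"key collision on {new_key!r}")
            cleaned[new_key] = replace_empty_keys(child, placeholder)
        return cleaned
    if isinstance(value, list):
        return [replace_empty_keys(item, placeholder) for item in value]
    return value


print(replace_empty_keys({"": 123}))
```

The input document is left untouched; the function builds a new structure, so the original data could still be inspected if the rename turns out to be a problem.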

FarisSquared commented 1 year ago

@shubham19may There is an issue for this https://github.com/airbytehq/airbyte/issues/19206