airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
15.87k stars 4.07k forks source link

Destination bigquery: "Query is too large" error with many columns #31994

Open roman-yermilov-gl opened 11 months ago

roman-yermilov-gl commented 11 months ago

Slack duscussion: https://airbyte-globallogic.slack.com/archives/C02U9R3AF37/p1698416865620589 Link to failed sync: https://cloud.airbyte.com/workspaces/b79f07e0-0575-4ed5-93bb-6ef9c6396646/connections/cddd2da6-b66c-4213-9919-d85338df36fb/job-history#5544729::4

Error log:

2023-10-30 20:18:27 destination > 2023-10-30 20:18:19 ERROR i.a.i.b.d.t.DefaultTyperDeduper(lambda$prepareTablesFuture$1):170 - Exception occurred while preparing tables for stream deals
2023-10-30 20:18:27 destination > com.google.cloud.bigquery.BigQueryException: The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.
...
...
2023-10-30 20:18:36 destination > Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
2023-10-30 20:18:36 destination > POST https://www.googleapis.com/bigquery/v2/projects/healthy-clock-304411/jobs?prettyPrint=false
2023-10-30 20:18:36 destination > {
2023-10-30 20:18:36 destination >   "code": 400,
2023-10-30 20:18:36 destination >   "errors": [
2023-10-30 20:18:36 destination >     {
2023-10-30 20:18:36 destination >       "domain": "global",
2023-10-30 20:18:36 destination >       "message": "The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.",
2023-10-30 20:18:36 destination >       "reason": "invalid"
2023-10-30 20:18:36 destination >     }
2023-10-30 20:18:36 destination >   ],
2023-10-30 20:18:36 destination >   "message": "The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.",
2023-10-30 20:18:36 destination >   "status": "INVALID_ARGUMENT"
2023-10-30 20:18:36 destination > }

also seen in https://github.com/airbytehq/oncall/issues/3664

sc-yan commented 11 months ago

exactly the same error here. also with syncing hubspot. I understand it's a restriction from BigQuery, but still hope it could be fixed or there is a workaround

jbfbell commented 6 months ago

If you're still experiencing the issue, you may have to use column selection to reduce the number of columns you're syncing, or disable typing and deduping if you don't want the typed tables

aljets commented 4 months ago

Column selection appears to not be working with the hubspot source--it's still trying to load the entire object into _airbyte_data.