airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[destination-bigquery] Generated query exceeds maximum query length of 1024K characters #36303

Open benorourke opened 5 months ago

benorourke commented 5 months ago

Connector Name

destination-bigquery

Connector Version

2.4.12

What step the error happened?

None

Relevant information

Key Details:

When attempting to load some tables, a POST request to the BigQuery jobs API fails and causes the connector to error out:

The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.
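For context, this cap is enforced by BigQuery on the query text itself (comments and whitespace included), so it can only be avoided by shrinking the generated SQL. Below is a minimal sketch of what a client-side guard could look like using the same google-cloud-bigquery Java client that appears in the stack trace; the class name and guard logic are illustrative only, not the connector's actual code:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class QueryLengthGuard {

    // BigQuery's documented cap on standard SQL query text:
    // 1024K characters, counting comments and whitespace.
    private static final int MAX_QUERY_LENGTH_CHARS = 1024 * 1024;

    // Fails fast instead of letting jobs.insert reject the statement
    // with a 400 INVALID_ARGUMENT ("The query is too large...").
    static void submit(BigQuery bigquery, String sql) {
        if (sql.length() > MAX_QUERY_LENGTH_CHARS) {
            throw new IllegalArgumentException("Generated SQL is " + sql.length()
                + " chars, over the " + MAX_QUERY_LENGTH_CHARS + "-char BigQuery limit");
        }
        bigquery.create(JobInfo.of(QueryJobConfiguration.newBuilder(sql).build()));
    }

    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        submit(bigquery, "SELECT 1");
    }
}
```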

Relevant log output

2024-03-20 01:39:51 destination > ERROR main i.a.c.i.b.AirbyteExceptionHandler(uncaughtException):64 Something went wrong in the connector. See the logs for more details. java.util.concurrent.CompletionException: java.lang.RuntimeException: com.google.cloud.bigquery.BigQueryException: The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.RuntimeException: com.google.cloud.bigquery.BigQueryException: The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.
        at io.airbyte.integrations.base.destination.typing_deduping.DefaultTyperDeduper.lambda$prepareTablesFuture$3(DefaultTyperDeduper.java:215) ~[airbyte-cdk-typing-deduping-0.23.11.jar:?]
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
        ... 3 more
Caused by: com.google.cloud.bigquery.BigQueryException: The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:114) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.create(HttpBigQueryRpc.java:231) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl$5.call(BigQueryImpl.java:414) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl$5.call(BigQueryImpl.java:399) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103) ~[gax-2.41.0.jar:2.41.0]
        at com.google.cloud.bigquery.BigQueryRetryHelper.run(BigQueryRetryHelper.java:86) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryRetryHelper.runWithRetries(BigQueryRetryHelper.java:49) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl.create(BigQueryImpl.java:398) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl.create(BigQueryImpl.java:363) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at io.airbyte.integrations.destination.bigquery.typing_deduping.BigQueryDestinationHandler.execute(BigQueryDestinationHandler.java:141) ~[io.airbyte.airbyte-integrations.connectors-destination-bigquery.jar:?]
        at io.airbyte.integrations.base.destination.typing_deduping.TypeAndDedupeTransaction.executeTypeAndDedupe(TypeAndDedupeTransaction.java:46) ~[airbyte-cdk-typing-deduping-0.23.11.jar:?]
        at io.airbyte.integrations.base.destination.typing_deduping.TypeAndDedupeTransaction.executeSoftReset(TypeAndDedupeTransaction.java:71) ~[airbyte-cdk-typing-deduping-0.23.11.jar:?]
        at io.airbyte.integrations.base.destination.typing_deduping.DefaultTyperDeduper.lambda$prepareTablesFuture$3(DefaultTyperDeduper.java:191) ~[airbyte-cdk-typing-deduping-0.23.11.jar:?]
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
        ... 3 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
POST https://bigquery.googleapis.com/bigquery/v2/projects/seedlegals/jobs?prettyPrint=false
{
  "code": 400,
  "errors": [
    {
      "domain": "global",
      "message": "The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.",
      "reason": "invalid"
    }
  ],
  "message": "The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.",
  "status": "INVALID_ARGUMENT"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) ~[google-api-client-2.2.0.jar:2.2.0]
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118) ~[google-api-client-2.2.0.jar:2.2.0]
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37) ~[google-api-client-2.2.0.jar:2.2.0]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$3.interceptResponse(AbstractGoogleClientRequest.java:466) ~[google-api-client-2.2.0.jar:2.2.0]
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111) ~[google-http-client-1.43.3.jar:1.43.3]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:552) ~[google-api-client-2.2.0.jar:2.2.0]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493) ~[google-api-client-2.2.0.jar:2.2.0]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603) ~[google-api-client-2.2.0.jar:2.2.0]
        at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.create(HttpBigQueryRpc.java:229) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl$5.call(BigQueryImpl.java:414) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl$5.call(BigQueryImpl.java:399) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103) ~[gax-2.41.0.jar:2.41.0]
        at com.google.cloud.bigquery.BigQueryRetryHelper.run(BigQueryRetryHelper.java:86) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryRetryHelper.runWithRetries(BigQueryRetryHelper.java:49) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl.create(BigQueryImpl.java:398) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at com.google.cloud.bigquery.BigQueryImpl.create(BigQueryImpl.java:363) ~[google-cloud-bigquery-2.37.0.jar:2.37.0]
        at io.airbyte.integrations.destination.bigquery.typing_deduping.BigQueryDestinationHandler.execute(BigQueryDestinationHandler.java:141) ~[io.airbyte.airbyte-integrations.connectors-destination-bigquery.jar:?]
        at io.airbyte.integrations.base.destination.typing_deduping.TypeAndDedupeTransaction.executeTypeAndDedupe(TypeAndDedupeTransaction.java:46) ~[airbyte-cdk-typing-deduping-0.23.11.jar:?]
        at io.airbyte.integrations.base.destination.typing_deduping.TypeAndDedupeTransaction.executeSoftReset(TypeAndDedupeTransaction.java:71) ~[airbyte-cdk-typing-deduping-0.23.11.jar:?]
        at io.airbyte.integrations.base.destination.typing_deduping.DefaultTyperDeduper.lambda$prepareTablesFuture$3(DefaultTyperDeduper.java:191) ~[airbyte-cdk-typing-deduping-0.23.11.jar:?]
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
        ... 3 more


marcosmarxm commented 5 months ago

Hello @benorourke, I added the issue to the destination backlog for grooming and further investigation. Do you mind sharing what source you're using, or any other information that could help us reproduce the issue?

benorourke commented 5 months ago

Hi @marcosmarxm, thanks for your reply.

I'm using the Hubspot source (4.0.0). In particular, the deals/deal_property_history table is the issue. I suspect it may be due to a large property value.

jbfbell commented 5 months ago

Hey @benorourke, sorry you're experiencing this issue. We've usually seen it occur when there are a lot of columns; values typically aren't the cause, since we're selecting from the raw table. It's possible the query is being pushed over the limit by some combination of syncing hundreds of columns or particularly long column names. You may be able to use Airbyte's column selection feature to sync only the columns you're interested in, which would reduce the query size. You could also set up multiple connections and split the columns across two distinct tables in BigQuery, but that is a little trickier.
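For anyone trying to judge whether their schema is close to the limit, here is a rough, hypothetical sketch that sums column-name lengths for a table and compares the total against the 1024K-character cap. The dataset/table names and the per-column repetition multiplier are assumptions for illustration only, not how the connector actually builds its SQL:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;

public class ColumnBudgetCheck {

    // Assumed multiplier: each column name tends to appear several times in the
    // generated typing/deduping SQL (SELECT list, casts, dedup window, etc.).
    // Treat this as a guess, not a measured value.
    private static final int ASSUMED_REPETITIONS_PER_COLUMN = 8;
    private static final int MAX_QUERY_LENGTH_CHARS = 1024 * 1024;

    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Hypothetical dataset/table names; point these at the stream's final table.
        Table table = bigquery.getTable(TableId.of("my_dataset", "deal_property_history"));
        Schema schema = table.getDefinition().getSchema();

        long estimate = 0;
        for (Field field : schema.getFields()) {
            estimate += (long) field.getName().length() * ASSUMED_REPETITIONS_PER_COLUMN;
        }
        System.out.printf("~%d chars of column-name text vs %d char limit%n",
            estimate, MAX_QUERY_LENGTH_CHARS);
    }
}
```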