Closed marcosmarxm closed 2 years ago
as part of acceptance criteria we should add a test case for this as well
This exception comes from the destination-bigquery:
2021-10-04 19:20:23 INFO () DefaultAirbyteStreamFactory(lambda$create$0):53 - 2021-10-04 19:20:23 [32mINFO[m i.a.i.d.b.BigQueryUtils(createTable):92 - {} - Table: GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=111testing_raw, tableId=_airbyte_tmp_qny_campaigns}} created successfully
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - Exception in thread "main" com.google.cloud.bigquery.BigQueryException: 404 Not Found
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - POST https://www.googleapis.com/upload/bigquery/v2/projects/dataline-integration-testing/jobs?uploadType=resumable
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - {
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "error": {
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "code": 404,
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "message": "Not found: Dataset dataline-integration-testing:111testing_raw",
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "errors": [
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - {
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "message": "Not found: Dataset dataline-integration-testing:111testing_raw",
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "domain": "global",
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "reason": "notFound"
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - }
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - ],
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "status": "NOT_FOUND"
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - }
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - }
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 -
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.open(HttpBigQueryRpc.java:655)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.TableDataWriteChannel$2.call(TableDataWriteChannel.java:87)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.TableDataWriteChannel$2.call(TableDataWriteChannel.java:82)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.TableDataWriteChannel.open(TableDataWriteChannel.java:81)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.TableDataWriteChannel.<init>(TableDataWriteChannel.java:41)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.BigQueryImpl.writer(BigQueryImpl.java:1388)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at io.airbyte.integrations.destination.bigquery.BigQueryDestination.getConsumer(BigQueryDestination.java:214)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:127)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at io.airbyte.integrations.destination.bigquery.BigQueryDestination.main(BigQueryDestination.java:337)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - Caused by: com.google.api.client.http.HttpResponseException: 404 Not Found
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - POST https://www.googleapis.com/upload/bigquery/v2/projects/dataline-integration-testing/jobs?uploadType=resumable
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - {
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "error": {
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "code": 404,
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "message": "Not found: Dataset dataline-integration-testing:111testing_raw",
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "errors": [
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - {
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "message": "Not found: Dataset dataline-integration-testing:111testing_raw",
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "domain": "global",
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "reason": "notFound"
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - }
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - ],
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - "status": "NOT_FOUND"
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - }
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - }
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 -
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1116)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.open(HttpBigQueryRpc.java:652)
2021-10-04 19:20:24 ERROR () LineGobbler(voidCall):65 - ... 11 more
2021-10-04 19:21:47 ERROR () DefaultReplicationWorker(run):128 - Sync worker failed.
Do we know why this is being thrown? and what do/did we do about it?
The official docs from Google BigQuery also states:
Unquoted identifiers must begin with a letter or an underscore character. Subsequent characters can be letters, numbers, or underscores
These are invalid identifiers:
5Customers
https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical
That's why it was implemented to insert an extra underscore in normalization in these cases and the final tables would be created (or queried) in dataline-integration-testing
._111testing_raw
Of course, this would assume that the destination-bigquery
would also follow the same logic and output the raw table there in the first place too...
Here it seems to apply the logic to table names but maybe not to datasets?
Remove this with your answer.