Open damccorm opened 2 years ago
.take-issue
@slilichenko are you still running into this issue?
I've outlined what I tried below. Let me know if I'm misunderstanding the issue:
I was able to reproduce this by omitting the project from both the pipeline options and the table spec, then setting .withoutValidation() on the write configuration. However, this is working as intended. Validation is meant to check these things at pipeline construction time and throw an error before the pipeline runs. Without validation, you will run into a RuntimeException. FYI, this behavior is not unique to STORAGE_API_AT_LEAST_ONCE; other write methods will also fail with Runtime/IO exceptions when they try loading data to BQ.
I haven't tried it recently. The bug is about picking up the default (or the one provided via BigQueryOptions) project id rather than failing with NPE - https://github.com/apache/beam/blob/4e67a59f051afca68653048a217e2f874d31833a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L145
The validation method correctly uses BigQueryIO's getTableWithDefaultProject(BigQueryOptions bqOptions), but at run time the table spec is not checked for a missing project id, resulting in a RuntimeException.
Stack trace:
Caused by: java.lang.NullPointerException: Required parameter projectId must be specified.
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:897)
    at com.google.api.client.util.Preconditions.checkNotNull(Preconditions.java:138)
    at com.google.api.services.bigquery.Bigquery$Tables$Get.<init>(Bigquery.java:5325)
    at com.google.api.services.bigquery.Bigquery$Tables.get(Bigquery.java:5298)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:553)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:542)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:536)
    at org.apache.beam.sdk.io.gcp.bigquery.StorageApiDynamicDestinationsTableRow$1.lambda$$0(StorageApiDynamicDestinationsTableRow.java:66)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4876)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
    ... 35 more
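To make the requested behavior concrete, here is a minimal sketch of the run-time fallback described above: if the table spec carries no project id, use the default project before calling the service. It is only an illustration under stated assumptions, not Beam's implementation. TableRef and withDefaultProject are hypothetical names invented for this sketch, and the default project is passed in as a plain string rather than read from BigQueryOptions.

```java
public class DefaultProjectSketch {

    /** Hypothetical stand-in for a table reference; not a Beam or BigQuery client class. */
    static final class TableRef {
        final String projectId; // null when the table spec was just "dataset.table"
        final String datasetId;
        final String tableId;

        TableRef(String projectId, String datasetId, String tableId) {
            this.projectId = projectId;
            this.datasetId = datasetId;
            this.tableId = tableId;
        }
    }

    /**
     * Returns a reference that always has a project id: the original one if present,
     * otherwise the supplied default. Fails with a descriptive message (instead of an
     * NPE deep inside the client) when neither is available.
     */
    static TableRef withDefaultProject(TableRef ref, String defaultProject) {
        if (ref.projectId != null && !ref.projectId.isEmpty()) {
            return ref; // the spec already names a project; leave it untouched
        }
        if (defaultProject == null || defaultProject.isEmpty()) {
            throw new IllegalArgumentException(
                "No project id in table spec " + ref.datasetId + "." + ref.tableId
                    + " and no default project configured");
        }
        return new TableRef(defaultProject, ref.datasetId, ref.tableId);
    }

    public static void main(String[] args) {
        // Table spec without a project, as in the reproduction above.
        TableRef resolved =
            withDefaultProject(new TableRef(null, "my_dataset", "my_table"), "my-default-project");
        System.out.println(resolved.projectId + ":" + resolved.datasetId + "." + resolved.tableId);
    }
}
```

In this sketch the check runs right before the table lookup, which is the point where the stack trace above shows the NullPointerException being thrown.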
Imported from Jira BEAM-13612. Original Jira may contain additional context. Reported by: slilichenko.