apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

When BigQueryIO's STORAGE_API_AT_LEAST_ONCE method is used and the table spec doesn't contain the project id, the pipeline fails #21405

Open damccorm opened 2 years ago

damccorm commented 2 years ago

The validation method correctly uses BigQueryIO's getTableWithDefaultProject(BigQueryOptions bqOptions), but at run time the table spec is not checked for a missing project id, resulting in a RuntimeException.

Stack trace:

Caused by: java.lang.NullPointerException: Required parameter projectId must be specified.
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:897)
    at com.google.api.client.util.Preconditions.checkNotNull(Preconditions.java:138)
    at com.google.api.services.bigquery.Bigquery$Tables$Get.<init>(Bigquery.java:5325)
    at com.google.api.services.bigquery.Bigquery$Tables.get(Bigquery.java:5298)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:553)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:542)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:536)
    at org.apache.beam.sdk.io.gcp.bigquery.StorageApiDynamicDestinationsTableRow$1.lambda$$0(StorageApiDynamicDestinationsTableRow.java:66)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4876)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
    ... 35 more

Imported from Jira BEAM-13612. Original Jira may contain additional context. Reported by: slilichenko.

Geddy05 commented 1 year ago

.take-issue

vishwajeetfr commented 1 year ago

.take-issue

ahmedabu98 commented 1 year ago

@slilichenko are you still running into this issue?

I've outlined what I tried below. Let me know if I'm misunderstanding the issue:

I was able to reproduce this by leaving the project out of both the pipeline options and the table spec, then setting .withoutValidation() on the write configuration. However, this is working as intended. Validation is meant to check these things at pipeline construction time and throw an error before the pipeline runs. Without validation, you will run into a RuntimeException. FYI, this behavior is not unique to STORAGE_API_AT_LEAST_ONCE; other write methods will also fail with Runtime/IO exceptions when they try loading data into BQ.

slilichenko commented 1 year ago

I haven't tried it recently. The bug is about picking up the default project id (or the one provided via BigQueryOptions) rather than failing with an NPE: https://github.com/apache/beam/blob/4e67a59f051afca68653048a217e2f874d31833a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L145
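
To make the requested behavior concrete, here is a minimal, dependency-free sketch of the fallback: if the table spec carries no project id, use a configured default instead of passing null through to the BigQuery client. This is not Beam's code. TableRef, parseTableSpec, and withDefaultProject are hypothetical stand-ins written for illustration, and the parser only handles a simplified "[project:]dataset.table" form.

```java
import java.util.Objects;

public class DefaultProjectSketch {

    /** Hypothetical stand-in for a table reference; projectId may be null. */
    static final class TableRef {
        final String projectId;
        final String datasetId;
        final String tableId;

        TableRef(String projectId, String datasetId, String tableId) {
            this.projectId = projectId;
            this.datasetId = Objects.requireNonNull(datasetId, "datasetId");
            this.tableId = Objects.requireNonNull(tableId, "tableId");
        }

        @Override
        public String toString() {
            return projectId + ":" + datasetId + "." + tableId;
        }
    }

    /** Parses "[project:]dataset.table" (simplified format assumed for this sketch). */
    static TableRef parseTableSpec(String spec) {
        String project = null;
        String rest = spec;
        int colon = spec.indexOf(':');
        if (colon >= 0) {
            project = spec.substring(0, colon);
            rest = spec.substring(colon + 1);
        }
        int dot = rest.indexOf('.');
        if (dot <= 0 || dot == rest.length() - 1) {
            throw new IllegalArgumentException(
                "Expected [project:]dataset.table, got: " + spec);
        }
        return new TableRef(project, rest.substring(0, dot), rest.substring(dot + 1));
    }

    /** Fills in the project only when the spec omitted it; fails with a clear message otherwise. */
    static TableRef withDefaultProject(TableRef ref, String defaultProject) {
        if (ref.projectId != null && !ref.projectId.isEmpty()) {
            return ref; // an explicit project in the spec always wins
        }
        if (defaultProject == null || defaultProject.isEmpty()) {
            throw new IllegalArgumentException(
                "Table spec has no project id and no default project is configured");
        }
        return new TableRef(defaultProject, ref.datasetId, ref.tableId);
    }

    public static void main(String[] args) {
        // Placeholder names; the default would come from pipeline options in a real pipeline.
        TableRef ref = withDefaultProject(
            parseTableSpec("my_dataset.my_table"), "my-default-project");
        System.out.println(ref);
    }
}
```

Applying a step like this at run time, just before the table lookup, would mirror what validation already does at construction time, so the two paths would resolve the same spec to the same table.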