GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

BigQuery: fix an issue with option propagation and refactor to future-proof #540

dhalperi closed 7 years ago

dhalperi commented 7 years ago

We created a helper in BigQueryIO to create a JobConfigurationQuery capturing all options, but we had not yet propagated this cleanup into the Services abstraction or helper classes.

Refactor BigQueryServices and BigQueryTableRowIterator to propagate the same configuration.

Adds a new deprecated constructor to BigQueryTableRowIterator for backwards compatibility.

This fixes GoogleCloudPlatform/DataflowJavaSDK#539.
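
For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the actual SDK code. `QueryConfig` and `RowIteratorSketch` are hypothetical stand-ins for `JobConfigurationQuery` and `BigQueryTableRowIterator`, and the option names and the legacy-SQL default in the deprecated constructor are assumptions chosen for illustration. The point is that every option travels in one configuration object, so an option added later cannot be silently dropped between layers, while the old signature keeps compiling.

```java
import java.util.Objects;

/** Hypothetical stand-in for JobConfigurationQuery; not the real BigQuery API class. */
final class QueryConfig {
    final String query;
    final boolean flattenResults;
    final boolean useLegacySql;

    QueryConfig(String query, boolean flattenResults, boolean useLegacySql) {
        this.query = Objects.requireNonNull(query, "query");
        this.flattenResults = flattenResults;
        this.useLegacySql = useLegacySql;
    }
}

/** Hypothetical stand-in for BigQueryTableRowIterator. */
final class RowIteratorSketch {
    private final QueryConfig config;

    /** Preferred constructor: the full configuration is passed through untouched. */
    RowIteratorSketch(QueryConfig config) {
        this.config = Objects.requireNonNull(config, "config");
    }

    /**
     * Old-style constructor kept so existing callers still compile.
     *
     * @deprecated pass a complete {@link QueryConfig} instead.
     */
    @Deprecated
    RowIteratorSketch(String query, boolean flattenResults) {
        // Assumed default for options the old signature cannot express.
        this(new QueryConfig(query, flattenResults, true));
    }

    /** Exposes the configuration that downstream service calls would receive. */
    QueryConfig config() {
        return config;
    }
}
```

In this sketch, a caller that builds a `QueryConfig` with `useLegacySql` set to `false` sees that value preserved by the iterator, whereas the deprecated constructor can only fall back to the assumed default.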

peihe commented 7 years ago

Should we do the refactoring in Beam first? See https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java#L59

dhalperi commented 7 years ago

@peihe given the amount of code divergence, they both need careful review. Please review here.

peihe commented 7 years ago

But I think we should still do the Beam change first; otherwise the two codebases will diverge even further.

I also think forward-port PRs could cause additional inconvenience during review and backporting.

peihe commented 7 years ago

I commented on https://github.com/apache/beam/pull/1873

Let's get the Beam PR LGTMed, and then update this one accordingly.

peihe commented 7 years ago

Could you update this PR based on https://github.com/apache/beam/pull/1873?

dhalperi commented 7 years ago

The code is substantially different here, plus we are unable to make backwards-incompatible changes. It needs a separate review.

peihe commented 7 years ago

LGTM

dhalperi commented 7 years ago

Thanks!