apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Feature Request]: Update the default behaviour in `BigQueryIO.Write.Method.DEFAULT` to `STORAGE_API_AT_LEAST_ONCE` #31827

Open borjavb opened 1 month ago

borjavb commented 1 month ago

What would you like to happen?

The default behaviour of BigQueryIO.Write.Method for unbounded collections is to use STREAMING_INSERTS, which is now categorised as legacy.

Two newer methods, STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE, are available. Of the two, STORAGE_API_AT_LEAST_ONCE is the closest in underlying semantics to STREAMING_INSERTS (best-effort deduplication, but no exactly-once guarantee). Using the Storage API is also 50% cheaper than the legacy streaming inserts, with the first 2TB free.
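To make the cost claim concrete, here is a rough sketch of the arithmetic. It only encodes the two figures quoted above (a 50% lower price and a 2TB free allowance). The per-GB legacy price, the treatment of 2TB as 2048 GB, and the idea that the allowance applies once per billing period are assumptions for illustration, not official pricing. The class and method names are hypothetical.

```java
public class IngestCostSketch {
    // Assumption: the "first 2TB free" allowance is treated as 2048 GB per billing period.
    static final double FREE_GB = 2048.0;

    // Legacy streaming inserts: every GB is billed at the legacy price.
    public static double legacyCost(double gb, double legacyPricePerGb) {
        return gb * legacyPricePerGb;
    }

    // Storage API: the free allowance is subtracted first, and the remainder
    // is billed at 50% of the legacy price (the ratio quoted in this issue).
    public static double storageApiCost(double gb, double legacyPricePerGb) {
        double billableGb = Math.max(0.0, gb - FREE_GB);
        return billableGb * legacyPricePerGb * 0.5;
    }

    public static void main(String[] args) {
        // Hypothetical legacy price per GB, chosen only to make the numbers readable.
        double legacyPricePerGb = 0.05;
        for (double gb : new double[] {1024, 4096, 10240}) {
            System.out.printf("%.0f GB: legacy=%.2f storageApi=%.2f%n",
                    gb, legacyCost(gb, legacyPricePerGb), storageApiCost(gb, legacyPricePerGb));
        }
    }
}
```

Under these assumptions, a pipeline that ingests less than the free allowance pays nothing with the Storage API, and above the allowance the saving is larger than 50% because the first 2TB is not billed at all.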

Should the default method point to STORAGE_API_AT_LEAST_ONCE instead of continuing to use STREAMING_INSERTS?

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

liferoad commented 1 month ago

We usually do not change the default. For this case, the recommended way is to use Managed IO once we onboard BigQuery IO to it.