Open borjavb opened 1 month ago
The default BigQueryIO.Write.Method for unbounded collections is STREAMING_INSERTS, which is now categorised as legacy.
Two newer methods, STORAGE_API_AT_LEAST_ONCE and STORAGE_WRITE_API, are available. Of the two, STORAGE_API_AT_LEAST_ONCE is the closest in underlying semantics to STREAMING_INSERTS (best-effort deduplication, but no exactly-once guarantee). Using the Storage API is also 50% cheaper than legacy streaming inserts, with the first 2 TB free.
Should the default method switch to STORAGE_API_AT_LEAST_ONCE instead of continuing to use STREAMING_INSERTS?
Priority: 3 (nice-to-have improvement)
We usually do not change the default. For this case, the recommended way is to use Managed IO once we onboard BigQuery IO to it.
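Since the default is not expected to change, a pipeline that wants the Storage API today can select the method explicitly. The following is a minimal sketch, not an official recommendation from this thread. The class name `ExplicitWriteMethod`, the helper `writeAtLeastOnce`, and the `tableSpec` parameter are hypothetical names used for illustration, and the sketch assumes the destination table already exists so that no schema has to be supplied.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

/** Hypothetical helper: opts a single write into STORAGE_API_AT_LEAST_ONCE. */
public final class ExplicitWriteMethod {
  private ExplicitWriteMethod() {}

  /**
   * Writes rows to an existing table using the at-least-once Storage API method
   * instead of relying on the default chosen for unbounded collections.
   *
   * @param rows rows to write, typically from an unbounded source
   * @param tableSpec destination in "project:dataset.table" form (assumed to exist)
   */
  static WriteResult writeAtLeastOnce(PCollection<TableRow> rows, String tableSpec) {
    return rows.apply(
        "WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to(tableSpec)
            // Override the default method explicitly.
            .withMethod(Method.STORAGE_API_AT_LEAST_ONCE)
            // Assumption: the table already exists, so no schema is passed here.
            .withCreateDisposition(CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));
  }
}
```

Swapping `Method.STORAGE_API_AT_LEAST_ONCE` for `Method.STORAGE_WRITE_API` in the same call would select the other Storage API method mentioned in the issue. Because the at-least-once method can produce duplicate rows, downstream consumers may still need their own deduplication.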