apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Bug]: Java SDK BigQueryIO's RowMutationInformation class is not backward compatible with previous releases #31993

Open slilichenko opened 1 month ago

slilichenko commented 1 month ago

What happened?

BigQueryIO's CDC ingestion requires the RowMutationInformation class. This class has two pairs of methods for setting and returning the change sequence number. The recently deprecated pair, "public static RowMutationInformation of(MutationType mutationType, long sequenceNumber)" and "public abstract Long getSequenceNumber();", no longer works correctly: the sequence number provided to the first method is no longer returned by the second due to this code. This breaks existing pipelines which haven't converted to the newly introduced methods.
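To make the expected behavior concrete, here is a minimal sketch of how a deprecated long-based pair could keep round-tripping while a string-based pair is the primary API. MutationInfoSketch is a hypothetical stand-in written for this illustration and is not the Beam source. Storing the value as a hex string and the string-based method names are assumptions; only the deprecated signatures quoted above come from the actual class.

```java
// Hypothetical stand-in for RowMutationInformation; this is NOT the Beam source.
final class MutationInfoSketch {
  enum MutationType { UPSERT, DELETE }

  private final MutationType mutationType;
  // Assumption: the value is stored once, as a hex string.
  private final String changeSequenceNumber;

  private MutationInfoSketch(MutationType mutationType, String changeSequenceNumber) {
    this.mutationType = mutationType;
    this.changeSequenceNumber = changeSequenceNumber;
  }

  // Newer-style factory (name and parameter are assumptions for this sketch).
  static MutationInfoSketch of(MutationType mutationType, String changeSequenceNumber) {
    return new MutationInfoSketch(mutationType, changeSequenceNumber);
  }

  // Deprecated-style factory: converts the long to hex so both getters stay consistent.
  @Deprecated
  static MutationInfoSketch of(MutationType mutationType, long sequenceNumber) {
    if (sequenceNumber < 0) {
      throw new IllegalArgumentException("sequenceNumber must be non-negative");
    }
    return new MutationInfoSketch(mutationType, Long.toHexString(sequenceNumber));
  }

  MutationType getMutationType() {
    return mutationType;
  }

  String getChangeSequenceNumber() {
    return changeSequenceNumber;
  }

  // Deprecated-style getter: returns the original long when the stored value is a
  // single hex segment that fits in a long, and null otherwise.
  @Deprecated
  Long getSequenceNumber() {
    try {
      return Long.parseLong(changeSequenceNumber, 16);
    } catch (NumberFormatException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    MutationInfoSketch legacy = MutationInfoSketch.of(MutationType.UPSERT, 42L);
    System.out.println(legacy.getSequenceNumber() + " " + legacy.getChangeSequenceNumber());
  }
}
```

With this shape, a pipeline that still calls the long-based factory gets the same number back from the long-based getter, which is the compatibility property this issue asks for.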

Additionally, the new method performs compute-intensive validation of the sequence number's format. Is it possible that the underlying Storage Write API does the same validation, so there is no need to do it twice?

Also, using the "checkArgument" function in the pipeline's runtime code means that a single row with an incorrect RowMutationInformation can cause a streaming pipeline to fail, unless the developer explicitly catches IllegalArgumentException. The pipeline will then have to be cancelled and cannot be drained.
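One possible alternative to throwing is to separate bad values from good ones so that a single malformed row does not stall the pipeline. The sketch below uses plain Java and hypothetical names (SequenceNumberRouting, route, Routed). The format rule in the regex (one to four '/'-separated hex segments of up to 16 characters each) is an assumption for this sketch and should be checked against the BigQuery documentation. In a real pipeline the rejected values would presumably go to a dead-letter output rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Beam or BigQueryIO.
final class SequenceNumberRouting {
  // Assumed format: 1 to 4 hex segments separated by '/', each up to 16 characters.
  // The pattern is compiled once so the per-row cost is a single match.
  private static final Pattern CSN =
      Pattern.compile("^[0-9A-Fa-f]{1,16}(/[0-9A-Fa-f]{1,16}){0,3}$");

  // Holds the values that passed and the values that failed validation.
  static final class Routed {
    final List<String> accepted = new ArrayList<>();
    final List<String> deadLetter = new ArrayList<>();
  }

  // Returns true or false instead of throwing, so the caller decides what to do.
  static boolean isValid(String changeSequenceNumber) {
    return changeSequenceNumber != null && CSN.matcher(changeSequenceNumber).matches();
  }

  // Routes each value to the accepted list or the dead-letter list.
  static Routed route(List<String> changeSequenceNumbers) {
    Routed routed = new Routed();
    for (String csn : changeSequenceNumbers) {
      if (isValid(csn)) {
        routed.accepted.add(csn);
      } else {
        routed.deadLetter.add(csn);
      }
    }
    return routed;
  }
}
```

Returning a boolean keeps the failure handling in the caller's hands, so a malformed row can be logged or redirected while the rest of the stream keeps flowing.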

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

ahmedabu98 commented 2 weeks ago

CC @damondouglas