Open bvolpato opened 1 year ago
This issue might be in CreateByteToBigQueryDLQ. I can't find the code for that function anywhere.
AFAIK the bug is that Beam did not infer a Coder for the toTableRow PCollection, so the user needs to specify one here with setCoder. However, this should have caused an error on job submission; the fact that it didn't might be a bug in the portable job-submission codepath.
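A minimal sketch of the setCoder workaround suggested above, assuming the failed inserts come from WriteResult.getFailedInsertsWithErr() and are mapped back to TableRow before being written to the second BigQueryIO. The class name FailedInsertRows, the method toTableRows, and the step name are hypothetical and not taken from the reporter's gist; only the Beam calls themselves (MapElements, BigQueryInsertError.getRow(), TableRowJsonCoder.of(), PCollection.setCoder()) are real API. Whether this avoids the Runner v2 failure has not been verified here.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;

// Hypothetical helper (name is illustrative, not from the issue or the gist).
final class FailedInsertRows {
  private FailedInsertRows() {}

  // Extracts the original TableRow from each failed insert and pins the
  // output coder explicitly instead of relying on coder inference.
  static PCollection<TableRow> toTableRows(PCollection<BigQueryInsertError> failedInserts) {
    return failedInserts
        .apply(
            "FailedInsertToTableRow", // hypothetical step name
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((BigQueryInsertError err) -> err.getRow()))
        // Explicit coder: the rows are TableRow, not the byte[] of the original input.
        .setCoder(TableRowJsonCoder.of());
  }
}
```

The result of toTableRows can then be passed to the dead-letter BigQueryIO.writeTableRows() step. Setting the coder on the mapped collection keeps the fix local to the user pipeline, which is what the comment above proposes, and does not address why job submission accepted the mismatched coder in the first place.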
What happened?
BigQueryIO has coder issues when using a very specific codepath utilizing withFormatFunction() + getFailedInsertsWithErr(), which are only exposed when using Dataflow Runner v2.

In short, I have a PCollection<byte[]> that gets written to BigQueryIO (through Streaming Inserts) and uses .withFormatFunction() to convert it into a TableRow. Somehow, with withExtendedErrorInfo and consuming getFailedInsertsWithErr to pipe to another BigQueryIO, the underlying coder is still ByteArrayCoder (the original input's coder, not TableRow) and FnApiDoFnRunner cannot consume the affected TableRows correctly.

Code to reproduce: https://gist.github.com/bvolpato/a5f3f1f44071eafc1034935bc4fffbae
Cause:
Full stack trace:
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components