Closed antonioc-ps closed 1 month ago
Thanks for pointing this out, @antonioc-ps. We have observed several issues with the current unbounded source implementation and will rewrite it completely in the future. As a precautionary measure, the connector's unbounded source feature will be withdrawn soon, since the current offering is simply incorrect.
Closing for now.
BigQuery has a pseudocolumn named _PARTITIONTIME on ingestion-time partitioned tables. It contains the UTC day on which the data in the partition was loaded.
When the method BigQuerySource.streamAvros is used, internally it runs the following:

Unfortunately, the tableSchema that is returned doesn't contain the pseudocolumn _PARTITIONTIME.

The streamAvros method follows this chain of method calls:
UnboundedSplitAssigner.discoverNewSplits
-> BigQueryServicesImpl.retrievePartitionsStatus
-> BigQueryServicesImpl.retrievePartitionColumnInfo
-> BigQueryPartitionUtils.retrievePartitionColumnType

The last method throws an exception because the _PARTITIONTIME column doesn't appear in the schema obtained from:
Table tableInfo = BigQueryUtils.tableInfo(bigquery, project, dataset, table).getSchema()
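
To illustrate the failure mode described above, here is a minimal, self-contained sketch in plain Java. It does not use the connector's or BigQuery's real classes. The class PartitionColumnSketch, the method resolvePartitionColumnType, and the map-based schema are all hypothetical stand-ins, and the fallback shown is only one possible way a lookup could treat pseudocolumns, not the connector's actual behavior or a confirmed fix. It relies on the fact that _PARTITIONTIME has type TIMESTAMP and _PARTITIONDATE has type DATE, and that neither is listed in the table schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class PartitionColumnSketch {

    // Pseudocolumns of ingestion-time partitioned tables.
    // They can be queried but are never listed in the table schema.
    private static final Map<String, String> PSEUDO_COLUMN_TYPES =
            Map.of("_PARTITIONTIME", "TIMESTAMP", "_PARTITIONDATE", "DATE");

    /**
     * Hypothetical helper: resolves the type of a partition column.
     * schemaFields maps column name to type name and stands in for the
     * schema returned by the table metadata lookup (an assumption).
     */
    static Optional<String> resolvePartitionColumnType(
            Map<String, String> schemaFields, String column) {
        String declared = schemaFields.get(column);
        if (declared != null) {
            return Optional.of(declared);
        }
        // A schema-only lookup stops here and fails for _PARTITIONTIME.
        // This sketch falls back to the known pseudocolumn types instead.
        return Optional.ofNullable(PSEUDO_COLUMN_TYPES.get(column));
    }

    public static void main(String[] args) {
        // Illustrative schema; the column names are made up.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "INT64");
        schema.put("payload", "STRING");

        System.out.println(resolvePartitionColumnType(schema, "id"));
        System.out.println(resolvePartitionColumnType(schema, "_PARTITIONTIME"));
        System.out.println(resolvePartitionColumnType(schema, "missing_column"));
    }
}
```

Returning an Optional rather than throwing keeps the "column really does not exist" case distinguishable from the pseudocolumn case, so the caller can decide how to report it.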