But I was confused on the purpose of this line 492
.append("AND _PARTITIONTIME IS NOT NULL")
When you're doing "DELETE FROM" and ignoring the null partition, doesn't this mean that there will be a batch number that will never be merged?
I was thinking what would happen in this scenario below:
...
at 3:00 AM: Merge flush for batchNumber 3190
at 3:01 AM: Delete from intermediate table where batchNumber <= 3190 and _partitiontime is not null
at 4:00 AM: Merge flush for batchNumber 3191
...
From this alone, the second step won't delete records that satisfies both "batchNumber = 3190" and "_partitiontime is null".
Doesn't this basically mean those record that are left in batchNumber 3190 won't ever be merged?
Hi,
I was reading how the sink connector do merge flush event and reads that after a batch of data in the intermediate table has been merged, a
DELETE FROM
statement is executed as shown here https://github.com/confluentinc/kafka-connect-bigquery/blob/f176198503e37e743d04b7bf687d7d08571ea851/kcbq-connector/src/main/java/com/wepay/kafka/connect/bigquery/MergeQueries.javaBut I was confused on the purpose of this line 492
When you're doing "DELETE FROM" and ignoring the null partition, doesn't this mean that there will be a batch number that will never be merged?
I was thinking what would happen in this scenario below:
... at 3:00 AM: Merge flush for batchNumber 3190
at 3:01 AM: Delete from intermediate table where batchNumber <= 3190 and _partitiontime is not null
at 4:00 AM: Merge flush for batchNumber 3191 ...
From this alone, the second step won't delete records that satisfies both "batchNumber = 3190" and "_partitiontime is null".
Doesn't this basically mean those record that are left in batchNumber 3190 won't ever be merged?