GoogleCloudPlatform / dlp-dataflow-deidentification

Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP
Apache License 2.0
89 stars 53 forks source link

BQ Storage Write API Support #153

Closed Goutam1511 closed 1 year ago

Goutam1511 commented 1 year ago

Summary (Short summary of what is being done) :

This change adds the BQ Storage Write API Support from [DataflowTemplates Repository](https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/834/files#diff-66ca2bbaddc8c2e3ae8abfb6a3ffbf5dbcbfbd6e2fb7ee4da102062159550cf7R688)

Description (Describe in detail the fix made) :

Adding a deterministic coder to encode and decode TableRow class to String to enable Storage Write API

Bug ID (if any) :


Public Documentation (if any) :


TESTED (Test Cases with scenario and description - must have 1 positive and 1 negative scenario) :

- Enabled Storage Write API with exactly-once semantics and at-least once semantics - Enabled Storage Write API with triggeringFrequency and numberOfStreams > 1 - Checked Streaming Write is working even with changes - Checked if pipeline throws exception with Storage Write API number of streams > 1 but no triggeringFrequency provided TO TEST : Existing issue of row duplication