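Summary (Short summary of what is being done) :
Added support for inspecting and de-identifying (INSPECT & DEID) Parquet files from a GCS bucket and storing the results in BigQuery (BQ) datasets.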
Description (Describe in detail the fix made) :
Introduces a dedicated Java package that reads data from Parquet files as GenericRecord objects, flattens each record, and converts it to Table.Row objects for further processing. This change supports both inspection and de-identification of input files stored in Google Cloud Storage (GCS) buckets. The results are written to BigQuery datasets, and the resulting BigQuery tables can then be re-identified through the existing re-identification flow.
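To illustrate the flattening step, here is a minimal, dependency-free sketch. It is not the code from this change. As assumptions, a `Map<String, Object>` stands in for Avro's `GenericRecord`, nested records become dotted column names (for example `address.city`), and a list of strings stands in for the values of a DLP `Table.Row`. The class and method names (`RecordFlattener`, `flatten`, `toRowValues`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: Map<String, Object> stands in for an Avro GenericRecord,
 * and List<String> stands in for the values of a DLP Table.Row.
 */
class RecordFlattener {

  /** Flattens nested maps into "parent.child" keys, preserving field order. */
  static Map<String, Object> flatten(Map<String, Object> record) {
    Map<String, Object> out = new LinkedHashMap<>();
    flattenInto("", record, out);
    return out;
  }

  @SuppressWarnings("unchecked")
  private static void flattenInto(
      String prefix, Map<String, Object> record, Map<String, Object> out) {
    for (Map.Entry<String, Object> e : record.entrySet()) {
      String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
      Object value = e.getValue();
      if (value instanceof Map) {
        // Nested record: recurse, prefixing child fields with the parent name.
        flattenInto(key, (Map<String, Object>) value, out);
      } else {
        out.put(key, value);
      }
    }
  }

  /** Converts flattened values to strings in column order; null becomes "". */
  static List<String> toRowValues(Map<String, Object> flat) {
    List<String> values = new ArrayList<>();
    for (Object v : flat.values()) {
      values.add(v == null ? "" : String.valueOf(v));
    }
    return values;
  }

  public static void main(String[] args) {
    Map<String, Object> address = new LinkedHashMap<>();
    address.put("city", "Springfield");
    address.put("zip", "12345");
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("card_number", "4111111111111111");
    record.put("address", address);

    Map<String, Object> flat = flatten(record);
    System.out.println(flat.keySet());     // [card_number, address.city, address.zip]
    System.out.println(toRowValues(flat)); // [4111111111111111, Springfield, 12345]
  }
}
```

Using an insertion-ordered map keeps the flattened column order stable across records, which matters when every row has to line up with a single set of table headers.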
Bug ID (if any) :
b/293426633
Public Documentation (if any) :
TESTED (Test Cases with scenario and description - must have 1 positive and 1 negative scenario) :
Converted the CCRecords sample data from CSV to Parquet format and tested both the inspection and de-identification pipelines against it.