Support for inspection and de-identification of ORC data stored in GCS buckets
Description (Describe in detail the fix made) :
This change reads ORC files from GCS buckets and processes the data through the inspection and de-identification pipelines. The results are written to BigQuery tables. This is part one of the DLP ORC support project. Future work includes writing results as ORC files to GCS buckets and re-identifying data stored in de-identified ORC files.
Bug ID (if any) :
301563096
Public Documentation (if any) :
TESTED (Test Cases with scenario and description - must have 1 positive and 1 negative scenario) :
Wrote a script to convert CSV data from the CCRecords sample files to ORC format. Inspection and de-identification both ran successfully on the resulting ORC files.