chitara-01 closed 9 months ago
Attention: 9 lines in your changes are missing coverage. Please review.
Comparison is base (d02173f) 13.43% compared to head (3c7ce08) 13.41%.
Files | Patch % | Lines |
---|---|---|
...m/tokenization/parquet/GenericRecordFlattener.java | 0.00% | 9 Missing :warning: |
Summary (Short summary of what is being done) :
Updated the logic that processes array data types in Parquet.
Description (Describe in detail the fix made) :
The current logic for processing ARRAY data types in Parquet breaks the pipeline: when a field's data type is an array, its field name is assigned a null value, and the pipeline then throws "CoderException: cannot encode a null String". For more details, please refer to the attached buganizer ticket.
With the improved implementation, any array data type in the Parquet structure is converted to a list of strings, processed by the DLP API, and written to BQ tables as a string that starts with "[" and ends with "]" to denote that it was originally an ARRAY data type in Parquet. This keeps the implementation logic simple and can be improved on when output in Parquet format is required.
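For reviewers, here is a minimal sketch of the conversion described above. It is not the code in GenericRecordFlattener.java: the class name `ArrayFieldSketch` and the helpers `toStringList` and `flattenArray` are hypothetical, and it assumes the bracketed form is the standard Java list rendering (elements joined by ", "). It uses plain `java.util` collections in place of the Avro/Parquet record types, so it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the class touched by this PR.
final class ArrayFieldSketch {

  // Convert every array element to its string form.
  // Assumption: null elements become the literal text "null".
  static List<String> toStringList(List<?> elements) {
    List<String> out = new ArrayList<>();
    for (Object element : elements) {
      out.add(String.valueOf(element));
    }
    return out;
  }

  // Render the list as one string wrapped in "[" and "]",
  // so the value is never null when it reaches a string coder.
  static String flattenArray(List<?> elements) {
    return toStringList(elements).toString();
  }
}
```

Under these assumptions, `ArrayFieldSketch.flattenArray(java.util.Arrays.asList("a", "b", 3))` returns `[a, b, 3]`, and an empty array yields `[]` rather than a null value.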
Bug ID (if any) :
b/310247478
Public Documentation (if any) :
TESTED (Test Cases with scenario and description - must have 1 positive and 1 negative scenario) :
Tested on the parquet file provided in the ticket.