apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Bug]: Issue with AvroGenericRecordToStorageApiProto.java handling nullable arrays #31674

Closed. codertimu closed this issue 3 months ago

codertimu commented 3 months ago

What happened?

I've run into a problem with the AvroGenericRecordToStorageApiProto class (AvroGenericRecordToStorageApiProto.java) in Apache Beam, which converts an Avro generic record into a proto object for writing to BigQuery using the Storage API.

The class contains two primary methods for handling the conversion:

  1. private TableFieldSchema fieldDescriptorFromAvroField(Schema.Field field)
  2. private static Object toProtoValue(FieldDescriptor fieldDescriptor, Schema avroSchema, Object value)

Both methods handle arrays correctly when the Avro schema declares the array as non-nullable:

{
  "name": "simple_array",
  "type": {
    "type": "array",
    "items": "string"
  },
  "default": []
}

However, the application crashes when the array is nullable, i.e. declared as a union of "null" and the array type:

{
  "name": "simple_array",
  "type": [
    "null",
    {
      "type": "array",
      "items": "string"
    }
  ],
  "default": null
}

The crash seems to stem from how the two conversion methods handle nullable arrays: here the field's type is a union that wraps the array, not an array directly.
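To illustrate the kind of handling that appears to be missing, here is a minimal, dependency-free sketch of unwrapping a ["null", array] union before deciding whether a field is repeated. It is not Beam's actual code or fix: SimpleSchema and Kind are hypothetical stand-ins for Avro's schema types, and unwrapNullable, isRepeated, and toRepeatedValue are names invented for this example. Mapping a null array to an empty list is also an assumption of the sketch, chosen because a repeated proto field has no null representation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NullableArraySketch {

  // Hypothetical stand-in for the Avro schema types relevant to this issue.
  enum Kind { NULL, STRING, ARRAY, UNION }

  // Hypothetical minimal schema node; NOT the real org.apache.avro.Schema.
  static final class SimpleSchema {
    final Kind kind;
    final SimpleSchema elementType;    // non-null only for ARRAY
    final List<SimpleSchema> branches; // non-empty only for UNION

    private SimpleSchema(Kind kind, SimpleSchema elementType, List<SimpleSchema> branches) {
      this.kind = kind;
      this.elementType = elementType;
      this.branches = branches;
    }

    static SimpleSchema of(Kind kind) {
      return new SimpleSchema(kind, null, Collections.<SimpleSchema>emptyList());
    }

    static SimpleSchema arrayOf(SimpleSchema elementType) {
      return new SimpleSchema(Kind.ARRAY, elementType, Collections.<SimpleSchema>emptyList());
    }

    static SimpleSchema unionOf(SimpleSchema... branches) {
      return new SimpleSchema(Kind.UNION, null, Arrays.asList(branches));
    }
  }

  // Returns T for a ["null", T] union; returns the schema unchanged if it is not a union.
  static SimpleSchema unwrapNullable(SimpleSchema schema) {
    if (schema.kind != Kind.UNION) {
      return schema;
    }
    List<SimpleSchema> nonNull = new ArrayList<>();
    for (SimpleSchema branch : schema.branches) {
      if (branch.kind != Kind.NULL) {
        nonNull.add(branch);
      }
    }
    if (nonNull.size() != 1) {
      throw new IllegalArgumentException("sketch only supports [\"null\", T] unions");
    }
    return nonNull.get(0);
  }

  // A field is treated as repeated whether the array is declared directly or inside a nullable union.
  static boolean isRepeated(SimpleSchema fieldSchema) {
    return unwrapNullable(fieldSchema).kind == Kind.ARRAY;
  }

  // Assumption of this sketch: a null array value is written as an empty list.
  static List<Object> toRepeatedValue(SimpleSchema fieldSchema, List<?> value) {
    if (!isRepeated(fieldSchema)) {
      throw new IllegalArgumentException("not an array field");
    }
    if (value == null) {
      return Collections.emptyList();
    }
    return new ArrayList<Object>(value);
  }

  public static void main(String[] args) {
    SimpleSchema plain = SimpleSchema.arrayOf(SimpleSchema.of(Kind.STRING));
    SimpleSchema nullable = SimpleSchema.unionOf(SimpleSchema.of(Kind.NULL), plain);

    System.out.println(isRepeated(plain));                                  // true
    System.out.println(isRepeated(nullable));                               // true
    System.out.println(toRepeatedValue(nullable, null));                    // []
    System.out.println(toRepeatedValue(nullable, Arrays.asList("a", "b"))); // [a, b]
  }
}
```

With the real Avro API, the equivalent check would presumably inspect the union's member types on the field schema before dispatching on the array type, in both the schema conversion and the value conversion.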

Issue Priority

Priority: 3 (minor)

codertimu commented 3 months ago

.take-issue