confluentinc / avro

Mirror of Apache Avro
Apache License 2.0

Serializing a GenericRecord does not write the top level Record's doc comment into the .avro file's schema #19

Open LordSyntax opened 4 years ago

LordSyntax commented 4 years ago

With Confluent.Apache.Avro v1.7.7.7

Given the following schema:

{
    "type": "record",
    "namespace": "com.example.foo",
    "name": "bar",
    "doc": "An example record doc field, which will be ignored by the DatumFileWriter.",
    "fields": [
        ...
    ]
}

If you create a GenericRecord from the above schema, populate its fields, and then persist it to an .avro file with the following code, the doc comment from the top-level record is lost.

using var datumFileWriter = DataFileWriter<GenericRecord>.OpenWriter(datumWriter, writeStream);
datumFileWriter.Append(record);
datumFileWriter.Flush();

Resulting schema (extracted from the .avro file's contents, with fields trimmed for brevity):

{"type":"record","name":"bar","namespace":"com.example.foo","fields":[...]}
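For anyone wanting to reproduce the check, the embedded schema can be read straight from the file header without an Avro library. The following is a minimal sketch in Python using only the standard library. It assumes the standard Avro object container layout (the 4-byte magic `Obj\x01` followed by a metadata map that holds the schema under the `avro.schema` key). The function names are hypothetical helpers, not part of any Avro library.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Decode one Avro bytes/string value: a long length, then the payload."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_header_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def embedded_schema(stream):
    """Parse the JSON schema stored under the 'avro.schema' metadata key."""
    return json.loads(read_header_metadata(stream)["avro.schema"])
```

Opening the written file in binary mode and evaluating `"doc" in embedded_schema(f)` then shows whether the top-level record's doc attribute survived serialization. Only the top-level key is checked here; nested record or field docs would need a recursive walk.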

According to the Avro spec (https://avro.apache.org/docs/current/spec.html#schema_record), the record type supports a doc attribute. Incidentally, the Python library we use for processing Avro files works as expected and includes doc comments for records.