ch-robinson / dotnet-avro

An Avro implementation for .NET
https://engineering.chrobinson.com/dotnet-avro/
MIT License
134 stars 49 forks source link

Default values support in codegen #203

Open nicodeslandes opened 2 years ago

nicodeslandes commented 2 years ago

Support for default values were introduced in version 8.0.2 for schema generation, meaning that a class such as this:

public class MyRecord
{
    [DefaultValue(null)]
    public int? Value { get; set; }
}

Produces this schema, with the expected "default": null property:

{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "name": "Value",
      "type": [
        "null",
        "int"
      ],
      "default": null
    }
  ]
}

What would be great is to also have the opposite. Given schema above, have the generated code include a [DefaultValue] attribute to match the schema. At the moment, here's what the generated class looks like for the schema above when running dotnet avro generate:

public class MyRecord
{
    public int? Value { get; set; }
}

This is a problem, because any producer who uses that class will now produce a new schema which no longer includes the default value, and potentially start creating compatibility problems in the schema registry.

Could the dotnet avro generate command be updated to support default values from the schemas?

dstelljes commented 2 years ago

This would be nice to have, but I’m not sure what we’d do when the default value couldn’t be expressed as a constant. For instance, with a (contrived) schema:

{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "name": "Value",
      "type": [
        "MyRecord",
        "null"
      ],
      "default": {
        "MyRecord": {
          "Value": null
        }
      }
    }
  ]
}

the code generator would try to express that as:

[DefaultValue(new MyRecord { Value = null })]
public class MyRecord
{
    public MyRecord? Value { get; set; }
}

which would not compile.

nicodeslandes commented 2 years ago

Sorry for the late reply, I've been sidetracked away from Avro these last few days. I've got 2 possible suggestions:

  1. Handle the "happy" path only, if the default value is null of a scalar, generate the [DefaultValue] attribute correctly. If we're not in a supported scenario, issue a warning and don't produce anything.
  2. For the unsupported case like the one you highlighted above, still issue a warning, but decorate the property with a new attribute [DefaultSchemaValueJson], which reproduce the default value as it is defined in the originating schema, and is used when producing the schema from the record class.

Option 1 would be good enough for most cases, and would be enough to ensure we have a good story when one team generate a schema from C# code, and another team wants to produce a C# class from that same schema. The 2nd team would be able to used that class without new, potentially incompatible schemas being published to the registry.

Option 2 would even push this further by allowing a producer to handle schemas that don't come from a generated C# class.

dstelljes commented 2 years ago

I’d want to give some more thought to Option 2 since that would presumably require generated code to depend on Chr.Avro.Json, but Option 1 definitely seems like a good first step. Codegen is still one big switch statement, but if we refactored a bit to match the schema/serde builder case pattern, that’d give us a mechanism to report warnings.

nicodeslandes commented 2 years ago

Yes I agree. Option 1 seems to me quite valuable even if we take a bit more time fleshing out Option 2.

I don't know whether Option 2 requires any additional dependencies though. Couldn't the default value json be treated as a simple string by the generated code? (and basically ignored until the need to publish a schema arises)

dstelljes commented 2 years ago

Yeah, the schema would just be the JSON string, but DefaultSchemaValueJsonAttribute would have to live somewhere.