godatadriven / pydantic-avro

This library can convert a pydantic class to an Avro schema or generate Python code from an Avro schema.
https://github.com/godatadriven/pydantic-avro
MIT License
63 stars, 30 forks

Datetime conversion #98

Closed: anneum closed this issue 10 months ago

anneum commented 10 months ago

I suggest converting datetime to "logicalType": "date" instead of "logicalType": "timestamp-micros". When I use the following sample class:

from pydantic import AwareDatetime
from pydantic_avro.base import AvroBase

class Example(AvroBase):
    id: str
    created_at: AwareDatetime

Example.model_json_schema() gives the following output:

{
    "properties": {
        "id": {
            "title": "Id",
            "type": "string"
        },
        "created_at": {
            "format": "date-time",
            "title": "Created At",
            "type": "string"
        }
    },
    "required": [
        "id",
        "created_at"
    ],
    "title": "Example",
    "type": "object"
}

And Example.avro_schema():

{
    "type": "record",
    "namespace": "Example",
    "name": "Example",
    "fields": [
        {
            "type": "string",
            "name": "id"
        },
        {
            "type": {
                "type": "long",
                "logicalType": "timestamp-micros"
            },
            "name": "created_at"
        }
    ]
}

According to the JSON Schema specification and RFC 3339, a date-time value should look like "2022-04-05T09:33:23.000Z", so it seems to be more of a "logicalType": "date" than a "logicalType": "timestamp-micros", as written in base.py, line 109.
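For reference, the Avro specification defines "date" as an int counting days since the Unix epoch and "timestamp-micros" as a long counting microseconds since the epoch. The following is a minimal stdlib-only sketch of what each logical type would keep from the example value above. The helper names to_avro_date and to_avro_timestamp_micros are hypothetical and are not part of pydantic-avro or the avro package.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_avro_date(dt: datetime) -> int:
    """Hypothetical helper: days since the epoch; the time of day is dropped."""
    return (dt.astimezone(timezone.utc).date() - EPOCH.date()).days


def to_avro_timestamp_micros(dt: datetime) -> int:
    """Hypothetical helper: whole microseconds since the epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


# Same instant as "2022-04-05T09:33:23.000Z" from the comment above.
value = datetime(2022, 4, 5, 9, 33, 23, tzinfo=timezone.utc)

days = to_avro_date(value)
micros = to_avro_timestamp_micros(value)

# Converting back shows what survives each encoding.
from_days = EPOCH + timedelta(days=days)          # midnight only
from_micros = EPOCH + timedelta(microseconds=micros)  # full instant

print(days, from_days.isoformat())
print(micros, from_micros.isoformat())
```

Under these definitions only the microsecond encoding round-trips the time of day, so a date-time string carries more information than "date" can hold.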

anneum commented 10 months ago

Sorry, that was my mistake. The problem is that I tried to use a datetime object without a timezone, and I got an error when serializing it with this function:

import io
import json
from typing import Any, Type

import avro.io
from avro.schema import parse

def serialize_avro_data(data: Any, base_model: Type[AvroBase]) -> bytes:
    schema_parsed = parse(json.dumps(base_model.avro_schema()))
    bytes_writer = io.BytesIO()
    encoder = avro.io.BinaryEncoder(bytes_writer)
    datum_writer = avro.io.DatumWriter(schema_parsed)
    datum_writer.write(data, encoder)
    return bytes_writer.getvalue()

Works:

from datetime import datetime, timezone

example = Example(id='test', created_at=datetime.now(tz=timezone.utc))
serialize_avro_data(example.model_dump(), Example)

Doesn't work:

example = Example(id='test', created_at=datetime.now())
serialize_avro_data(example.model_dump(), Example)
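One possible way to avoid the failing case is to attach a timezone to naive values before they reach the model or the serializer. The sketch below assumes that naive datetimes are meant as UTC, which may not hold for your data. ensure_aware is a hypothetical helper, not something provided by pydantic-avro or avro.

```python
from datetime import datetime, timezone


def ensure_aware(dt: datetime, assumed_tz: timezone = timezone.utc) -> datetime:
    """Hypothetical helper: return dt unchanged if it is timezone-aware,
    otherwise attach assumed_tz without shifting the clock time."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        # Assumption: a naive value is already expressed in assumed_tz.
        return dt.replace(tzinfo=assumed_tz)
    return dt


naive = datetime(2022, 4, 5, 9, 33, 23)
aware = ensure_aware(naive)

print(naive.isoformat())
print(aware.isoformat())
```

With a helper like this, the failing call could be written as Example(id='test', created_at=ensure_aware(datetime.now())). Note that datetime.now() returns local time, so assuming UTC here would mislabel the instant unless the machine's clock is set to UTC. Calling datetime.now(tz=timezone.utc) directly, as in the working example, avoids that ambiguity.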