confluentinc / avro

Mirror of Apache Avro
Apache License 2.0
12 stars 20 forks source link

AVRO-2359: Support Logical Types in C# #14

Closed timjroberts closed 3 years ago

timjroberts commented 5 years ago

This PR brings in C# support for logical types.

The code was originally based on the apache/avro repository (https://github.com/apache/avro/pull/492), but I've also applied the patch here for your consideration.

mhowlett commented 5 years ago

thanks - i should be able to look at this soon after we get 1.0 of the kafka client out the door (I have been focussed on that). I notice that there are more than 32000 lines of code though, most of which shouldn't be here, so at the very least i'll need to tidy it up and open again (which would loose attribution to you). is that ok, or do you want to tidy it up?

Also, I noticed we don't have the contributor license agreement bot set up on this repo, I think you can sign it by going here: http://clabot.confluent.io/cla (that just allows us to confidently do what we want with the code, probably ultimately merging upstream to the apache foundation).

timjroberts commented 5 years ago

It looks like most of those 32K lines are project.assets.json files. Perhaps they're not in the .gitignore file. I'll take a look at tidying that up.

timjroberts commented 5 years ago

I've also just complete the CLA.

timjroberts commented 5 years ago

Around ~1K lines now. That seems more like it. :-)

mhowlett commented 5 years ago

great, thanks! i hope I can get this in 1.0.1.

timjroberts commented 5 years ago

There were a few missing switch branches for testing the processing of a logicalType. These have now been added.

timjroberts commented 5 years ago

@mhowlett - have you had any time to look at the interop scenarios that you mentioned above? Is there anything you need me to look at? I'm planning on looking at @petersilverwood's Decimal implementation this week - it's the logical type implementation that I'm most naive with.

timjroberts commented 5 years ago

I've now reworked the Decimal logical type so that it takes into account the big-endian requirement for encoding, and also to support negative values.

cc/ @petersilverwood

timjroberts commented 5 years ago

I've added a new AvroDecimal type which encapsulates a big decimal. It's a wrapper around .NET's BigInteger and a scale. The 'decimal' logical type has then been refactored to use instances of this type instead of managing byte buffers directly. The logic was based on the 'decimal' logical type in the Java codebase, so they should work the same. However, saying that, I'm not sure about the two's-compliment side of things which the spec says the encoded decimal should be in.

/cc @petersilverwood

timjroberts commented 5 years ago

So after a little reading, I think that because .NET byte is unsigned, we're already getting the two's compliment encoding of the decimal bytes. So I think we're fine. A Java byte is signed so I think it has to be converted before.

dcrdev commented 5 years ago

Quick observation - this doesn't look to support schemas of mixed primitive and logical types, is that intended?

timjroberts commented 5 years ago

@dcrdev - not quite sure what you mean, but you can define Records that use a mix of types (primitive, complex, and now logical). For example, the following schema uses a mix of primitive and the decimal logical type:

{
    "type": "record",
    "namespace": "WordCountEntities",
    "name": "WordCount",
    "fields": [
        {
            "name": "TotalWordCount",
            "type": "int"
        },
        {
            "name": "WordCounts",
            "type": { "type": "map", "values": "int" }
        },
        {
            "name": "TopWord",
            "type": "string"
        },
        {
            "name": "ExecutionCost",
            "type": {
                "type": "bytes",
                "logicalType": "decimal",
                "precision": 8,
                "scale": 2
            }
        }
    ]
}
timjroberts commented 5 years ago

Any planned movement on this inclusion?

mhowlett commented 5 years ago

there's plenty of demand for this (it came up again yesterday with a customer), but it's not at the top of the list.

note: there's movement on getting rid of our fork, so this work will be done against the apache repo: https://github.com/confluentinc/confluent-kafka-dotnet/issues/1033

shawnallen85 commented 5 years ago

Any update on supporting this?

mhowlett commented 5 years ago

@shawnallen85 - there's movement on this over here https://github.com/apache/avro/pull/492

maxfield-riley commented 4 years ago

Do we have any movement on this tool? Currently the avrogen tool generates "type": { "logicalType": "date", "type": "int" } as "type": "int" which leads to schema registry issues of Schema mismatch. Reader: [schema] writer: [schema]

thufailmattankil commented 4 years ago

Confluent.Apache.Avro.AvroGen 1.7.7.5 generates schema file (.avsc) data type based on "type" instead of "logicaltype". Below example generates datatype as Nullable int instead of Nullable DateTime . Could you please help. "type": ["null", { "type": "int", "logicalType": "date" } ]

alexander-karnadi commented 4 years ago

hi, is there any plan to include this in a formal release?

andrewegel commented 3 years ago

This is causing our Jenkins CI to break due to the "unknown repository" - This needs to be closed to get around that. If this change is still required, please re-do it in a fork or branch.