Open rchoffardet opened 6 months ago
See https://github.com/ch-robinson/dotnet-avro/discussions/173 for a related discussion—we probably don’t want to introduce logical types that aren’t defined by the spec, but we’d be open to a custom property that hints the .NET type to use.
Doesn't the spec allow specifying custom logical types?
Also, as this library is designed for .NET, I think it would make perfect sense to implement base types as logical types. Would it not? If needed, it could be `System.UInt32` instead of `uint`.
The spec does allow for custom logical types (and indeed, “Language implementations must ignore unknown logical types when reading”), but we don’t know how the spec might evolve. If we introduce a type like `uint`, we risk conflicting semantics if the spec ever chose to introduce a logical type with the same name.
Would it, though? If the specification introduces a `uint` logical type, would it be different in Java than in .NET? In C/C++? I think we can safely presume that `uint` is universal.
How should negative values be handled? 31 bits or 32? Does `int` even make sense as an underlying type since Avro integers are zig-zag encoded? It’s impossible to predict how the spec would answer those types of questions. If we stick to something like an `avro.dotnet.type` property, we know we’ll never conflict with the spec.
The spec says that:
> A logical type is always serialized using its underlying Avro type so that values are encoded in exactly the same way as the equivalent Avro type that does not have a `logicalType` attribute. Language implementations may choose to represent logical types with an appropriate native type, although this is not required.

So, a `uint` with a value greater than the signed `int32` maximum will be serialized as a negative number but deserialized into a `uint` again if the deserializer supports it. If the deserializer does not support it, the value will be deserialized as a signed int.
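
The round trip described above can be sketched in C#. This is only an illustration of the bit reinterpretation, not Chr.Avro code; the class and method names (`UIntRoundTrip`, `ToAvroInt`, `FromAvroInt`) are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of Chr.Avro: shows how a uint could ride on
// top of Avro's signed 32-bit "int" by reinterpreting the same 32 bits.
public static class UIntRoundTrip
{
    // Writer side: reinterpret the uint's bits as a signed int.
    public static int ToAvroInt(uint value) => unchecked((int)value);

    // Reader side (if it understands the logical type): reinterpret back.
    public static uint FromAvroInt(int value) => unchecked((uint)value);

    public static void Main()
    {
        uint original = 3_000_000_000;           // larger than int.MaxValue
        int onTheWire = ToAvroInt(original);     // what a plain "int" reader would see
        uint restored = FromAvroInt(onTheWire);  // what a uint-aware reader would see

        Console.WriteLine(onTheWire);  // -1294967296
        Console.WriteLine(restored);   // 3000000000
    }
}
```

A reader that ignores the logical type would surface `-1294967296`, which is the fallback behavior described in the comment.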
Also, to answer your question: `int` is precisely defined by the spec to be a 32-bit signed integer, so yes, it makes sense to me. Even though zig-zag encoding could theoretically represent arbitrarily large integers, the spec limits them to 32 bits (`int`) and 64 bits (`long`).
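
For reference, here is a minimal sketch of 32-bit zig-zag mapping in C#, assuming the usual definition (small magnitudes, positive or negative, map to small unsigned values). The `ZigZag32` class is hypothetical and only illustrates that every signed 32-bit value, including negative ones, maps to a distinct 32-bit pattern; Avro then writes that pattern with a variable-length encoding, which is omitted here.

```csharp
using System;

// Hypothetical illustration, not Chr.Avro code.
public static class ZigZag32
{
    // Interleave negatives and positives: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    public static uint Encode(int n) => unchecked((uint)((n << 1) ^ (n >> 31)));

    // Inverse mapping: the low bit carries the sign, the rest the magnitude.
    public static int Decode(uint z) => unchecked((int)(z >> 1) ^ -(int)(z & 1));

    public static void Main()
    {
        Console.WriteLine(Encode(0));             // 0
        Console.WriteLine(Encode(-1));            // 1
        Console.WriteLine(Encode(1));             // 2
        Console.WriteLine(Encode(int.MaxValue));  // 4294967294
        Console.WriteLine(Encode(int.MinValue));  // 4294967295
        Console.WriteLine(Decode(Encode(-1294967296)));  // -1294967296
    }
}
```

Because the mapping is a bijection on 32-bit values, a `uint` reinterpreted as a negative `int` survives the encoding unchanged.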
A new `avro.dotnet.type` property is a good way to reduce possible future conflicts with the spec, but it breaks future interoperability between deserializers (just as `System.UInt32` would break interoperability between the deserializers of different languages).
I get that logical types for strings can be a difficult topic, but I don't see how it applies to numbers, especially integers.
> yes, it makes sense to me

Same, but what I’m getting at is that we don’t know how the spec will evolve. Maybe Chr.Avro introduces its own `uint` logical type on top of `"int"` tomorrow, but next month the spec introduces a `uint` type that decorates `"fixed"` schemas. Then we’d be sort of stuck; we could implement whatever the spec decides, but we couldn’t back out what we invented in the interim without breaking people.
> But it breaks future interoperability between deserializers

Could you explain this more? `avro.dotnet.type` would just be a hint to codegen.
> Then we’d be sort of stuck

I'm sorry, I honestly don't understand why we would be stuck. We would have two ways to serialize and deserialize `uint`: one based on the underlying `fixed` type and the other based on the `int` type. Even if the spec introduces a new Avro type `uint`, it wouldn't matter much (apart from having two different ways to achieve the same goal).
The only issue I see is if the spec introduces a logical type `uint` based on `int` that doesn't behave the same way as ours. But given the restrictions on logical types, I don't see how it could.
> Could you explain this more? `avro.dotnet.type` would just be a hint to codegen.

What I meant is that using the `logicalType` property lets us communicate intent within the spec's scope (in a context where the deserialization library isn't the same as the serialization library).
If we use `avro.dotnet.type`, we ensure that nobody else will support it.
> We would have two ways to serdes `uint`: one based on the underlying `fixed` type and the other one based on the `int` type.

If the spec says that `uint` can only apply to `fixed`, we're not spec-compliant.
> If we use `avro.dotnet.type`, we ensure that nobody will support this.

Fundamentally, I think this is where the disagreement is—if a behavior should be supported by multiple Avro implementations, it should be introduced to the Avro spec and then implemented, not the other way around.
The Avro specification allows the presence of the `logicalType` property, which can be used to represent a type derived from the original type. It's already used for `Guid`, which is one of the possible logical types of the type `string`.
I would love to see unsigned numbers supported in the same fashion.
For example, the following field:
would produce this code:
I've already forked this repo to do this for my job and would gladly submit a PR. (I've only done it for float and integer, but it could be generalized)
Please tell me what you think :)