Closed davydog187 closed 2 years ago
I'm having some questions about the spec with respect to how full names are resolved, and what is and isn't allowed in terms of duplicates.
Testing with avro-js, I get the following results
❌ Not allowed
> avro.parse({type: "record", name: "a", fields: [{name: "a", type: "string"}, {name: "a", type: "string"}]})
Uncaught Error: duplicate a field name
at new RecordType (/Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/schemas.js:1429:11)
at /Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/schemas.js:136:14
at Object.createType (/Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/schemas.js:137:7)
at Object.parse (/Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/files.js:74:13)
✅ Allowed
> avro.parse({type: "record", name: "a", fields: [{name: "a", type: "string"}, {name: "b", type: {type: "record", name: "inner", fields: [{name: "a", type: "string"}]}}]})
RecordType {
_name: 'a',
_aliases: [],
_type: 'record',
_fields: [
Field { _name: 'a', _type: StringType {}, _aliases: [], _order: 1 },
Field { _name: 'b', _type: [RecordType], _aliases: [], _order: 1 }
],
_constructor: [Function: a] { getType: [Function (anonymous)] },
_read: [Function: reada],
_skip: [Function: skipa],
_write: [Function: writea],
_check: [Function: checka]
}
However, it does not seem to respect namespaces on Record fields, which tells me I'm either misunderstanding the spec, or this implementation has issues.
The spec reads
In record, enum and fixed definitions, the fullname is determined in one of the following ways:
A name and namespace are both specified. For example, one might use "name": "X", "namespace": "org.foo" to indicate the fullname org.foo.X. A fullname is specified. If the name specified contains a dot, then it is assumed to be a fullname, and any namespace also specified is ignored. For example, use "name": "org.foo.X" to indicate the fullname org.foo.X. A name only is specified, i.e., a name that contains no dots. In this case the namespace is taken from the most tightly enclosing schema or protocol. For example, if "name": "X" is specified, and this occurs within a field of the record definition of org.foo.Y, then the fullname is org.foo.X. If there is no enclosing namespace then the null namespace is used. References to previously defined names are as in the latter two cases above: if they contain a dot they are a fullname, if they do not contain a dot, the namespace is the namespace of the enclosing definition.
Primitive type names have no namespace and their names may not be defined in any namespace.
A schema or protocol may not contain multiple definitions of a fullname. Further, a name must be defined before it is used ("before" in the depth-first, left-to-right traversal of the JSON parse tree, where the types attribute of a protocol is always deemed to come "before" the messages attribute.)
This leads me to believe the following assertions:
❌ Record fields should not contain duplicates ✅ Fields in different records can have the same name, only if they have a different namespace, and thus different fullname ✅ Namespaces are inherited from immediately enclosing definition ❌ Namespaces are not inherited from ancestors beyond parent
It seems that https://github.com/apache/avro/pull/1439 and https://github.com/apache/avro/pull/1573 seem to clarify the rules around namespace resolution. I'll dig into that more
Asking for feedback here https://github.com/apache/avro/pull/1439/files#r818203665
Opening this for review while I get over the final hurdle, namespace propagation. @doomspork if you want to give some review please do. Note that once the tests are passing and credo is happy, I will likely merge so I can move onto removing ecto and the rest of the cleanup, which will be very noisy
The goal of this PR is to
Closes #40 Closes #60
Closes #61 Closes #55
Schema features