dasch / avro_turf

A library that makes it easier to use the Avro serialization format from Ruby.
MIT License

AvroTurf::SchemaStore does not follow the Avro specification when loading nested schemas #186

Closed: piotaixr closed this issue 1 year ago

piotaixr commented 1 year ago

The Avro specification states in https://avro.apache.org/docs/1.10.2/spec.html#names:

In record, enum and fixed definitions, the fullname is determined in one of the following ways:

- A name and namespace are both specified. For example, one might use "name": "X", "namespace": "org.foo" to indicate the fullname org.foo.X.
- A fullname is specified. If the name specified contains a dot, then it is assumed to be a fullname, and any namespace also specified is ignored. For example, use "name": "org.foo.X" to indicate the fullname org.foo.X.
- A name only is specified, i.e., a name that contains no dots. In this case the namespace is taken from the most tightly enclosing schema or protocol. For example, if "name": "X" is specified, and this occurs within a field of the record definition of org.foo.Y, then the fullname is org.foo.X. If there is no enclosing namespace then the null namespace is used.
References to previously defined names are as in the latter two cases above: if they contain a dot they are a fullname, if they do not contain a dot, the namespace is the namespace of the enclosing definition.
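
The three rules quoted above can be sketched in a few lines of Ruby. This is only an illustration of the spec text: `resolve_fullname` and its keyword arguments are hypothetical names, not part of the avro or avro_turf gems.

```ruby
# Hypothetical helper illustrating the spec's name-resolution rules.
# It is NOT part of the avro or avro_turf gems.
def resolve_fullname(name, namespace: nil, enclosing_namespace: nil)
  # Rule 2: a name containing a dot is already a fullname,
  # and any namespace given alongside it is ignored.
  return name if name.include?(".")

  # Rule 1: an explicit namespace is used when present.
  # Rule 3: otherwise fall back to the most tightly enclosing namespace.
  ns = namespace || enclosing_namespace

  # With no namespace at all, the name lives in the null namespace.
  (ns.nil? || ns.empty?) ? name : "#{ns}.#{name}"
end

puts resolve_fullname("X", namespace: "org.foo")            # org.foo.X
puts resolve_fullname("org.foo.X", namespace: "ignored")    # org.foo.X
puts resolve_fullname("X", enclosing_namespace: "org.foo")  # org.foo.X
puts resolve_fullname("X")                                  # X
```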

This means that, if we have the following schema file:

foo/bar.avsc

{
  "type": "array",
  "namespace": "foo",
  "name": "bar",
  "items": "another_schema"
}

... then the another_schema schema MUST be in the foo namespace, because "A name only is specified, i.e., a name that contains no dots. In this case the namespace is taken from the most tightly enclosing schema or protocol."

Currently, the AvroTurf::SchemaStore will try to load another_schema.avsc from the null namespace (at the root of the provided path) instead of from the foo folder.
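
To make the difference concrete, here is a rough sketch of the two lookups. It assumes a layout with one directory per namespace component under a schema root; `schema_path` and the `schemas` root are hypothetical and are not taken from avro_turf's source.

```ruby
# Hypothetical helper: maps a fullname to a file under a schema root,
# using one directory per namespace component (assumed layout).
def schema_path(root, fullname)
  # Everything before the last dot is the namespace; the rest is the name.
  *namespace_parts, name = fullname.split(".")
  File.join(root, *namespace_parts, "#{name}.avsc")
end

# What the spec implies for a bare reference inside the foo namespace:
puts schema_path("schemas", "foo.another_schema")  # schemas/foo/another_schema.avsc

# What happens when the bare name is resolved in the null namespace:
puts schema_path("schemas", "another_schema")      # schemas/another_schema.avsc
```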

I did not find a real workaround for this. The only solution is to explicitly provide the namespace for every nested schema reference that we expect the schema store to load, even though the specification allows us to omit it.
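
For the foo/bar.avsc example above, spelling out the namespace would presumably look like the following. The schema is embedded in a Ruby heredoc only so that it can be parsed and checked; the sole change from the original file is the fully qualified `items` reference.

```ruby
require "json"

# Same schema as foo/bar.avsc, but the nested reference carries its
# namespace explicitly, so it no longer depends on the enclosing one.
schema = JSON.parse(<<~AVSC)
  {
    "type": "array",
    "namespace": "foo",
    "name": "bar",
    "items": "foo.another_schema"
  }
AVSC

# A reference that contains a dot is treated as a fullname.
puts schema["items"].include?(".")  # true
```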

dasch commented 1 year ago

I guess there’s nothing to do then?

piotaixr commented 1 year ago

There is. I just opened https://issues.apache.org/jira/browse/AVRO-3790

piotaixr commented 1 year ago

PR opened in the Avro repo: https://github.com/apache/avro/pull/2409

github-actions[bot] commented 1 year ago

Stale issue message

piotaixr commented 1 year ago

Avro 1.11.3 has been released, so I can now create a PR to fix this issue!