flavray / avro-rs

Avro client library implementation in Rust
MIT License
169 stars 95 forks source link

Support for parsing a list of schemas which may have cross dependencies #173

Closed batconjurer closed 3 years ago

batconjurer commented 3 years ago

The current library only allows the parsing of jsons that are completely specified in their definitions, i.e. it is not possible to specify two types, one of which depends on the other, and be able to parse this without first composing the jsons.

For example, there is currently no way to parse the following situation "out of the box"

{
 "name": "A",
 "type": "array",
 "items": "B"
},
{
  "name": "B",
  "type": "map",
  "values": "double"
}

This means that in real life, one must work with very deeply nested json strings which can be very unwieldy. Additionally, many users would naturally like to specify lots of types with cross dependencies on each other but do not want to do lots of json manipulation to be able to parse their types.

This also affect libraries downstream. I attempted a fix for this with rsgen-avro since I wanted this feature for code generation (necessary for my work). However, large amounts of logic necessary to do this replicated logic already present in this repository. Thus I agreed with the maintainer that this feature more naturally belonged here.

This adds a new parsing function Schema::parse_list which takes a list of json strings, which is allowed to have cross dependencies. An internal data structure Parser is created which does all the parsing and also maintains a map of parsed types to draw from as necessary. This data structure is dropped after parsing is finished so that the statelessness of the Schema type is maintained.

In the added tests, concrete examples can be found.

poros commented 3 years ago

Hello! As of today, I am stepping down as a maintainer of this library for a while (see #174). Please ping @flavray from now on. Thanks again for your contribution :)

batconjurer commented 3 years ago

This looks really solid to me, and great test suite, thank you for the great contribution! I've got a few comments and questions, but it should be mergeable really soon. slightly_smiling_face

This feature has been in my backlog for a loong time, albeit I had another design in mind (a schema store, where schemas would be added one by one and referenced, but I actually like your stateless/do-everything-at-once approach!). So, thank you again!

That's so much for your quick response and helpful review. I have tried to address your comments as best I can. The biggest question I still have involves only supporting name-able types. I think that makes sense, and I added to the doc-string something to that effect, but it is currently not enforced.

So the question is, do we return and error or simply discourage types other than name-ables, although in some cases it may work?

flavray commented 3 years ago

Would it also make sense to include a small example of how to use this in src/lib.rs (here)? This will then show up in the README and users will be able to find this more easily!

To update README.md once you've updated src/lib.rs, you can run:

cargo install cargo-readme
make readme
batconjurer commented 3 years ago

I have updated the readme and src/lib.rs with an example. Hopefully that wraps this PR up. Thanks again for the prompt reviews.

flavray commented 3 years ago

I've merged this, and will cut a new release tomorrow to have this available. πŸ™‚

flavray commented 3 years ago

v0.13.0 has just been released with this change. πŸ™‚

markfarnan commented 1 year ago

This is a great feature, however one key question.

Once this structure is parsed, how can I USE it in the rest of the functions which only seem to take 'schema'. rather than Vec ?

I've been reading the code of Avro-Rust and this lib for ages, but can't see how to use this result when calling Reader, Writer, or in my case, to_avro_datum

If I just one one of the Schemas from the array (the root schema), it throws SchemResolutionError as it can't find the child schema for the referenced type.

Any advice greatly appreciated.