flavray / avro-rs

Avro client library implementation in Rust
MIT License
169 stars, 95 forks

Added `read_schema` #196

Open jorgecarleitao opened 3 years ago

jorgecarleitao commented 3 years ago

This PR moves the functionality that reads the header of a block out of the `Block` struct, so that users can read the schema without having to initialize a `Reader`.

Users will need to re-seek the file if they want to then pass it to Reader, but the primary goal here is to offer users the ability to read the schema, e.g. to build logical plans based on the file.
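To illustrate what reading the schema without a `Reader` involves, here is a minimal, standard-library-only sketch of parsing the header of an Avro object container file. It follows the container layout from the Avro specification (magic bytes `Obj\x01`, a metadata map of string keys to byte values, then a 16-byte sync marker). The function names `read_long`, `read_bytes`, `read_header_metadata` and `read_schema_json` are hypothetical and are not the avro-rs API or the code in this PR.

```rust
use std::collections::HashMap;
use std::io::{self, Error, ErrorKind, Read};

/// Decode one Avro `long`: a variable-length, zigzag-encoded integer.
fn read_long<R: Read>(reader: &mut R) -> io::Result<i64> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let mut byte = [0u8; 1];
        reader.read_exact(&mut byte)?;
        value |= u64::from(byte[0] & 0x7f) << shift;
        if byte[0] & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 63 {
            return Err(Error::new(ErrorKind::InvalidData, "varint too long"));
        }
    }
    // Undo the zigzag mapping (0, -1, 1, -2, ... are stored as 0, 1, 2, 3, ...).
    Ok(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Decode Avro `bytes` (also used for `string`): a length, then that many bytes.
fn read_bytes<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let len = read_long(reader)?;
    if len < 0 {
        return Err(Error::new(ErrorKind::InvalidData, "negative length"));
    }
    let mut buf = vec![0u8; len as usize];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: read the container header and return its metadata map.
fn read_header_metadata<R: Read>(reader: &mut R) -> io::Result<HashMap<String, Vec<u8>>> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if &magic != b"Obj\x01" {
        return Err(Error::new(ErrorKind::InvalidData, "not an Avro container file"));
    }
    let mut meta = HashMap::new();
    loop {
        // A map is written as blocks of entries; a count of 0 ends the map.
        let count = read_long(reader)?;
        if count == 0 {
            break;
        }
        if count < 0 {
            // A negative count is followed by the block size in bytes (unused here).
            let _block_size = read_long(reader)?;
        }
        for _ in 0..count.unsigned_abs() {
            let key = String::from_utf8(read_bytes(reader)?)
                .map_err(|e| Error::new(ErrorKind::InvalidData, e))?;
            let value = read_bytes(reader)?;
            meta.insert(key, value);
        }
    }
    // The header ends with a 16-byte sync marker.
    let mut sync = [0u8; 16];
    reader.read_exact(&mut sync)?;
    Ok(meta)
}

/// Hypothetical helper: return the writer schema as a JSON string.
fn read_schema_json<R: Read>(reader: &mut R) -> io::Result<String> {
    let meta = read_header_metadata(reader)?;
    let raw = meta
        .get("avro.schema")
        .ok_or_else(|| Error::new(ErrorKind::InvalidData, "missing avro.schema"))?;
    String::from_utf8(raw.clone()).map_err(|e| Error::new(ErrorKind::InvalidData, e))
}
```

Because the sketch consumes the header bytes, a caller that later wants to decode records with a reader expecting the full file would first need to rewind the source, for example with `Seek::seek(SeekFrom::Start(0))`.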

Built on top of #194

Dandandan commented 3 years ago

@iemejia @jorgecarleitao

Has the official Rust implementation moved to this repo / space after the donation to Apache?

https://github.com/apache/avro/tree/master/lang/rust

If so, I think it makes sense to update some pointers in this repository.

Dandandan commented 3 years ago

@jorgecarleitao

Some pointers:

https://github.com/flavray/avro-rs/issues/189

https://s.apache.org/avro-rust-vote

iemejia commented 3 years ago

For some extra context: the idea was to have a consolidated upstream Apache Avro Rust implementation, so we contacted @flavray and the Yelp authors to have the code donated, and fortunately this happened successfully. The migration of the codebase to Apache Avro has already happened. However, we have not yet done the first release, so these changes can probably still get in. Some things did not happen as planned (due to multiple reasons and personal changes), and we are lacking maintainers, so any help will be welcome. We expect a new Avro release that will include the Rust Avro version 'soon', so if you make the changes there we can get them in.

Slightly unrelated, but good for awareness: we expect the first release to be fully consistent with the avro-rs APIs, but this might change in the future. The Materialize team has a forked implementation of this repo with many incompatible API changes but also some niceties, like better Avro format support and faster encoding/decoding, which they were willing to donate at some point. Sadly, because of other priorities, nobody has worked on moving that code to the Apache side. It is the same issue again: we need more maintainers; it is not only about putting the code in.

CC @RyanSkraba since we are syncing about the next release

Dandandan commented 3 years ago

Thanks @iemejia for the full context and status of the Rust implementation! Getting the Materialize version in seems great.

Also let me know when I should give access to avro-tools on crates.io.

@Igosuki you recently contributed the Avro table provider in DataFusion - maybe you'd be interested in helping out on the Apache Avro side as well :D?

Igosuki commented 3 years ago

I didn't know this was being migrated to Apache! I have a fork where I added protocol support for schemas generated from IDL here: https://github.com/Igosuki/avro-rs/commit/9f51ffa29ab00f3889e7507ffdcde190223b7360. Let me know if there's anything I can help with.

jorgecarleitao commented 3 years ago

I can help with the maintenance. After https://github.com/jorgecarleitao/arrow2/pull/406 I am familiar with the format and how each type is encoded (and the code in avro-rs is very easy to follow, I must say =)

iemejia commented 3 years ago

I somehow missed the rest of the conversation. @Igosuki @jorgecarleitao would you be interested in taking on the maintenance on the Apache side? It would be fantastic to have more hands helping. Sadly, things have not gone as expected since the move to Apache.

  • Is there an option to continue using github issues (instead of JIRA)? For JIRA it is going to be hard because of consistency with the rest of Avro.

Could you not release under the same cadence as the other implementations and/or the format? Being on 0.X helps.

The release cadence is up to the maintainers' needs. At this point we plan to do the first release with 1.11.0, but after that we could maybe do Rust-specific releases if required. The issue is the usual one: the Apache 72h vote.

Also, two things: Avro does NOT follow semver, and for the Rust implementation in particular we do not want to offer strong stability guarantees, so that new contributors can evolve the implementation.

I had a short discussion with @flavray last week and he seems to be interested in taking back some review tasks on the Apache branch, so with a little bit of help we could get things rolling again.

If you are up for helping, please write here or ping me on the ASF Slack and we can discuss more.

Igosuki commented 3 years ago

I am interested in helping maintain the Apache side. I don't know what you have planned for future releases, but I already have quite a few things in mind, aiming to attain feature parity with Java.