Closed YuvalItzchakov closed 5 years ago
@YuvalItzchakov, I think the main goal of this PR is to take a mapping and an Avro schema without IDs and produce something that can be passed into buildAvroProjection
. That method takes a file schema, an Iceberg read schema, and a set of renames.
The work here produces an Iceberg schema with IDs from an Avro schema, but that's not exactly what is needed to be passed into buildAvroProjection
. So I think this should actually produce an Avro schema with the right IDs from a field name to ID mapping.
@rdblue Thank you for taking the time to review everything! Frankly, I was not sure this code was in the right direction at all, because as you've mentioned, SchemaToType
does exactly the work which is done here (I learned that after I wrote the code).
I'll start working on the changes.
@YuvalItzchakov, if you want to continue working on this, please re-open it in the apache/incubator-iceberg repository. That's the project's new home. Thanks!
Initial fix for https://github.com/Netflix/iceberg/issues/71
This implementation introduces
AvroNamedSchemaVisitor<T>
and theAvroToIcebergSchemaVisitor<iceberg.Schema>
types in order to overcome the fact thatAvroSchemaVisitor
only passes the schema at the field level, neglecting to pass in the field names which are needed generate the schema.There are a few "workarounds" in the implementation of the
record
method which I don't really like, still need to iterate on that.There are still open issues which need to be resolved:
0
, and they also map structs in a table with ids (is this desired?)Record
is tested, need to add tests forMap
,Union
,Array
, etc..)