flavray / avro-rs

Avro client library implementation in Rust
MIT License

org.apache.avro.SchemaParseException: Can't redefine: test #182

Open lockwobr opened 3 years ago

lockwobr commented 3 years ago

I'm having issues writing data with avro_rs and reading it with Apache Avro for Java. I was able to create an example that is close to what I am experiencing. My real schema is fairly complicated, so I have tried to boil it down to the problematic bits.

This code runs just fine, but when its output is read by avro-tools I get an error.

use avro_rs::{Error, Schema, Writer};
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]
struct Test {
    a: i64,
    b: String,
    test: Test2,
}

#[derive(Debug, Deserialize, Serialize)]
struct Test2 {
    a: i64,
    b: String,
}

fn main() -> Result<(), Error> {
    let raw_schema = r#"
        {
            "type": "record",
            "name": "test",
            "fields": [
                {"name": "a", "type": "long", "default": 42},
                {"name": "b", "type": "string"},
                {"name": "test", "type": {
                    "type": "record",
                    "name": "test",
                    "fields": [
                        {"name": "a", "type": "long", "default": 42},
                        {"name": "b", "type": "string"}
                    ]
                }}
            ]
        }
    "#;

    let schema = Schema::parse_str(raw_schema)?;

    // println!("{:?}", schema);

    let mut writer = Writer::new(&schema, std::io::stdout());

    let test = Test {
        a: 27,
        b: "foo".to_owned(),
        test: Test2 {
            a: 23,
            b: "bar".to_owned(),
        }
    };

    writer.append_ser(test)?;
    writer.flush()?;

    Ok(())
}
❯ ./target/debug/example > avro.out
❯ java -jar ~/bin/avro-tools-1.10.1.jar tojson avro.out
21/02/17 10:38:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: test
        at org.apache.avro.Schema$Names.put(Schema.java:1542)
        at org.apache.avro.Schema$Names.add(Schema.java:1536)
        at org.apache.avro.Schema.parse(Schema.java:1655)
        at org.apache.avro.Schema.parse(Schema.java:1668)
        at org.apache.avro.Schema$Parser.parse(Schema.java:1425)
        at org.apache.avro.Schema$Parser.parse(Schema.java:1413)
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:131)
        at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:90)
        at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:93)
        at org.apache.avro.tool.Main.run(Main.java:67)
        at org.apache.avro.tool.Main.main(Main.java:56)

It seems Apache Avro performs a validation that avro_rs does not. I found this error while using parse_list to load a directory of schema files. I have a record type that is used more than once in a parent record type, and because avro_rs inlines the child schema at every use in the parent record, I get an error like the one above. When Apache Avro inlines child schemas into a parent, it defines the child record type only once and refers to it by name on subsequent uses. My example shows roughly the same issue: the record type name "test" is defined twice, which avro_rs accepts but Apache Avro does not.
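To make the "define once, then refer by name" behaviour concrete, here is a minimal sketch in plain Rust. It does not use avro_rs or the Java library: the `Schema` enum and the `to_json` function are hypothetical stand-ins invented for illustration, and the record names `parent` and `child` are made up. It only shows the idea of tracking which names have already been written out.

```rust
use std::collections::HashSet;

/// Hypothetical, minimal schema model for illustration only
/// (these are NOT the avro_rs types).
#[derive(Debug, Clone)]
enum Schema {
    Long,
    Str,
    Record { name: String, fields: Vec<(String, Schema)> },
}

/// Render a schema as JSON. A named record is written in full the first
/// time it is encountered and as a bare name reference afterwards.
fn to_json(schema: &Schema, defined: &mut HashSet<String>) -> String {
    match schema {
        Schema::Long => "\"long\"".to_string(),
        Schema::Str => "\"string\"".to_string(),
        Schema::Record { name, fields } => {
            // `insert` returns false when the name was already defined.
            if !defined.insert(name.clone()) {
                return format!("\"{}\"", name);
            }
            let fields: Vec<String> = fields
                .iter()
                .map(|(field_name, ty)| {
                    format!(
                        "{{\"name\":\"{}\",\"type\":{}}}",
                        field_name,
                        to_json(ty, defined)
                    )
                })
                .collect();
            format!(
                "{{\"type\":\"record\",\"name\":\"{}\",\"fields\":[{}]}}",
                name,
                fields.join(",")
            )
        }
    }
}

fn main() {
    let child = Schema::Record {
        name: "child".to_string(),
        fields: vec![("id".to_string(), Schema::Long)],
    };
    // The same child record type is used twice in the parent.
    let parent = Schema::Record {
        name: "parent".to_string(),
        fields: vec![
            ("first".to_string(), child.clone()),
            ("second".to_string(), child),
            ("label".to_string(), Schema::Str),
        ],
    };
    println!("{}", to_json(&parent, &mut HashSet::new()));
}
```

In this sketch the field `first` carries the full `child` definition, while `second` is emitted as just the name `"child"`, which is the shape the Java parser accepts.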

martin-g commented 2 years ago

This issue is fixed in the apache_avro crate, where one can use schema references:

{
    "type": "record",
    "name": "test",
    "fields": [
        {"name": "a", "type": "long", "default": 42},
        {"name": "b", "type": "string"},
        {"name": "test", "type": "test"}
    ]
}

apache_avro is a fork/donation of this project to the Apache Avro project. There is no official release of the crate yet, but it should be released soon with Avro 1.11.1!

travisbrown commented 2 years ago

@martin-g Thanks for the pointer! There still seems to be an issue with "Can't redefine" errors, at least in some non-recursive cases. Take the following example:

fn main() {
    let schema = r#"
    {
      "name": "test.test",
      "type": "record",
      "fields": [
        {
          "name": "bar",
          "type": { "name": "test.foo", "type": "record", "fields": [{ "name": "id", "type": "long" }] }
        },
        { "name": "baz", "type": "test.foo" }
      ]
    }
    "#;

    let schema = apache_avro::schema::Schema::parse_str(&schema).unwrap();

    println!("{}", serde_json::to_string(&schema).unwrap());
}

This prints the following (the same thing happens if the test.foo definition is in a separate file):

$ target/release/avro-test | jq
{
  "type": "record",
  "name": "test.test",
  "fields": [
    {
      "name": "bar",
      "type": {
        "type": "record",
        "name": "test.foo",
        "fields": [
          {
            "name": "id",
            "type": "long"
          }
        ]
      }
    },
    {
      "name": "baz",
      "type": {
        "type": "record",
        "name": "test.foo",
        "fields": [
          {
            "name": "id",
            "type": "long"
          }
        ]
      }
    }
  ]
}

This causes the Java tooling to fail with an org.apache.avro.SchemaParseException: Can't redefine error like the one above.
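For readers wondering why a repeated definition is rejected, the check can be sketched in plain Rust as below. This is only an illustration of the idea, not the Java implementation and not the apache_avro API: the `Schema` enum, `check_names`, and `foo_definition` are hypothetical names, the "Can't redefine" message text is copied from the stack trace above, and the "Undefined name" message is invented for this example.

```rust
use std::collections::HashSet;

/// Hypothetical, minimal schema model for illustration only.
#[derive(Debug)]
enum Schema {
    Long,
    /// Full definition of a named record.
    Record { name: String, fields: Vec<(String, Schema)> },
    /// Reference by name to a record defined earlier.
    Ref(String),
}

/// Walk the schema and reject any record name that is defined twice,
/// as well as any reference to a name that has not been defined yet.
fn check_names(schema: &Schema, seen: &mut HashSet<String>) -> Result<(), String> {
    match schema {
        Schema::Long => Ok(()),
        Schema::Ref(name) => {
            if seen.contains(name) {
                Ok(())
            } else {
                // Invented message, not taken from any real library.
                Err(format!("Undefined name: {}", name))
            }
        }
        Schema::Record { name, fields } => {
            // `insert` returns false when the name is already known.
            if !seen.insert(name.clone()) {
                return Err(format!("Can't redefine: {}", name));
            }
            for (_, ty) in fields {
                check_names(ty, seen)?;
            }
            Ok(())
        }
    }
}

/// Builds the full `test.foo` record definition from the example above.
fn foo_definition() -> Schema {
    Schema::Record {
        name: "test.foo".to_string(),
        fields: vec![("id".to_string(), Schema::Long)],
    }
}

fn main() {
    // Shape of the serialized output: `test.foo` is defined twice.
    let duplicated = Schema::Record {
        name: "test.test".to_string(),
        fields: vec![
            ("bar".to_string(), foo_definition()),
            ("baz".to_string(), foo_definition()),
        ],
    };
    // Shape of the input schema: defined once, then referenced by name.
    let referenced = Schema::Record {
        name: "test.test".to_string(),
        fields: vec![
            ("bar".to_string(), foo_definition()),
            ("baz".to_string(), Schema::Ref("test.foo".to_string())),
        ],
    };
    println!("{:?}", check_names(&duplicated, &mut HashSet::new()));
    println!("{:?}", check_names(&referenced, &mut HashSet::new()));
}
```

The first call returns an error because `test.foo` appears as a full definition twice; the second succeeds because the second use is only a reference.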

martin-g commented 2 years ago

@travisbrown I've logged https://issues.apache.org/jira/browse/AVRO-3433 for the issue!

travisbrown commented 2 years ago

@martin-g Thanks very much!

martin-g commented 2 years ago

@travisbrown Please try the branch at https://github.com/apache/avro/tree/avro-3433-preserve-schema-ref-in-json. There are still some tests to update, but the issue should be fixed!

martin-g commented 2 years ago

https://github.com/apache/avro/pull/1580 is ready for review!