deercreeklabs / lancaster

Apache Avro library for Clojure and ClojureScript

Repeated Namespaced Enums break serialization #7

Closed yanatan16 closed 3 years ago

yanatan16 commented 3 years ago

Given the following Avro schema (in JSON form):

{
  "type" : "record",
  "name" : "Test",
  "namespace" : "avro.me",
  "fields" : [ {
    "name" : "x",
    "type" : [ "null", {
      "type" : "enum",
      "name" : "FooEnum",
      "namespace": "com.company",
      "symbols" : [ "FOO_BAR", "FOO_BAZ" ]
    } ],
    "default" : null
  }, {
    "name" : "y",
    "type" : [ "null", "com.company.FooEnum" ],
    "default" : null
  } ]
}

Serializing with Lancaster breaks on the second use of the namespaced enum (the reference in field y).

To reproduce:

user> (require '[deercreeklabs.lancaster :as l])
nil
user> (slurp "testschema.json")
"{\n  \"type\" : \"record\",\n  \"name\" : \"Test\",\n  \"namespace\" : \"avro.me\",\n  \"fields\" : [ {\n    \"name\" : \"x\",\n    \"type\" : [ \"null\", {\n      \"type\" : \"enum\",\n      \"name\" : \"FooEnum\",\n      \"namespace\": \"com.company\",\n      \"symbols\" : [ \"FOO_BAR\", \"FOO_BAZ\" ]\n    } ],\n    \"default\" : null\n  }, {\n    \"name\" : \"y\",\n    \"type\" : [ \"null\", \"com.company.FooEnum\" ],\n    \"default\" : null\n  } ]\n}\n"
user> (def schema (l/json->schema *1))
#'user/schema
user> (l/serialize schema {:x :foo-bar :y :foo-bar})
Execution error (ExceptionInfo) at deercreeklabs.lancaster.utils/eval14268$fn$serialize (utils.cljc:912).
Data `:foo-bar` does not match any schema in the union schema. Path: [:y]

If I remove the namespaces, the test passes. I believe the error occurs because, when the schema is parsed, the name->* maps register only the unnamespaced name of the schema (:foo-enum) and not its namespaced name (:com.company/foo-enum). The namespaced reference in field y then can't be matched, which breaks the union schema serializer.
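To make the suspected cause concrete, here is a minimal sketch in plain Clojure (no Lancaster dependency) of Avro name resolution as the Avro specification describes it: a dotted name is already a full name, otherwise the explicit namespace is prefixed, else the namespace inherited from the enclosing named type. The helpers full-name, namespace-of, collect-named and resolve-ref are hypothetical and are not Lancaster's internals; Lancaster converts names to keywords such as :com.company/foo-enum, which this sketch does not model. It only shows why a registry keyed by full name lets field y's reference resolve, while one keyed by the short name "FooEnum" would not.

```clojure
(ns avro-name-sketch
  (:require [clojure.string :as str]))

;; Hypothetical helpers illustrating Avro name resolution.
;; NOT Lancaster's internals (Lancaster uses keywords, not dotted strings).

(defn full-name
  "Resolve an Avro type name to its full name: a dotted name is already
  full; otherwise prefix the explicit namespace, else the namespace
  inherited from the enclosing named schema."
  [nm explicit-ns enclosing-ns]
  (cond
    (str/includes? nm ".") nm
    (not (str/blank? explicit-ns)) (str explicit-ns "." nm)
    (not (str/blank? enclosing-ns)) (str enclosing-ns "." nm)
    :else nm))

(defn namespace-of
  "Namespace part of a full name (everything before the last dot), or nil."
  [fname]
  (when-let [i (str/last-index-of fname ".")]
    (subs fname 0 i)))

(defn collect-named
  "Walk a JSON-shaped schema (string keys) and return a map from full
  name to definition for every named type (record, enum, fixed)."
  ([schema] (collect-named schema nil {}))
  ([schema enclosing-ns acc]
   (cond
     ;; A union is a vector of branch schemas.
     (vector? schema)
     (reduce #(collect-named %2 enclosing-ns %1) acc schema)

     (map? schema)
     (let [{t "type", nm "name", ns-attr "namespace", fields "fields"} schema]
       (case t
         ("record" "enum" "fixed")
         (let [fname (full-name nm ns-attr enclosing-ns)
               ;; Field types inherit the namespace of this named type.
               child-ns (namespace-of fname)]
           (reduce #(collect-named (get %2 "type") child-ns %1)
                   (assoc acc fname schema)
                   fields))
         "array" (collect-named (get schema "items") enclosing-ns acc)
         "map" (collect-named (get schema "values") enclosing-ns acc)
         acc))

     ;; Strings are primitives or references to named types defined elsewhere.
     :else acc)))

(defn resolve-ref
  "Look up a named-type reference relative to the enclosing namespace."
  [registry ref enclosing-ns]
  (get registry (full-name ref nil enclosing-ns)))

;; The schema from this issue, as Clojure data.
(def test-schema
  {"type" "record" "name" "Test" "namespace" "avro.me"
   "fields" [{"name" "x"
              "type" ["null" {"type" "enum" "name" "FooEnum"
                              "namespace" "com.company"
                              "symbols" ["FOO_BAR" "FOO_BAZ"]}]
              "default" nil}
             {"name" "y" "type" ["null" "com.company.FooEnum"]
              "default" nil}]})

(def registry (collect-named test-schema))

;; Field y's reference resolves because the registry is keyed by full name.
(get (resolve-ref registry "com.company.FooEnum" "avro.me") "symbols")
```

Here registry has exactly two keys, "avro.me.Test" and "com.company.FooEnum", and the last expression returns the enum's symbols vector. A complete implementation would also need to handle aliases; the sketch omits them.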

yanatan16 commented 3 years ago

It's working now. I can't reproduce this again. I'll resubmit if I can.

chadharrington commented 3 years ago

@yanatan16 Thanks for taking the time to write this up. I recently made some changes that may have fixed this. Did you perhaps update the Lancaster version?

As a side note, what is your main use case for Lancaster?

Let me know if I can be of any assistance or if you run into any other issues.

yanatan16 commented 3 years ago

Hey @chadharrington, I'm still investigating the issue. Looks like I closed it prematurely: it still happens for me on the latest master branch. I added a test; I'll look into a fix and send a PR if I can.

As for my use case, I work at a large tech company on a data engineering team. We have Avro data (defined by other teams) on Kafka, with a schema registry. I write Flink ETL jobs that read Avro schemas from the registry and parse the data on Kafka, mostly for developer-agility use cases. These Avro schemas are fairly complex, with many namespaces and reused types. I like to use Clojure, so I use Lancaster (and its json->schema function) heavily.

yanatan16 commented 3 years ago

I pushed a couple test cases that fail to a branch on my fork: https://github.com/yanatan16/lancaster/tree/yanatan16/namespaced-json-schemas

One covers the case in this issue. The other covers a failure to serialize/deserialize namespaced records that are used twice. I already know a fix for that one, which I'll submit in a PR.

chadharrington commented 3 years ago

@yanatan16 I have merged your excellent PR #8 and pushed version 0.9.6 to Clojars. Thanks for your help. Closing this now.

yanatan16 commented 3 years ago

@chadharrington Thanks for merging quickly. I love using Lancaster; thanks for your work on it!

chadharrington commented 3 years ago

@yanatan16 If there are features or enhancements that you'd like to see, feel free to open an issue to discuss.