datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

Kafka AVRO Schema Ingestion: Named types are not parsed completely #9886

Open vladimirivkovic opened 9 months ago

vladimirivkovic commented 9 months ago

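Describe the bug

A named type is used to specify the type of multiple fields in a record. Only the field that defines the named type inline is parsed into its sub-fields; fields that refer to the type by name (b and d in the sample below) are not expanded.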

Sample AVRO schema:

{
  "namespace": "com.sample.avro.schema",
  "type": "record",
  "name": "Pyramid",
  "fields": [
    {
      "name": "a",
      "type": {
        "type": "record",
        "name": "Point3D",
        "fields": [
          { "name": "x", "type": "double" },
          { "name": "y", "type": "double" },
          { "name": "z", "type": "double" }
        ]
      }
    },
    { "name": "b", "type": "Point3D" },
    {
      "name": "c",
      "type": ["null", "Point3D"]
    },
    { "name": "d", "type": "Point3D" }
  ]
}

After running Kafka ingestion with Schema Registry enabled, DataHub shows the following schema:

[Screenshot from 2024-02-08 08-55-36]

To Reproduce

Steps to reproduce the behavior:

  1. Create a Kafka topic named pyramid with the AVRO schema specified above
  2. Run the Kafka ingestion from the UI
  3. Search for the pyramid topic and open it
  4. See the incomplete schema in the Schema tab

Expected behavior

Fields b and d should have the same structure as field a: three sub-fields (x, y and z).
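To make the expected result concrete, here is a minimal standard-library sketch of how named-type references could be expanded. It is not DataHub's implementation: the helper names `field_paths` and `_full_name` are hypothetical, the paths are plain dotted names rather than DataHub's field-path format, and only records, unions, arrays and maps are handled.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# The sample schema from this issue, in compact form.
PYRAMID = json.loads("""
{
  "namespace": "com.sample.avro.schema",
  "type": "record",
  "name": "Pyramid",
  "fields": [
    {"name": "a", "type": {"type": "record", "name": "Point3D", "fields": [
      {"name": "x", "type": "double"},
      {"name": "y", "type": "double"},
      {"name": "z", "type": "double"}]}},
    {"name": "b", "type": "Point3D"},
    {"name": "c", "type": ["null", "Point3D"]},
    {"name": "d", "type": "Point3D"}
  ]
}
""")


def _full_name(name, namespace):
    """Qualify a short name with the enclosing namespace (hypothetical helper)."""
    return name if "." in name or not namespace else f"{namespace}.{name}"


def field_paths(schema, names=None, prefix="", namespace="", expanding=frozenset()):
    """Return dotted paths for all fields, expanding references to named records.

    Hypothetical helper for illustration only; not part of the DataHub SDK.
    """
    names = {} if names is None else names
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return []
        full = _full_name(schema, namespace)
        if full in expanding:  # recursive type: stop instead of looping forever
            return []
        if full not in names:
            raise ValueError(f"unknown named type: {full}")
        # Re-walk the stored definition under the referencing field's path.
        return field_paths(names[full], names, prefix, full.rpartition(".")[0], expanding)
    if isinstance(schema, list):  # union: walk every branch
        return [p for branch in schema
                for p in field_paths(branch, names, prefix, namespace, expanding)]
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            full = _full_name(schema["name"], schema.get("namespace", namespace))
            ns = full.rpartition(".")[0]
            names[full] = schema  # remember the definition for later references
            paths = []
            for field in schema["fields"]:
                path = f"{prefix}.{field['name']}" if prefix else field["name"]
                paths.append(path)
                paths += field_paths(field["type"], names, path, ns, expanding | {full})
            return paths
        if kind == "array":
            return field_paths(schema["items"], names, prefix, namespace, expanding)
        if kind == "map":
            return field_paths(schema["values"], names, prefix, namespace, expanding)
    return []  # enum, fixed, or a primitive written in object form


if __name__ == "__main__":
    for path in field_paths(PYRAMID):
        print(path)
```

With this sketch, b, c and d each get the same x, y and z sub-fields as a, because every reference to Point3D is resolved against the definition registered when a was parsed.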

Additional context

The same issue occurs with both UI ingestion and custom ingestion using the Python SDK and the schema_util.avro_schema_to_mce_fields method.

For this case, it seems that commenting out the else statement here solves the problem.

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

vladimirivkovic commented 8 months ago

The same behavior was noticed while using version 0.13.0.

vladimirivkovic commented 7 months ago

The same behavior was noticed while using version 0.13.1.

vladimirivkovic commented 5 months ago

The same behavior was noticed while using version 0.13.3.