StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

[Enhancement] Ignore union type tag when converting avro to json (backport #52973) #53091

Closed: mergify[bot] closed this pull request 21 hours ago

mergify[bot] commented 22 hours ago

Why I'm doing:

schema:

 {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null",
                                   {
                                       "type": "record",
                                       "name": "email2",
                                       "fields": [
                                           {
                                               "name": "x",
                                               "type" : ["null", "int"]
                                           },
                                           {
                                               "name": "y",
                                               "type": ["null", "string"]
                                           }
                                       ]
                                   }
                                  ]
         }
    ]
 }

Current avro_value_to_json result, where every non-null union value is wrapped in its type tag: {"id": 1, "name": "Alice", "email": {"email2": {"x": {"int": 1}, "y": {"string": "alice@example.com"}}}}

What I'm doing:

Add a new function that converts Avro values to JSON strings while ignoring union type tags. For the schema above, the result becomes: {"id":1,"name":"Alice","email":{"x":1,"y":"alice@example.com"}}

Add a new config, avro_ignore_union_type_tag, and modify the existing functions to use the new conversion method when the config is enabled.
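
To illustrate the intended behavior, here is a minimal Python sketch of untagging, driven by the schema. It is not the C++ code from this PR: the names strip_union_tags, branch_name, and to_json are hypothetical, the ignore_union_type_tag parameter only mirrors the avro_ignore_union_type_tag config, and namespaces on named types are ignored for brevity.

```python
import json

# Schema and tagged value taken from the PR description above.
SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", {
            "type": "record",
            "name": "email2",
            "fields": [
                {"name": "x", "type": ["null", "int"]},
                {"name": "y", "type": ["null", "string"]},
            ],
        }]},
    ],
}

TAGGED = json.loads(
    '{"id": 1, "name": "Alice", "email": {"email2": '
    '{"x": {"int": 1}, "y": {"string": "alice@example.com"}}}}'
)


def branch_name(schema):
    """Return the tag used for a union branch (assumption: no namespaces)."""
    if isinstance(schema, str):
        return schema
    if schema["type"] in ("record", "enum", "fixed"):
        return schema["name"]
    return schema["type"]


def strip_union_tags(value, schema):
    """Recursively drop union type tags from a tagged JSON value."""
    if isinstance(schema, list):  # a union: value is null or {tag: inner}
        if value is None:
            return None
        ((tag, inner),) = value.items()
        for branch in schema:
            if branch_name(branch) == tag:
                return strip_union_tags(inner, branch)
        raise ValueError(f"unknown union branch: {tag}")
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {
                field["name"]: strip_union_tags(value[field["name"]], field["type"])
                for field in schema["fields"]
            }
        if kind == "array":
            return [strip_union_tags(item, schema["items"]) for item in value]
        if kind == "map":
            return {k: strip_union_tags(v, schema["values"]) for k, v in value.items()}
    return value  # primitives, enums, fixed: nothing to strip


def to_json(value, schema, ignore_union_type_tag):
    """Hypothetical dispatch on the flag, mirroring the new config."""
    if ignore_union_type_tag:
        value = strip_union_tags(value, schema)
    return json.dumps(value, separators=(",", ":"))


print(to_json(TAGGED, SCHEMA, ignore_union_type_tag=True))
# prints {"id":1,"name":"Alice","email":{"x":1,"y":"alice@example.com"}}
```

Walking the schema, rather than unwrapping every single-key object, avoids mistaking a real record or map with one key named "int" or "string" for a union tag.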

Fixes #issue

What type of PR is this:

Does this PR entail a change in behavior?

If yes, please specify the type of change:

Checklist:

Bugfix cherry-pick branch check:

mergify[bot] commented 22 hours ago

Cherry-pick of d42eb660049d7cc94a1ec1da603102f6575c5567 has failed:

On branch mergify/bp/branch-3.3/pr-52973
Your branch is up to date with 'origin/branch-3.3'.

You are currently cherry-picking commit d42eb66004.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
    modified:   be/src/formats/avro/binary_column.cpp
    modified:   be/test/exec/avro_scanner_test.cpp
    modified:   be/test/exec/test_data/avro_scanner/avro_basic_schema.json

Unmerged paths:
  (use "git add <file>..." to mark resolution)
    both modified:   be/src/common/config.h

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify[bot] commented 22 hours ago

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR.