StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

[Enhancement] Ignore union type tag when converting avro to json #52973

Closed wyb closed 1 day ago

wyb commented 4 days ago

Why I'm doing:

schema:

 {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null",
                                   {
                                       "type": "record",
                                       "name": "email2",
                                       "fields": [
                                           {
                                               "name": "x",
                                               "type" : ["null", "int"]
                                           },
                                           {
                                               "name": "y",
                                               "type": ["null", "string"]
                                           }
                                       ]
                                   }
                                  ]
         }
    ]
 }

Result of Avro's avro_value_to_json, which wraps every non-null union value in its type tag: {"id": 1, "name": "Alice", "email": {"email2": {"x": {"int": 1}, "y": {"string": "alice@example.com"}}}}

What I'm doing:

Add a new function that converts Avro values to JSON strings while ignoring union type tags. For the schema above, the result becomes: {"id":1,"name":"Alice","email":{"x":1,"y":"alice@example.com"}}

Add a new config, avro_ignore_union_type_tag, and modify the existing functions to use the new conversion method when the config is enabled.
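To illustrate the transformation, here is a minimal Python sketch of schema-guided tag stripping. It is not the C++ code this PR adds to be/src/formats/avro/binary_column.cpp; the names `branch_name` and `strip_union_tags` are hypothetical, and the sketch assumes un-namespaced type names and does not resolve named-type references. Walking the schema matters because a tagged union value and a single-field record look the same in plain JSON.

```python
import json

def branch_name(schema):
    """Tag that Avro's JSON encoding uses for a union branch (simplified)."""
    if isinstance(schema, str):
        return schema  # primitive type name or a named-type reference
    if isinstance(schema, dict):
        # Named types (record, enum, fixed) are tagged by name; assumption:
        # no namespace. Other complex types are tagged by their type keyword.
        return schema.get("name", schema["type"])
    raise ValueError("a union may not directly contain another union")

def strip_union_tags(schema, value):
    """Return `value` with union type tags removed, guided by `schema`."""
    if isinstance(schema, list):  # a union is written as a JSON array
        if value is None:
            return None  # the null branch carries no tag
        (tag, inner), = value.items()  # tagged form: exactly one key
        for branch in schema:
            if branch_name(branch) == tag:
                return strip_union_tags(branch, inner)
        raise ValueError(f"tag {tag!r} matches no union branch")
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            return {f["name"]: strip_union_tags(f["type"], value[f["name"]])
                    for f in schema["fields"]}
        if kind == "array":
            return [strip_union_tags(schema["items"], v) for v in value]
        if kind == "map":
            return {k: strip_union_tags(schema["values"], v)
                    for k, v in value.items()}
    return value  # primitives and unresolved named references pass through

# Schema and tagged value taken from the PR description above.
SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "int"},
  {"name": "name", "type": "string"},
  {"name": "email", "type": ["null", {"type": "record", "name": "email2",
    "fields": [{"name": "x", "type": ["null", "int"]},
               {"name": "y", "type": ["null", "string"]}]}]}]}
""")
TAGGED = json.loads(
    '{"id": 1, "name": "Alice", "email": {"email2": '
    '{"x": {"int": 1}, "y": {"string": "alice@example.com"}}}}'
)

print(json.dumps(strip_union_tags(SCHEMA, TAGGED), separators=(",", ":")))
```

Run on the example from the description, the sketch produces the untagged form shown above.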

Fixes #issue

github-actions[bot] commented 3 days ago

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] commented 3 days ago

[FE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] commented 3 days ago

[BE Incremental Coverage Report]

:x: fail : 122 / 168 (72.62%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| :large_blue_circle: be/src/formats/avro/binary_column.cpp | 122 | 168 | 72.62% | [213, 218, 219, 220, 221, 222, 224, 225, 230, 238, 243, 244, 245, 246, 248, 249, 254, 262, 274, 282, 297, 306, 307, 308, 309, 310, 312, 313, 318, 326, 340, 347, 359, 364, 365, 423, 424, 431, 432, 433, 487, 488, 495, 496, 497, 505] |

github-actions[bot] commented 1 day ago

@Mergifyio backport branch-3.3

mergify[bot] commented 1 day ago

backport branch-3.3

✅ Backports have been created

* [#53091 [Enhancement] Ignore union type tag when converting avro to json (backport #52973)](https://github.com/StarRocks/starrocks/pull/53091) has been created for branch `branch-3.3` but encountered conflicts

wyb commented 1 day ago

@Mergifyio backport branch-3.4

mergify[bot] commented 1 day ago

backport branch-3.4

🛑 Command backport branch-3.4 cancelled because of a new backport command with different arguments

wyb commented 1 day ago

@Mergifyio backport branch-3.4

mergify[bot] commented 1 day ago

backport branch-3.4

✅ Backports have been created

* [#53100 [Enhancement] Ignore union type tag when converting avro to json (backport #52973)](https://github.com/StarRocks/starrocks/pull/53100) has been created for branch `branch-3.4`