apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.13k stars, 409 forks

[VL] Scan split proto contains duplicate field #6405

Open Yohahaha opened 1 month ago

Yohahaha commented 1 month ago

Description

{
    "uriFile": "oss://xxx/tpcds/partitioned/tpcds_sf100/web_sales/ws_sold_date_sk=2451145/part-00086-5f7cfd3c-f467-4d40-82e3-d2033487db84.c000.snappy.parquet",
    "partitionIndex": "5",
    "length": "5483534",
    "parquet": {},
    "partitionColumns": [
        {
            "key": "ws_sold_date_sk",
            "value": "2451145"
        }
    ],
    "schema": {},
    "metadataColumns": [
        {
            "key": "input_file_name",
            "value": "oss://xxx/tpcds/partitioned/tpcds_sf100/web_sales/ws_sold_date_sk=2451145/part-00086-5f7cfd3c-f467-4d40-82e3-d2033487db84.c000.snappy.parquet"
        },
        {
            "key": "input_file_block_length",
            "value": "5483534"
        },
        {
            "key": "input_file_block_start",
            "value": "0"
        }
    ],
    "properties": {
        "fileSize": "5483534",
        "modificationTime": "1719801698242"
    }
}

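Some fields are duplicated and should be merged. For example, the "input_file_name" metadata column repeats "uriFile", and "input_file_block_length" repeats "length".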
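A minimal Python sketch of what "merging" could mean: flag metadata columns whose values only repeat a top-level split field, then drop them. It assumes the dump above is the proto3 JSON form of a file split. Proto3 JSON encodes 64-bit integers as strings and omits fields left at their default value, which would explain why "length" is a string and why no start-offset field appears even though "input_file_block_start" is "0". The key-to-field mapping, the "start" field name, and the helpers find_duplicates and drop_duplicates are hypothetical illustrations, not Gluten's API.

```python
import json

# Hypothetical mapping, assumed for illustration only: metadata column key ->
# the top-level split field that already carries the same information.
REDUNDANT_METADATA = {
    "input_file_name": "uriFile",
    "input_file_block_length": "length",
    "input_file_block_start": "start",  # assumed field name for the split offset
}

# proto3 JSON omits fields left at their default value, so an absent 64-bit
# offset is read back as "0" (64-bit integers are serialized as strings).
DEFAULTS = {"start": "0"}


def find_duplicates(split):
    """Return (metadata_key, split_field) pairs whose values are identical."""
    dups = []
    for entry in split.get("metadataColumns", []):
        field = REDUNDANT_METADATA.get(entry["key"])
        if field is None:
            continue  # not a column we know how to derive from the split
        if split.get(field, DEFAULTS.get(field)) == entry["value"]:
            dups.append((entry["key"], field))
    return dups


def drop_duplicates(split):
    """Return a shallow copy of the split without the redundant metadata columns."""
    redundant = {key for key, _ in find_duplicates(split)}
    merged = dict(split)
    merged["metadataColumns"] = [
        entry
        for entry in split.get("metadataColumns", [])
        if entry["key"] not in redundant
    ]
    return merged


if __name__ == "__main__":
    # Trimmed copy of the split dumped in this issue.
    path = ("oss://xxx/tpcds/partitioned/tpcds_sf100/web_sales/ws_sold_date_sk=2451145/"
            "part-00086-5f7cfd3c-f467-4d40-82e3-d2033487db84.c000.snappy.parquet")
    split = {
        "uriFile": path,
        "partitionIndex": "5",
        "length": "5483534",
        "metadataColumns": [
            {"key": "input_file_name", "value": path},
            {"key": "input_file_block_length", "value": "5483534"},
            {"key": "input_file_block_start", "value": "0"},
        ],
    }
    print(find_duplicates(split))
    # [('input_file_name', 'uriFile'), ('input_file_block_length', 'length'), ('input_file_block_start', 'start')]
    print(json.dumps(drop_duplicates(split), indent=4))
```

Dropping the columns like this is only safe if whoever consumes the split can rebuild them from the remaining fields. This thread does not settle whether that happens when the split is built or when the native side reads it.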

@gaoyangxiaozhu @acvictor

acvictor commented 1 month ago

I will take a look, thanks!