facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0

Issue while reading complex type data column in Parquet data #10658

Open agrawalreetika opened 1 month ago

agrawalreetika commented 1 month ago

Bug description

When a MAP column whose value is a ROW type has more fields in the table schema than in the Parquet file schema, the query fails on a Prestissimo cluster. On a Java cluster the same query runs fine and returns NULL for the extra fields.

Steps To Reproduce

Execution Engine - Presto
Data Format - PARQUET

presto> create table map_test(p1 int, m1 map<varchar, row(a varchar, b varchar)>) with (format='parquet');
presto> insert into map_test values(100, MAP(ARRAY['0', '1'], ARRAY[ROW('Alice', '30'), ROW('Bob', '25')]));

presto> select * from map_test;
 p1  |                  m1                  
-----+--------------------------------------
 100 | {0={a=Alice, b=30}, 1={a=Bob, b=25}} 
(1 row)

presto> alter table map_test drop column m1;
DROP COLUMN
presto> alter table map_test add column m1 map<varchar, row(a varchar, b varchar, c varchar)>;
ADD COLUMN

From Java Cluster -

presto> set session hive.parquet_use_column_names=true;
presto> select * from map_test;
 p1  |                          m1                          
-----+------------------------------------------------------
 100 | {0={a=Alice, b=30, c=null}, 1={a=Bob, b=25, c=null}} 
(1 row)

presto> set session hive.parquet_use_column_names=false;
presto> select * from map_test;
 p1  |                          m1                          
-----+------------------------------------------------------
 100 | {0={a=Alice, b=30, c=null}, 1={a=Bob, b=25, c=null}} 
(1 row)

From Prestissimo Cluster -

presto> select * from map_test;

VeloxUserError:  Field not found: c. Available fields are: a, b. Split [Hive: s3a://reetika-testdb/map_test/20240805_085335_00004_jmf3r_3ecee9de-d6fb-4029-ab2c-d89aea6cd948 0 - 907] Task 20240805_121401_00009_c23rm.1.0.0.0
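
For illustration, the result expected here (matching the Java cluster output above) can be sketched as name-based field resolution, where any field present in the table schema but absent from the file is read as NULL. This is only a minimal Python sketch of the semantics, not Velox or Presto code; `project_row` and `project_map` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional


def project_row(file_row: Dict[str, Any], table_fields: List[str]) -> Dict[str, Optional[Any]]:
    """Look up each table-schema field by name in the row read from the file.

    Fields missing from the file row come back as None (NULL) instead of
    raising a "field not found" error.
    """
    return {name: file_row.get(name) for name in table_fields}


def project_map(
    file_map: Dict[str, Dict[str, Any]], table_fields: List[str]
) -> Dict[str, Dict[str, Optional[Any]]]:
    """Apply the row projection to every value of a map<varchar, row(...)> column."""
    return {key: project_row(row, table_fields) for key, row in file_map.items()}


# Value of m1 as written to the Parquet file: row(a varchar, b varchar).
file_m1 = {"0": {"a": "Alice", "b": "30"}, "1": {"a": "Bob", "b": "25"}}

# Fields of the row type after re-adding m1: row(a varchar, b varchar, c varchar).
table_fields = ["a", "b", "c"]

result = project_map(file_m1, table_fields)
print(result)
```

With this resolution the extra field `c` is None for both map entries, which corresponds to the `c=null` values returned by the Java cluster.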

System information

Mac OS

Relevant logs

No response

FelixYBW commented 1 month ago

@yma11