create table testcustomer(c_custkey integer, c_name varchar) with (format = 'PARQUET');
insert into testcustomer values (1, 'foobar');
alter table testcustomer drop column c_name;
select * from testcustomer;
Query 20241119_141459_00009_7pg22 failed: idx < names_.size() (1 vs. 1) Split [Hive: /testcustomer/20241119_141434_00007_7pg22.1.0.0.0_0_5_d05daa10-7aa4-445f-a4e9-f145083b1dfb.parquet 0 - 680] Task 20241119_141459_00009_7pg22.1.0.0.0 Operator: TableScan[0] 0
Bug description
https://github.com/facebookincubator/velox/pull/10517 broke schema evolution: in the Prestissimo reproducer above, selecting from a Parquet table after dropping a column fails.
It was also hard to investigate this in production since standard exceptions are only caught in the TableScan operator. We need to add a standard exception check at the Parquet Parser level.
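To illustrate the kind of check meant here, the following is a minimal sketch in plain C++ of catching a standard exception close to the parser and rethrowing it with file context. It is not Velox code: `runParserStep`, its parameters, and the message format are hypothetical names chosen for the example, and a real change would presumably use Velox's own error macros and exception types.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Velox: runs one parser step and, if it
// throws a standard exception, rethrows it with the file path and step name
// attached, so the failure is identifiable before it reaches the operator.
template <typename Fn>
auto runParserStep(
    const std::string& filePath,
    const std::string& step,
    Fn&& fn) -> decltype(fn()) {
  try {
    return fn();
  } catch (const std::exception& e) {
    // Keep the original message and prepend where the failure happened.
    throw std::runtime_error(
        "Parquet parser failed in '" + step + "' for file '" + filePath +
        "': " + e.what());
  }
}
```

A caller would wrap each parser entry point, for example the schema initialization, in `runParserStep(path, "initializeSchema", [&] { ... })`, so that the rethrown message names both the file and the step that failed.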
The ParquetReader initializeSchema also needs a refactor, since it has too many branches and special cases: https://github.com/facebookincubator/velox/blob/main/velox/dwio/parquet/reader/ParquetReader.cpp#L233
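As a rough illustration of the dropped-column case in the reproducer, here is a minimal sketch of resolving table columns against file columns without indexing out of range. It assumes columns are matched by name, which may not match the configured Hive behavior, and `resolveColumns` is a hypothetical helper rather than anything in Velox.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not Velox code: for each column in the table schema,
// find its position in the file schema by name. Columns missing from the
// file map to std::nullopt (to be read as NULL). Extra file columns, such as
// a dropped c_name, are never referenced.
std::vector<std::optional<std::size_t>> resolveColumns(
    const std::vector<std::string>& tableColumns,
    const std::vector<std::string>& fileColumns) {
  std::vector<std::optional<std::size_t>> mapping;
  mapping.reserve(tableColumns.size());
  for (const auto& name : tableColumns) {
    std::optional<std::size_t> found;
    for (std::size_t i = 0; i < fileColumns.size(); ++i) {
      if (fileColumns[i] == name) {
        found = i;
        break;
      }
    }
    mapping.push_back(found);
  }
  return mapping;
}
```

With the reproducer's schemas, the table has only `c_custkey` while the file still holds `c_custkey` and `c_name`; the mapping then contains a single entry pointing at file column 0, and the extra file column is ignored rather than triggering a bounds check.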
System information
Velox System Info v0.0.2
Commit: c069192e9bf079434906bb873d7dc03c621d2c18
CMake Version: 3.30.4
System: Darwin-23.6.0
Arch: arm64
C++ Compiler: /Library/Developer/CommandLineTools/usr/bin/c++
C++ Compiler Version: 15.0.0.15000309
C Compiler: /Library/Developer/CommandLineTools/usr/bin/cc
C Compiler Version: 15.0.0.15000309
CMake Prefix Path: /Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr;/opt/homebrew;/usr/local;/usr;/;/opt/homebrew/Cellar/cmake/3.30.4;/usr/local;/usr/X11R6;/usr/pkg;/opt;/sw;/opt/local
Relevant logs
No response