facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.42k stars 1.12k forks

Row Index Column support for Parquet Scan #9165

Open gaoyangxiaozhu opened 5 months ago

gaoyangxiaozhu commented 5 months ago

Description

Spark supports querying the row_index metadata column of Parquet files. See the Spark implementation: https://github.com/apache/spark/commit/95aebcbf100de1dbedd32626ce67bd01014c973e

However, Velox does not support the row_index metadata column, so the following Spark query returns null for the row index column:

select a, _tmp_metadata_row_index from table;

This issue tracks support for querying the row_index metadata column in the Parquet scan. It is also a feature request from the Microsoft Delta team.

gaoyangxiaozhu commented 5 months ago

@majetideepak / @aditi-pandit FYI, and @zhli1142015.

I'd like to take on this work. I already have a PR ready in our internal fork and will send it upstream later.

aditi-pandit commented 3 months ago

@gaoyangxiaozhu: Your PR has been merged into Velox. Is this issue complete, then? If so, please close it.