kevinjqliu closed this 1 month ago
@kevinjqliu I would like to work on this one.
sure @soumya-ghosh, assigned to you
The solution might look similar to what is already done for project_batches in #1042
https://github.com/apache/iceberg-python/blob/f05b1aedee8451d981188adf68be5e8b360a9ca1/pyiceberg/io/pyarrow.py#L1457-L1479
Closed by #1043 (see comment)
Feature Request / Improvement
As of now, `limit` is checked only after an entire Parquet file is read:
https://github.com/apache/iceberg-python/blob/d8b5c17cadbc99e53d08ade6109283ee73f0d83e/pyiceberg/io/pyarrow.py#L1360-L1390
Optimization to push down limit to the Parquet reading level
For more details, see this comment
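To illustrate the idea, here is a minimal sketch of applying the limit while batches are being read, rather than after a whole file is loaded. It is not the pyiceberg implementation: `read_batches` and `take_with_limit` are hypothetical names, and plain Python lists stand in for Arrow record batches.

```python
from typing import Iterable, Iterator, List, Optional


def read_batches(num_rows: int, batch_size: int, log: List[int]) -> Iterator[List[int]]:
    """Hypothetical stand-in for a lazy Parquet batch reader."""
    for start in range(0, num_rows, batch_size):
        batch = list(range(start, min(start + batch_size, num_rows)))
        log.append(len(batch))  # record how many rows were actually read
        yield batch


def take_with_limit(batches: Iterable[List[int]], limit: Optional[int]) -> Iterator[List[int]]:
    """Yield batches until `limit` rows have been produced, then stop reading."""
    if limit is None:
        yield from batches
        return
    remaining = limit
    if remaining <= 0:
        return  # nothing requested, so do not touch the reader at all
    for batch in batches:
        if len(batch) >= remaining:
            yield batch[:remaining]  # trim the final batch to the limit
            return  # stop here so later batches are never read
        yield batch
        remaining -= len(batch)


rows_read: List[int] = []
result = [
    row
    for batch in take_with_limit(read_batches(1000, 100, rows_read), limit=250)
    for row in batch
]
print(len(result), sum(rows_read))
```

Because the reader is a generator, returning early from `take_with_limit` means the remaining batches are never materialized. In this toy setup only three of the ten batches are read to satisfy a limit of 250 rows.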