apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Add example for building an external secondary index for parquet files #10549

Closed alamb closed 2 days ago

alamb commented 2 weeks ago

Note: While this PR looks very large (715 lines), around half of the content is comments / docstrings.

Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/10546

Rationale for this change

See https://github.com/apache/datafusion/issues/10546

Building and using external indexes in DataFusion is an important feature. Adding an example of how to do so will help drive the design and APIs.

What changes are included in this PR?

New Example

Are these changes tested?

CI

Are there any user-facing changes?

No -- just an example

TODOs

alamb commented 1 week ago

This PR is now ready for review

alamb commented 6 days ago

@crepererum and @NGA-TRAN -- here is a PR ready for your review that shows how to do file level pruning with statistics.

I will make an example of how to do row group level / page level pruning next.
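
For readers unfamiliar with the idea, file level pruning with statistics can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `FileStats`, `FileIndex`, `files_matching_eq`, and `example_index` are hypothetical names, and no DataFusion APIs are used. It assumes the index stores a min and max value per file for a single integer column.

```rust
// Hypothetical types for illustration only; these are not DataFusion APIs.

/// Min/max statistics for one integer column of one Parquet file.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    file_name: String,
    min: i64,
    max: i64,
}

/// A tiny in-memory "external index": one statistics entry per file.
struct FileIndex {
    files: Vec<FileStats>,
}

impl FileIndex {
    /// Return the names of files that may contain rows where `column = value`.
    ///
    /// A file is skipped only when `value` lies outside its [min, max] range,
    /// so the result may include false positives but never false negatives.
    fn files_matching_eq(&self, value: i64) -> Vec<&str> {
        self.files
            .iter()
            .filter(|f| f.min <= value && value <= f.max)
            .map(|f| f.file_name.as_str())
            .collect()
    }
}

/// Build a small index with made-up file names and value ranges.
fn example_index() -> FileIndex {
    let entry = |name: &str, min: i64, max: i64| FileStats {
        file_name: name.to_string(),
        min,
        max,
    };
    FileIndex {
        files: vec![
            entry("file1.parquet", 0, 100),
            entry("file2.parquet", 100, 200),
            entry("file3.parquet", 300, 400),
        ],
    }
}
```

With this made-up index, a query for `value = 150` only needs to open `file2.parquet`, and a query for a value outside every range needs to open no files at all. Row group level and page level pruning apply the same range check at a finer granularity.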

NGA-TRAN commented 4 days ago

I have started reviewing this.

alamb commented 2 days ago

Thank you very much for the review @crepererum