Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data
Apache License 2.0
476 stars 59 forks source link

Update TableScan.select to select data columns, not manifest columns. #95

Closed rdblue closed 5 years ago

rdblue commented 5 years ago

@Parth-Brahmbhatt and @danielcweeks, FYI.

This PR updates TableScan so that the select method selects data columns instead of manifest columns. Selecting manifest columns is confusing and caused us to duplicate the list of manifest columns in all of the readers.

Using data columns instead doesn't simplify the readers much because they still need to do engine-specific projection tasks. For example, projection in Spark is done using a Spark struct type, not columns. But this does make the projection available for us to log.