This PR updates `TableScan` so that the `select` method selects data columns instead of manifest columns. Selecting manifest columns was confusing and forced us to duplicate the list of manifest columns in all of the readers.
Using data columns instead doesn't simplify the readers much because they still need to do engine-specific projection tasks. For example, projection in Spark is done using a Spark struct type, not columns. But this does make the projection available for us to log.
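As a rough illustration of the intent, here is a minimal, self-contained sketch. `ScanSketch` and its methods are hypothetical stand-ins written for this description, not the real `TableScan` implementation; the column names are made up. It only shows the idea that `select` takes data column names and that the scan then knows its projection and can log it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a table scan; NOT the real TableScan API.
final class ScanSketch {
    private final List<String> tableColumns; // data columns in the table schema
    private final List<String> selected;     // data columns chosen via select

    // A new scan projects every data column by default.
    ScanSketch(List<String> tableColumns) {
        this(tableColumns, tableColumns);
    }

    private ScanSketch(List<String> tableColumns, List<String> selected) {
        this.tableColumns = new ArrayList<>(tableColumns);
        this.selected = new ArrayList<>(selected);
    }

    // Returns a new scan that projects only the named data columns.
    ScanSketch select(String... columns) {
        List<String> projected = new ArrayList<>();
        for (String name : columns) {
            if (!tableColumns.contains(name)) {
                throw new IllegalArgumentException("Unknown data column: " + name);
            }
            projected.add(name);
        }
        return new ScanSketch(tableColumns, projected);
    }

    // The projection expressed as data column names.
    List<String> projection() {
        return selected;
    }

    // Because the scan holds the data-column projection, it can be logged.
    String describe() {
        return "scan projection=" + selected;
    }

    public static void main(String[] args) {
        ScanSketch scan = new ScanSketch(Arrays.asList("id", "data", "ts"))
            .select("id", "ts");
        System.out.println(scan.describe());
    }
}
```

An engine-specific reader would still translate `projection()` into its own representation (for Spark, a struct type); the sketch does not attempt that step.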
@Parth-Brahmbhatt and @danielcweeks, FYI.