Which issue does this PR close?
N/A
Rationale for this change
Because our current Parquet decoding logic reuses mutable buffers, we have to be careful to perform a deep copy before calling operators that could cache data. We do this by wrapping ScanExec in a CopyExec.
However, we use ScanExec both for reading from Parquet scans and for reading from exchanges (broadcast and shuffle). Only the Parquet case needs a deep copy.
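The difference between the two kinds of copy can be illustrated with a small standalone sketch. It does not use Comet or Arrow types: `Batch`, `SketchCopyMode`, `copy_batch`, and `observe_cached` are hypothetical names invented for this illustration, and the sketch assumes (as the rationale implies) that a Parquet-style producer overwrites its buffer in place for the next batch while an exchange-style producer hands out a fresh buffer each time.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a batch whose storage may be shared with its producer.
type Batch = Rc<RefCell<Vec<i32>>>;

/// Hypothetical copy modes for this sketch only (not Comet's actual enum).
#[derive(Clone, Copy)]
enum SketchCopyMode {
    /// Allocate new storage and copy every value.
    DeepCopy,
    /// Share the existing storage by bumping a reference count.
    SharedClone,
}

fn copy_batch(batch: &Batch, mode: SketchCopyMode) -> Batch {
    match mode {
        SketchCopyMode::DeepCopy => Rc::new(RefCell::new(batch.borrow().clone())),
        SketchCopyMode::SharedClone => Rc::clone(batch),
    }
}

/// Simulates an operator that caches the first batch while the producer moves
/// on to the next one. Returns (values seen in the cache, values of the next batch).
fn observe_cached(mode: SketchCopyMode, producer_reuses_buffer: bool) -> (Vec<i32>, Vec<i32>) {
    let mut current: Batch = Rc::new(RefCell::new(vec![1, 2, 3]));
    let cached = copy_batch(&current, mode);

    if producer_reuses_buffer {
        // Parquet-style assumption: the next batch overwrites the same buffer.
        for v in current.borrow_mut().iter_mut() {
            *v = 0;
        }
    } else {
        // Exchange-style assumption: the next batch lives in fresh storage.
        current = Rc::new(RefCell::new(vec![0, 0, 0]));
    }

    let seen = cached.borrow().clone();
    let next = current.borrow().clone();
    (seen, next)
}
```

Under these assumptions, `observe_cached(SketchCopyMode::SharedClone, true)` yields a corrupted cache of `[0, 0, 0]`, whereas `observe_cached(SketchCopyMode::SharedClone, false)` keeps `[1, 2, 3]`. A deep copy is safe in both cases but pays for an extra allocation and copy, which is the cost this PR avoids for exchanges.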
What changes are included in this PR?
Use CopyMode::UnpackOrClone instead of CopyMode::UnpackOrDeepCopy when wrapping a CometScan that is reading from an exchange.
How are these changes tested?
Existing tests.