When enabling AQE, we introduce intermediate materializations for a query with multiple shuffles.
The problem with this is that metadata is not preserved across materialization boundaries.
So if we are running a SortMergeJoin and we draw a boundary after the sort and before the join, the algorithm errors out because the boundaries value is not set on the MaterializedResult.
This happens because at the .collect() point, we place Micropartitions into the cache rather than the MaterializedResult which contains both the data and PartitionMetadata.
We already do this behavior for the ray runner, this PR formalizes it for all runners.
boundaries
value is not set on theMaterializedResult
..collect()
point, we placeMicropartitions
into the cache rather than theMaterializedResult
which contains both the data andPartitionMetadata
.We already do this behavior for the ray runner, this PR formalizes it for all runners.