Whether we pull from files in parallel is controlled by how we merge the batch streams in AsyncScanner::ScanBatchesUnorderedAsync. Currently we rely on MakeConcatenatedGenerator, which is incorrect: concatenation drains one file's stream before starting the next, so files are not read in parallel. We use it only as a workaround, because MakeMergedGenerator pulls from its source (an EnumeratingGenerator) in an async-reentrant fashion. MakeMergedGenerator should not do this. If some kind of readahead is truly necessary there, it should use MakeReadaheadGenerator.
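To make the difference between the two merge strategies concrete, here is a minimal synchronous sketch. It does not use Arrow's real generator API: `BatchStream`, `MakeToyStream`, `DrainConcatenated`, and `DrainMerged` are hypothetical names invented for illustration, and the round-robin loop only models the interleaving that merging up to `max_open` streams allows, not the actual async behavior of MakeMergedGenerator.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for one file's batch stream: returns the next batch, or
// std::nullopt once the file is exhausted. Hypothetical; not an Arrow type.
using BatchStream = std::function<std::optional<std::string>()>;

// Hypothetical helper: a stream that yields the given batches in order.
BatchStream MakeToyStream(std::vector<std::string> batches) {
  return [batches = std::move(batches),
          next = std::size_t{0}]() mutable -> std::optional<std::string> {
    if (next == batches.size()) return std::nullopt;
    return batches[next++];
  };
}

// Concatenation: fully drain each file before touching the next one,
// so only one file is ever being read at a time.
std::vector<std::string> DrainConcatenated(std::vector<BatchStream> files) {
  std::vector<std::string> out;
  for (auto& file : files) {
    while (auto batch = file()) out.push_back(*batch);
  }
  return out;
}

// Merging (modeled as round-robin): keep up to max_open files open and
// take one batch from each in turn, opening a new file when one finishes.
std::vector<std::string> DrainMerged(std::vector<BatchStream> files,
                                     std::size_t max_open) {
  std::vector<std::string> out;
  std::deque<BatchStream> open;
  std::size_t next_file = 0;
  while (true) {
    // Top up the set of open files.
    while (open.size() < max_open && next_file < files.size()) {
      open.push_back(std::move(files[next_file++]));
    }
    if (open.empty()) break;
    BatchStream file = std::move(open.front());
    open.pop_front();
    if (auto batch = file()) {
      out.push_back(*batch);
      open.push_back(std::move(file));  // Still has data; revisit later.
    }
    // An exhausted file is dropped, freeing a slot for the next file.
  }
  return out;
}
```

With three toy files holding `{a0, a1, a2}`, `{b0}`, and `{c0, c1}`, `DrainConcatenated` yields every batch of the first file before any batch of the second, while `DrainMerged` with `max_open = 2` interleaves batches from two files at once. In the real scanner the benefit comes from overlapping I/O across files, which this synchronous model does not capture.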
Reporter: Weston Pace / @westonpace
Assignee: Weston Pace / @westonpace
PRs and other links:
Note: This issue was originally created as ARROW-12386. Please see the migration documentation for further details.