apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++] Fix issues with enumerated generator's last=true approach #28648

Open asfimport opened 3 years ago

asfimport commented 3 years ago

The current approach has two problems. First, to mark an item as last, the generator must hold each item back until it has seen the next one, so it forces the stream to buffer one item, which breaks cache coherency. Second, it breaks the scanner as it exists today: an empty file would emit zero batches, and the resequencer would see that as a skip in the fragment index and fail.
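
To make both problems concrete, here is a minimal synchronous sketch of a last=true style enumerator. It is not Arrow's actual implementation: `Enumerated`, `Source`, `MakeEnumerated`, `FromVector`, and `Drain` are hypothetical names, and the pull-based `std::function` source only stands in for the real async generator. It shows the one-item lookahead and that an empty source emits nothing at all.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not Arrow's API.
template <typename T>
struct Enumerated {
  T value;
  int index;
  bool last;
};

// A pull-based source that returns std::nullopt once it is exhausted.
template <typename T>
using Source = std::function<std::optional<T>()>;

// Wraps a source so that each item carries its index and a `last` flag.
// Knowing `last` requires pulling the next item first, so one item is
// always held back in `held`.
template <typename T>
Source<Enumerated<T>> MakeEnumerated(Source<T> source) {
  struct State {
    Source<T> source;
    std::optional<T> held;  // the buffered lookahead item
    int index = 0;
    bool primed = false;
  };
  auto state = std::make_shared<State>();
  state->source = std::move(source);
  return [state]() -> std::optional<Enumerated<T>> {
    if (!state->primed) {
      state->held = state->source();
      state->primed = true;
    }
    // An empty source lands here on the first call: nothing is emitted,
    // so downstream never sees an item with last == true.
    if (!state->held) return std::nullopt;
    T current = std::move(*state->held);
    state->held = state->source();  // look ahead by one item
    const bool last = !state->held.has_value();
    return Enumerated<T>{std::move(current), state->index++, last};
  };
}

// Helper: a source that yields the elements of a vector in order.
template <typename T>
Source<T> FromVector(std::vector<T> items) {
  auto data = std::make_shared<std::vector<T>>(std::move(items));
  auto pos = std::make_shared<std::size_t>(0);
  return [data, pos]() -> std::optional<T> {
    if (*pos >= data->size()) return std::nullopt;
    return (*data)[(*pos)++];
  };
}

// Helper: pulls from a source until it is exhausted.
template <typename T>
std::vector<T> Drain(Source<T> source) {
  std::vector<T> out;
  while (auto item = source()) out.push_back(std::move(*item));
  return out;
}
```

Draining `MakeEnumerated<int>(FromVector<int>({10, 20, 30}))` yields three items with `last` set only on the third, while draining an enumerated empty source yields an empty vector. In the scanner, that second case corresponds to a fragment that never announces its end, so a consumer that orders output by fragment index has no way to tell an empty fragment from a missing one.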

Reporter: Weston Pace / @westonpace

Note: This issue was originally created as ARROW-12920. Please see the migration documentation for further details.

asfimport commented 3 years ago

Weston Pace / @westonpace: Related Zulip discussion: https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/IterationTraits.3A.3AIsEnd.28.29