apache / arrow-java

Official Java implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

[Java][Docs] Document the use of the batchSize argument in Dataset ScanOptions #303

Open asfimport opened 2 years ago

asfimport commented 2 years ago

Several ScanOptions constructors take a batchSize argument, as shown:

public ScanOptions(long batchSize) {
    this(batchSize, Optional.empty());
}

Since the scanner reads one ArrowRecordBatch per load invocation, setting the parameter to a size larger than the RecordBatch has no effect. It only takes effect when it is smaller than the number of rows in the RecordBatch (i.e., the number of records read is equal to min(batchSize, recordBatch rowCount)), which can lead to some confusion.
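To make the behavior concrete, here is a minimal sketch in plain Java that models only the min(batchSize, rowCount) rule described above. It does not call the Arrow Dataset API; the class BatchSizeSketch and the helper rowsPerLoad are hypothetical names introduced for illustration, and the row counts are made-up values.

```java
public class BatchSizeSketch {

    // Hypothetical helper (not part of Arrow): models the rule described in
    // this issue, where the number of rows returned by a single load is
    // min(batchSize, rowCount of the underlying RecordBatch).
    static long rowsPerLoad(long batchSize, long recordBatchRowCount) {
        return Math.min(batchSize, recordBatchRowCount);
    }

    public static void main(String[] args) {
        long recordBatchRowCount = 1_000; // assumed size of the source RecordBatch

        // batchSize smaller than the RecordBatch: the batch is capped at batchSize.
        System.out.println(rowsPerLoad(100, recordBatchRowCount));

        // batchSize larger than the RecordBatch: the argument has no effect.
        System.out.println(rowsPerLoad(5_000, recordBatchRowCount));
    }
}
```

Under this model, a caller who passes a large batchSize expecting fewer, larger batches would still receive batches no bigger than the underlying RecordBatch, which is the confusion the documentation should address.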

Reporter: Larry White / @lwhite1

Note: This issue was originally created as ARROW-17346. Please see the migration documentation for further details.

asfimport commented 2 years ago

Apache Arrow JIRA Bot: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.