vinayvenk opened 1 month ago
Hi @vinayvenk, what Arrow Java version are you on? Can you share the code you ran, ideally with code that can generate the data causing this?
GM @amoeba, it is pretty much the standard code from the example. I tried Arrow Java versions 16.0.1, 16.0.0, and 15.0.1.
String uri = parquetFile.toURI().toString();
ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
try (BufferAllocator allocator = new RootAllocator();
     DatasetFactory datasetFactory = new FileSystemDatasetFactory(allocator, NativeMemoryPool.getDefault(),
         FileFormat.PARQUET, uri);
     Dataset dataset = datasetFactory.finish();
     Scanner scanner = dataset.newScan(options);
     ArrowReader reader = scanner.scanBatches()) {
    int batchCount = 0;
    while (reader.loadNextBatch()) {
        try (VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
            // function to create csv data
            createCSV();
        }
    }
}
It fails when it tries to load the second batch.
Describe the bug, including details regarding any error messages, version, and platform.
I get this exception when trying to load the next batch while reading a Parquet file. Parsing works if the batch size is big enough to process the whole file in one shot, but with a smaller batch size the code breaks with the exception below:
java.lang.IllegalArgumentException: should have as many children as in the schema: found 0 expected 8
    at org.apache.arrow.util.Preconditions.checkArgument(Preconditions.java:282)
    at org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:127)
    at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:84)
    at org.apache.arrow.c.Data.importIntoVectorSchemaRoot(Data.java:334)
    at org.apache.arrow.dataset.jni.NativeScanner$NativeReader.loadNextBatch(NativeScanner.java:151)
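A likely explanation, based on the `ArrowReader` contract: `getVectorSchemaRoot()` returns a single root that the reader owns and reuses for every batch, so closing it inside the loop (as the repro's per-batch `try` block does) releases its child vectors, and the next `loadNextBatch()` then finds a root with 0 children instead of 8. A minimal sketch of the reading pattern, assuming the same `scanner` as above and treating `createCSV` as a hypothetical helper from the report:

```java
// Sketch, not a verified fix: obtain the root once, outside the loop,
// and let the reader's own close() release it.
try (ArrowReader reader = scanner.scanBatches()) {
    // Reader-owned root, reused for every batch — do not close it here.
    VectorSchemaRoot root = reader.getVectorSchemaRoot();
    while (reader.loadNextBatch()) {
        // root now holds the current batch's data.
        createCSV(root); // hypothetical helper from the report
    }
} // closing the reader also closes the root
```

This matches the exception: a root whose vectors were released reports zero children, which is exactly what `VectorLoader.loadBuffers` is checking against the 8-field schema.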
Component(s)
Java