context.sql("select * from test_table").thenComposeAsync(DataFrame::show).join();
As a result, I got the following exception:
Exception in thread "main" java.util.concurrent.CompletionException: java.lang.RuntimeException: Arrow error: Parser error: Error while parsing value age for column 1 at line 0
    at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
    at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1152)
    at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
Caused by: java.lang.RuntimeException: Arrow error: Parser error: Error while parsing value age for column 1 at line 0
    at org.apache.arrow.datafusion.DefaultDataFrame$RuntimeExceptionCallback.accept(DefaultDataFrame.java:127)
    at org.apache.arrow.datafusion.DefaultDataFrame$RuntimeExceptionCallback.accept(DefaultDataFrame.java:117)
    at org.apache.arrow.datafusion.DataFrames.showDataframe(Native Method)
    at org.apache.arrow.datafusion.DefaultDataFrame.show(DefaultDataFrame.java:70)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
    ... 6 more
I also implemented my own "show()" method:
    private static void show(ArrowReader reader) {
        try {
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            System.out.println(root.getSchema().getFields());
            while (reader.loadNextBatch()) {
                int n = root.getFieldVectors().size();
                System.out.println(root.getFieldVectors().stream()
                        .map(v -> v.getField().getName() + ":" + v.getField().getFieldType().getType())
                        .collect(Collectors.joining("|")));
                int rows = root.getRowCount();
                for (int r = 0; r < rows; r++) {
                    for (int i = 0; i < n; i++) {
                        FieldVector nameVector = root.getVector(i);
                        System.out.print(nameVector.getObject(r) + " | ");
                    }
                    System.out.println();
                }
            }
            reader.close();
        } catch (IOException e) {
            logger.warn("got IO Exception", e);
        }
    }
and used it as follows:
    context
        .sql("select * from test_table")
        .thenComposeAsync(df -> df.collect(allocator))
        .thenAccept(ExampleMain::show)
        .join();
In this case the error message looks like this:
thread '<unnamed>' panicked at src/dataframe.rs:29:14:
failed to collect dataframe: ArrowError(ParseError("Error while parsing value age for column 1 at line 0"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Both examples work if the CSV file does not have a header or if the `age` column is defined as `VARCHAR`. In the latter case the code runs, but it reads the header as the first row of data. Using `format.has_header` instead of `has_header` does not help.
Note that the same scenario works correctly for me with `datafusion-cli`. It looks like `OPTIONS ('has_header' 'true')` is simply ignored when running with datafusion-java. This is strange because, as far as I can see, datafusion-java is just a thin JNI wrapper over the native DataFusion API.
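To illustrate why an ignored header option would produce exactly this error text, here is a minimal plain-Java sketch. It is not DataFusion's or Arrow's CSV reader: the class `HeaderDemo`, the method `parseAges`, and the sample rows are all hypothetical. It only shows that treating the header row as data makes an integer column fail on the value `age` at line 0.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT DataFusion's or Arrow's CSV reader.
class HeaderDemo {

    // Parses column 1 (a hypothetical integer "age" column) of each CSV line.
    // When hasHeader is true the first line is skipped; otherwise it is treated as data.
    static List<Integer> parseAges(List<String> lines, boolean hasHeader) {
        List<Integer> ages = new ArrayList<>();
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            String value = lines.get(i).split(",")[1].trim();
            try {
                ages.add(Integer.parseInt(value));
            } catch (NumberFormatException e) {
                throw new IllegalStateException(
                        "Error while parsing value " + value + " for column 1 at line " + i, e);
            }
        }
        return ages;
    }

    public static void main(String[] args) {
        // Hypothetical file contents; the real test_table data is not shown in this report.
        List<String> csv = List.of("name,age", "alice,30", "bob,25");

        // Header honoured: only the data rows are parsed.
        System.out.println(parseAges(csv, true)); // prints [30, 25]

        // Header ignored: the literal string "age" is parsed as an integer and fails.
        try {
            parseAges(csv, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This also matches the other observation: declaring the column as a string type would let the header row through as an ordinary first row instead of failing.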
I am running on Ubuntu and using Java 21 (if it matters).