-
## Usecase
Many analytic systems store their data with some particular sort order, and the query engine can often take advantage of this sort order to both reduce memory usage and improve performance.
Spec…
alamb updated
1 month ago
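
To make the sort-order use case above concrete, here is a minimal sketch in plain Rust (standard library only, not DataFusion code). It assumes the input rows are already sorted by key; the function name `sum_sorted_groups` and the `(key, value)` row shape are hypothetical and chosen only for illustration. Because equal keys are adjacent, the aggregation only needs to track one running group at a time instead of a hash table over every key.

```rust
/// Sum values per key, assuming `rows` arrives already sorted by key.
/// Only the current group is kept as working state; a finished group is
/// pushed to the output as soon as the key changes.
fn sum_sorted_groups<K, I>(rows: I) -> Vec<(K, i64)>
where
    K: PartialEq,
    I: IntoIterator<Item = (K, i64)>,
{
    let mut out = Vec::new();
    // The single group currently being accumulated, if any.
    let mut current: Option<(K, i64)> = None;

    for (key, value) in rows {
        let same_group = matches!(&current, Some((k, _)) if *k == key);
        if same_group {
            // Same key as the previous row: extend the running total.
            if let Some((_, total)) = current.as_mut() {
                *total += value;
            }
        } else if let Some(done) = current.replace((key, value)) {
            // Key changed: the previous group is complete.
            out.push(done);
        }
    }

    // Flush the last group, if the input was non-empty.
    if let Some(done) = current {
        out.push(done);
    }
    out
}
```

The results are collected into a `Vec` here only so they are easy to inspect; a streaming engine could hand each finished group downstream instead. If the input is not actually sorted, the same key will simply appear in more than one output group, which is why the sort order has to be known to the planner before this strategy is chosen.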
-
## Goal-State/What/Result
Iceberg data stored locally and in S3 buckets can be queried through federation using Spice, with performance comparable to other tools (Dremio), and the data can be accelerated locally.
…
-
### What is the problem the feature request solves?
I added some debug logging to `CometNativeIterator` to show the size of batches being processed when running TPC-H q14 and I see lots of small batc…
-
### Is your feature request related to a problem or challenge?
We need the ability to get the `TaskContext.task_id` any place where a Custom Data Source is invoked. As it stands currently, the `state…
-
**Describe the bug**
There are currently 31 open PRs from dependabot:
https://github.com/apache/datafusion-python/pulls/app%2Fdependabot
We should either merge them, close them or configure dependab…
-
### Describe the bug
While investigating #10709, I tried using the datafusion CLI to rewrite parquet files to a better size.
But I got a panic:
```
thread 'tokio-runtime-worker' panicked at /Users…
-
### Is your feature request related to a problem or challenge?
Good day. Are there any plans to support real-time streaming of Arrow record batches? The use case I imagine would be that we could setu…
-
Great idea @ClSlaid
Thanks to @AbrarNitk we have a first version of `LogicalPlanBuilder::from(arc_input)` in https://github.com/apache/datafusion/pull/10466 🙏
Now that we have me…
alamb updated
1 month ago
-
ORC spec: https://orc.apache.org/docs/types.html
> Hive always uses a struct with a field for each of the top-level columns as the root object type, but that is not required
See https://github.c…
-
Remove:
```rs
let batches = df.collect().await?;
```
from `datafusion-optd-cli/src/exec.rs`
because it will internally run `datafusion`'s logical optimizer. We should try to call the …