Open comphead opened 1 week ago
This is a great idea. We have some work in datafusion-python
that we might be able to reuse.
I did a quick proof of concept. Does this match what you're looking for?
#[tokio::main]
async fn main() -> Result<()> {
    let batch = create_batch!(
        ("a", Int32, vec![1, 2, 3]),
        ("b", Float64, vec![Some(4.0), None, Some(5.0)]),
        ("c", Utf8, vec!["alpha", "beta", "gamma"])
    )?;
    let ctx = SessionContext::new();
    ctx.read_batch(batch)?.show().await
}
Output is
+---+-----+-------+
| a | b   | c     |
+---+-----+-------+
| 1 | 4.0 | alpha |
| 2 |     | beta  |
| 3 | 5.0 | gamma |
+---+-----+-------+
I feel we need a way to create a DataFrame from rows, in addition to creating one from data files.
Currently, DataFrames are created from logical plans or by reading files. An API to create a DataFrame from collections would make it easier to experiment with test data and to add examples and documentation.
Example can be
Underneath, the method can call ctx.read_batch(record_batch). The batch can be created with RecordBatch::try_from_iter or try_new.
A very good starting point is dataframe_in_memory.rs, which shows how much code is needed just to create a DataFrame on top of a schema and data. The idea is to make a more concise API.
Originally posted by @comphead in https://github.com/apache/datafusion/issues/12564#issuecomment-2365265416