I am using the duckdb-rs crate to query an embedded duckdb and return the results as arrow record batches. I would like string types to be returned as arrow StringView types for more efficient memory usage, but it seems the produce_arrow_string_view setting has no effect on the returned data type.
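For context on the memory-usage point: as I understand the Arrow columnar format, a StringView (Utf8View) column stores one fixed 16-byte view per value. Strings of up to 12 bytes are held inline in the view, and longer strings keep a 4-byte prefix plus a buffer index and offset into a separate data buffer. The following is only a minimal sketch of that layout in plain Rust, independent of duckdb-rs and arrow-rs. The names `encode_view` and `is_inline` are made up for illustration and are not part of any library.

```rust
/// Strings of at most 12 bytes are stored inline in the 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Encode a string as a 16-byte view (little-endian), following the
/// variable-size binary view layout of the Arrow columnar format.
/// `buffer_index` and `offset` are only used for strings longer than
/// 12 bytes; they say where the full string lives in a data buffer.
/// (Hypothetical helper for illustration, not a library function.)
fn encode_view(s: &str, buffer_index: u32, offset: u32) -> [u8; 16] {
    let bytes = s.as_bytes();
    let mut view = [0u8; 16];

    // Bytes 0..4: length of the string.
    view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());

    if bytes.len() <= MAX_INLINE_LEN {
        // Short string: the data itself, zero-padded to 12 bytes.
        view[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        // Long string: 4-byte prefix, then buffer index, then offset.
        view[4..8].copy_from_slice(&bytes[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Returns true if the view holds its string data inline.
fn is_inline(view: &[u8; 16]) -> bool {
    let len = u32::from_le_bytes([view[0], view[1], view[2], view[3]]) as usize;
    len <= MAX_INLINE_LEN
}

fn main() {
    // A short example value fits entirely inside the view.
    let short = encode_view("40000.0", 0, 0);
    // A longer value only keeps its prefix and a pointer into a buffer.
    let long = encode_view("a string longer than twelve bytes", 0, 0);

    println!("short stored inline: {}", is_inline(&short));
    println!("long stored inline: {}", is_inline(&long));
}
```

Short values such as the ones produced by the query in the reproduction below would need no separate data buffer at all under this layout, which is why I would prefer StringView here.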
To Reproduce
The issue can be reproduced with the following Rust program. I did not see any type conversions in the Rust wrapper itself, so I assume the issue is in the DuckDB core.
[package]
name = "duckdb-arrow-test"
version = "0.1.0"
edition = "2021"

[dependencies]
duckdb = { version = "1.0.0", features = ["bundled"] }

use duckdb::Connection;

fn main() {
    let conn = Connection::open_in_memory().unwrap();

    let setup_script = r"
        SET arrow_output_list_view = true;
        SET produce_arrow_string_view = true;
    ";
    conn.execute_batch(setup_script).unwrap();

    let mut query = conn
        .prepare("SELECT (i*10^i)::varchar AS str FROM range(5) tbl(i)")
        .unwrap();

    let arrow = query.query_arrow([]).unwrap();

    for batch in arrow {
        dbg!(batch.schema().field(0).data_type());
        dbg!(batch.column(0));
    }
}
Output:
The arrow_large_buffer_size setting correctly changes the data type to LargeUtf8 instead of Utf8.
OS:
x86_64 linux ubuntu
DuckDB Version:
1.1.1
DuckDB Client:
rust (duckdb-rs)
Hardware:
No response
Full Name:
Jörn Horstmann
Affiliation:
SAP SE
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?