Open charlie430 opened 2 years ago
It's not implemented at the moment but feel free to send a PR for the API: Arrow Interface
Would it be ok to use the Apache.Arrow package as a dependency?
It has data types for some objects, like CArrowSchema for duckdb_arrow_schema
Would that be added to the Binding project or Data project?
Probably Binding project, but the package might be too heavy for it.
I guess for now I can create a DuckDBArrowSchema with private fields that have a compatible struct layout, so consumers can just cast the pointer to CArrowSchema*.
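A minimal sketch of what such a layout-compatible struct could look like, assuming it mirrors the `ArrowSchema` struct from the Arrow C data interface field for field. The name DuckDBArrowSchema comes from the comment above; the field names and the use of IntPtr for every pointer are illustrative assumptions, not the library's actual code:

```csharp
using System;
using System.Runtime.InteropServices;

// On a 64-bit process every field below is 8 bytes wide, so the struct
// should occupy 9 * 8 bytes with no padding.
Console.WriteLine(Marshal.SizeOf<DuckDBArrowSchema>());

#pragma warning disable CS0169 // fields exist only to reserve the layout

// Hypothetical mirror of the C `struct ArrowSchema`; fields are private so
// the binding does not have to expose or depend on Arrow types.
[StructLayout(LayoutKind.Sequential)]
public struct DuckDBArrowSchema
{
    private IntPtr format;      // const char*
    private IntPtr name;        // const char*
    private IntPtr metadata;    // const char*
    private long flags;         // int64_t
    private long nChildren;     // int64_t
    private IntPtr children;    // struct ArrowSchema**
    private IntPtr dictionary;  // struct ArrowSchema*
    private IntPtr release;     // void (*release)(struct ArrowSchema*)
    private IntPtr privateData; // void*
}

#pragma warning restore CS0169
```

With this approach the Binding project stays free of the Apache.Arrow dependency, and a consumer that does reference Apache.Arrow can reinterpret a `DuckDBArrowSchema*` as a `CArrowSchema*` in an unsafe context, provided both structs keep the same sequential layout.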
Honestly, I haven't looked much into Arrow and can't tell now for sure. Feel free to join DuckDB Discord, we can discuss it in more detail in the dotnet channel.
An alternative for getting data as Arrow could be to use the C# ADBC implementation with the generic driver importer. This code is not very mature yet, but you can run queries and get the result back as Arrow. Example:
using AdbcDriver duckdb = CAdbcDriverImporter.Load("D:\\testdata\\duckdb.dll", "duckdb_adbc_init");
using AdbcDatabase db = duckdb.Open(new Dictionary<string, string> { { "path", "d:/testdata/ddbt.db"} });
using AdbcConnection cn = db.Connect(null);
using AdbcStatement stmt = cn.CreateStatement();
stmt.SqlQuery = "CREATE TABLE integers(foo INTEGER, bar INTEGER);";
stmt.ExecuteUpdate();
stmt.SqlQuery = "INSERT INTO integers VALUES (3, 4), (5, 6), (7, 8);";
stmt.ExecuteUpdate();
stmt.SqlQuery = "SELECT * from integers";
var results = stmt.ExecuteQuery();
// results.Stream is an IArrowArrayStream, which lets you get the schema
// and a set of record batches
NOTE that this code is not super mature and we haven't yet reached a 1.0 release.
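As a rough sketch of consuming the stream mentioned in the comments above, assuming the `IArrowArrayStream` interface from the Apache.Arrow package (its `Schema` property and `ReadNextRecordBatchAsync` method). The helper class and method names are hypothetical:

```csharp
using System;
using System.Threading.Tasks;
using Apache.Arrow;
using Apache.Arrow.Ipc;

// Hypothetical helper, not part of ADBC or DuckDB.NET.
public static class ArrowStreamDump
{
    // Prints the schema, then walks the record batches until the stream
    // is exhausted. Returns the total number of rows seen.
    public static async Task<long> PrintAsync(IArrowArrayStream stream)
    {
        foreach (Field field in stream.Schema.FieldsList)
        {
            Console.WriteLine($"{field.Name}: {field.DataType.Name}");
        }

        long totalRows = 0;
        while (true)
        {
            // A null batch signals the end of the stream.
            using RecordBatch batch = await stream.ReadNextRecordBatchAsync();
            if (batch == null)
            {
                break;
            }
            totalRows += batch.Length;
        }
        return totalRows;
    }
}
```

With the query above, this would be called as `await ArrowStreamDump.PrintAsync(results.Stream);`.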
Nice! I think it would be great if there was a way to go from DuckDBConnection (provided by this library) to AdbcConnection. I can expose the underlying pointer to the database (obtained by duckdb_open and duckdb_connect), but it looks like there is no way to convert such a pointer to an AdbcConnection object.
I know next to nothing about DuckDB internals, so I have no idea how plausible something like this is. For ADBC, we need an array of function pointers that defines the ADBC driver API -- this is what duckdb_adbc_init is initializing -- and the connection is then roughly an indirected opaque pointer that gets passed to some of these function pointers.
I'm very interested in Apache Arrow being supported for the in-memory scenario.
Is there any information you can provide on when that might be supported?
Thanks!