apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Support Connection through Arrow Flight RPC / ADBC #913

Open v-kessler opened 1 week ago

v-kessler commented 1 week ago

What is the problem the feature request solves?

Rationale

The Arrow ecosystem lacks standard database interfaces built around Arrow data, especially for efficiently fetching large datasets (i.e. with minimal or no serialization and copying). Without a common API, the end result is a mix of custom protocols (e.g. BigQuery, Snowflake) and adapters (e.g. Turbodbc) scattered across languages. Consumers must laboriously wrap individual systems (as DBI is contemplating and Trino does with connectors).

ADBC aims to provide a minimal database client API standard, based on Arrow, for C, Go, and Java (with bindings for other languages). Applications code to this API standard (in much the same way as they would with JDBC or ODBC), but fetch result sets in Arrow format (e.g. via the C Data Interface). They then link to an implementation of the standard: either directly to a vendor-supplied driver for a particular database, or to a driver manager that abstracts across multiple drivers. Drivers implement the standard using a database-specific API, such as Flight SQL.
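To make the client-side shape concrete, here is a minimal sketch in Python. It assumes the ADBC Python DB-API bindings (`adbc_driver_flightsql.dbapi`, whose cursors expose `fetch_arrow_table()`). The names `fetch_arrow` and `query_flight_sql_endpoint`, the URI, and the idea that Comet would listen on such an endpoint are all hypothetical and only illustrate how an application codes against the common API rather than a specific backend.

```python
def fetch_arrow(conn, query):
    """Run `query` on an ADBC DB-API connection and return the result as Arrow.

    Only the common API surface is used (`cursor()`, `execute()`, and the
    ADBC extension `fetch_arrow_table()`), so the same helper should work
    with any ADBC driver.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # The result stays in Arrow format instead of becoming row tuples.
        return cursor.fetch_arrow_table()
    finally:
        cursor.close()


def query_flight_sql_endpoint(uri="grpc://localhost:32010"):
    """Hypothetical usage: the URI is a placeholder, not a real Comet endpoint."""
    # Imported lazily so `fetch_arrow` can be used without this driver installed.
    import adbc_driver_flightsql.dbapi as flightsql

    conn = flightsql.connect(uri)
    try:
        return fetch_arrow(conn, "SELECT 1 AS x")
    finally:
        conn.close()
```

Swapping the backend would then mean changing only the driver import and the URI; `fetch_arrow` itself stays untouched.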

Goals

Describe the potential solution

The implementation could be done in three steps:

  1. Arrow Flight RPC https://arrow.apache.org/docs/format/Flight.html
  2. Arrow Flight SQL https://arrow.apache.org/docs/format/FlightSql.html
  3. ADBC https://arrow.apache.org/docs/format/ADBC.html
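As a rough illustration of what step 1 involves, the sketch below models the basic Flight request flow in plain Python: the client asks `GetFlightInfo` where a result lives, receives one ticket per endpoint, and then calls `DoGet` for each ticket to stream the data. This is a toy in-memory model, not the `pyarrow.flight` API. `ToyFlightService`, `run_query`, and the use of dicts in place of Arrow record batches are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Opaque token a client presents to DoGet to fetch one result stream."""
    token: bytes


@dataclass
class FlightInfo:
    """Answer to GetFlightInfo: one ticket per endpoint holding part of the result."""
    tickets: list


class ToyFlightService:
    """In-memory stand-in for a Flight server (hypothetical, no networking)."""

    def __init__(self, partitions):
        # Maps a command string to a list of partitions; each partition is a
        # list of "batches" (plain dicts standing in for Arrow record batches).
        self._partitions = partitions
        self._streams = {}

    def get_flight_info(self, command):
        """Register each partition of the result and hand back its ticket."""
        tickets = []
        for i, batches in enumerate(self._partitions[command]):
            ticket = Ticket(f"{command}#{i}".encode())
            self._streams[ticket] = batches
            tickets.append(ticket)
        return FlightInfo(tickets)

    def do_get(self, ticket):
        """Stream the batches behind one ticket."""
        yield from self._streams[ticket]


def run_query(service, command):
    """Client side: GetFlightInfo first, then DoGet for every ticket."""
    info = service.get_flight_info(command)
    batches = []
    for ticket in info.tickets:
        batches.extend(service.do_get(ticket))
    return batches
```

Flight SQL (step 2) reuses the same pair of calls and standardizes how a SQL statement is encoded in the request, which is what lets an ADBC driver (step 3) sit on top of it.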

Additional context

No response

v-kessler commented 1 week ago

@vaibhawvipul as discussed here is the issue

vaibhawvipul commented 1 week ago

@vaibhawvipul as discussed here is the issue

Thank you.