apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

[Format] Improve partitioned data interface #68

Closed (lidavidm closed this issue 2 years ago)

lidavidm commented 2 years ago

We should improve the documentation and justification for this interface, better describe what happens when it is not supported, and make sure it lines up with what potential users of the interface expect.

In particular, it should line up with Spark's DataSourceV2. Looking at ReadSupport, the main thing is that we need to return the schema and partitions at the same time. So we might want to return something like this:

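struct AdbcPartitions {
  /* Schema shared by the result sets of all partitions. */
  struct ArrowSchema result_schema;
  /* Number of entries in `partitions`. */
  size_t num_partitions;
  /* Opaque, serialized partition descriptors, one per partition. */
  uint8_t** partitions;
  /* Driver-owned state; not for use by the caller. */
  void* private_data;
};
/* Frees the resources owned by an AdbcPartitions. */
AdbcStatusCode AdbcPartitionsRelease(struct AdbcPartitions*, struct AdbcError*);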
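
To make the intended flow concrete, here is a rough sketch of how a producer and a consumer might use such a struct. Everything outside the struct layout above is an assumption made for illustration: the `ArrowSchema`, `AdbcError`, and status-code definitions are simplified stand-ins rather than the real headers, `MockGetPartitions` and `DescribePartitions` are hypothetical helpers, and descriptors are treated as NUL-terminated strings only because the proposed struct carries no per-descriptor length.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for this sketch only (not the real headers). */
typedef uint8_t AdbcStatusCode;
#define ADBC_STATUS_OK 0
#define ADBC_STATUS_UNKNOWN 1
struct AdbcError { char* message; };
struct ArrowSchema { const char* format; }; /* placeholder, not the Arrow C struct */

struct AdbcPartitions {
  struct ArrowSchema result_schema;
  size_t num_partitions;
  uint8_t** partitions;
  void* private_data;
};

/* Frees every descriptor and the descriptor array, then resets the struct. */
AdbcStatusCode AdbcPartitionsRelease(struct AdbcPartitions* p,
                                     struct AdbcError* error) {
  (void)error;
  if (p->partitions != NULL) {
    for (size_t i = 0; i < p->num_partitions; i++) free(p->partitions[i]);
    free(p->partitions);
  }
  p->partitions = NULL;
  p->num_partitions = 0;
  return ADBC_STATUS_OK;
}

/* Hypothetical producer: returns one shared schema plus `n` fake descriptors
 * ("part-0", "part-1", ...) in a single call. */
AdbcStatusCode MockGetPartitions(size_t n, struct AdbcPartitions* out,
                                 struct AdbcError* error) {
  out->result_schema.format = "+s";
  out->num_partitions = 0;
  out->private_data = NULL;
  out->partitions = (uint8_t**)calloc(n ? n : 1, sizeof(uint8_t*));
  if (out->partitions == NULL) return ADBC_STATUS_UNKNOWN;
  for (size_t i = 0; i < n; i++) {
    char buf[32];
    snprintf(buf, sizeof(buf), "part-%zu", i);
    size_t len = strlen(buf) + 1; /* include the terminating NUL */
    out->partitions[i] = (uint8_t*)malloc(len);
    if (out->partitions[i] == NULL) {
      /* Release whatever was allocated so far before reporting failure. */
      AdbcPartitionsRelease(out, error);
      return ADBC_STATUS_UNKNOWN;
    }
    memcpy(out->partitions[i], buf, len);
    out->num_partitions = i + 1;
  }
  return ADBC_STATUS_OK;
}

/* Hypothetical consumer: the schema is available before any partition is read. */
void DescribePartitions(const struct AdbcPartitions* p) {
  printf("schema format: %s\n", p->result_schema.format);
  for (size_t i = 0; i < p->num_partitions; i++) {
    printf("partition %zu: %s\n", i, (const char*)p->partitions[i]);
  }
}
```

A caller would invoke `MockGetPartitions`, hand each descriptor to a worker, and call `AdbcPartitionsRelease` once the descriptors have been copied or consumed. A real interface would probably also need the length of each descriptor, since serialized descriptors are arbitrary bytes and may contain zeros.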

Also, should deserializing a partition descriptor give you a statement, or just directly give you a result reader?

Also see #61, which proposes refactoring the Execute API.

lidavidm commented 2 years ago

Most of the work was done in #61, so I'll use this issue to fix up the Python side.