apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

go/adbc/driver/bigquery: list of potential enhancements, missing features and missing docs #1974

Open cocoa-xu opened 5 months ago

cocoa-xu commented 5 months ago

What feature or improvement would you like to see?

This todo list collects potential enhancements, missing features, and missing docs that I noted while working on the preliminary implementation of the BigQuery Go driver https://github.com/apache/arrow-adbc/pull/1722. Some of these may be implemented together in a single PR, while others may need a bit more work.

Also, some of these features could be hard or impossible to implement with the current BigQuery API, or because of other limitations in BigQuery.

bigquery.QueryConfig

When running a query, a *bigquery.Query is needed; it holds the query text and the query options/configuration. Some of these configurable items (in bigquery.QueryConfig) are not implemented yet:

Parameters is implemented, but I forgot to remove it from the mini todo list in statement.go.
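As a rough illustration of how string-valued statement options could be mapped onto query configuration, here is a minimal sketch. It uses only the standard library: `queryConfig` is a local stand-in that mirrors a few field names from bigquery.QueryConfig, and the option keys and the `applyOption` helper are hypothetical, not the driver's actual names.

```go
package main

import (
	"fmt"
	"strconv"
)

// queryConfig is a local stand-in mirroring a few bigquery.QueryConfig
// field names; it is NOT the real type from cloud.google.com/go/bigquery.
type queryConfig struct {
	DisableQueryCache bool
	MaxBytesBilled    int64
	UseLegacySQL      bool
}

// Hypothetical option keys, invented for this sketch only.
const (
	optDisableQueryCache = "example.bigquery.query.disable_query_cache"
	optMaxBytesBilled    = "example.bigquery.query.max_bytes_billed"
	optUseLegacySQL      = "example.bigquery.query.use_legacy_sql"
)

// applyOption parses a string option value and stores it in cfg.
// Unknown keys and unparsable values are reported as errors.
func applyOption(cfg *queryConfig, key, value string) error {
	switch key {
	case optDisableQueryCache:
		v, err := strconv.ParseBool(value)
		if err != nil {
			return fmt.Errorf("option %q: %w", key, err)
		}
		cfg.DisableQueryCache = v
	case optMaxBytesBilled:
		v, err := strconv.ParseInt(value, 10, 64)
		if err != nil {
			return fmt.Errorf("option %q: %w", key, err)
		}
		cfg.MaxBytesBilled = v
	case optUseLegacySQL:
		v, err := strconv.ParseBool(value)
		if err != nil {
			return fmt.Errorf("option %q: %w", key, err)
		}
		cfg.UseLegacySQL = v
	default:
		return fmt.Errorf("unknown option %q", key)
	}
	return nil
}

func main() {
	var cfg queryConfig
	if err := applyOption(&cfg, optMaxBytesBilled, "1000000"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

In a real driver the parsed values would be copied onto the *bigquery.Query before running it; that step is left out here to keep the sketch free of external dependencies.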

ADBC callbacks/features

Missing Types

From BigQuery to Arrow

From Arrow to BigQuery

Docs

lidavidm commented 5 months ago

For DURATION, I think in SQLite we store an ISO duration string. Same with INTERVAL. That avoids weird mappings to/from DATETIME.
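A minimal sketch of the ISO duration string idea, assuming the Arrow DURATION value has already been converted to a Go time.Duration (nanoseconds). The `formatISODuration` helper is hypothetical, and the exact string format used by the SQLite driver may differ; the leading minus sign for negative values is an extension rather than part of basic ISO 8601.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// formatISODuration renders d as an ISO 8601 duration using only time
// components (hours, minutes, seconds), for example "PT1H30M5.5S".
// Negative durations get a leading '-', which is a common extension.
func formatISODuration(d time.Duration) string {
	if d == 0 {
		return "PT0S"
	}
	var b strings.Builder
	// Work on the unsigned magnitude so the minimum int64 value
	// does not overflow when negated.
	n := uint64(d)
	if d < 0 {
		b.WriteByte('-')
		n = -n
	}
	const (
		nsPerSecond = uint64(time.Second)
		nsPerMinute = uint64(time.Minute)
		nsPerHour   = uint64(time.Hour)
	)
	hours := n / nsPerHour
	n %= nsPerHour
	minutes := n / nsPerMinute
	n %= nsPerMinute
	seconds := n / nsPerSecond
	frac := n % nsPerSecond

	b.WriteString("PT")
	if hours > 0 {
		b.WriteString(strconv.FormatUint(hours, 10))
		b.WriteByte('H')
	}
	if minutes > 0 {
		b.WriteString(strconv.FormatUint(minutes, 10))
		b.WriteByte('M')
	}
	if seconds > 0 || frac > 0 {
		b.WriteString(strconv.FormatUint(seconds, 10))
		if frac > 0 {
			// Nine zero-padded fractional digits, trailing zeros trimmed.
			b.WriteByte('.')
			b.WriteString(strings.TrimRight(fmt.Sprintf("%09d", frac), "0"))
		}
		b.WriteByte('S')
	}
	return b.String()
}

func main() {
	fmt.Println(formatISODuration(90*time.Minute + 5500*time.Millisecond)) // PT1H30M5.5S
}
```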

RUN_END_ENCODED would probably get expanded to the underlying type. Same with DICTIONARY. They're both more like "layouts" than "types", but Arrow doesn't really differentiate on this axis (unlike, say, Parquet).
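To make "expanded to the underlying type" concrete, here is a small sketch over plain Go slices rather than Arrow arrays. The `expandRunEnds` and `expandDictionary` helpers are hypothetical, and the sketch ignores nulls and array offsets, which a real implementation would have to handle.

```go
package main

import "fmt"

// expandRunEnds materializes a run-end encoded sequence. runEnds[i] is the
// exclusive logical end index of run i, and values[i] is that run's value.
func expandRunEnds[T any](runEnds []int32, values []T) ([]T, error) {
	if len(runEnds) != len(values) {
		return nil, fmt.Errorf("got %d run ends but %d values", len(runEnds), len(values))
	}
	var out []T
	prev := int32(0)
	for i, end := range runEnds {
		// Run ends must be positive and strictly increasing.
		if end <= prev {
			return nil, fmt.Errorf("run end %d at position %d is not greater than %d", end, i, prev)
		}
		for j := prev; j < end; j++ {
			out = append(out, values[i])
		}
		prev = end
	}
	return out, nil
}

// expandDictionary replaces each index with the dictionary entry it refers to.
func expandDictionary[T any](indices []int32, dict []T) ([]T, error) {
	out := make([]T, len(indices))
	for i, idx := range indices {
		if idx < 0 || int(idx) >= len(dict) {
			return nil, fmt.Errorf("index %d at position %d is out of range", idx, i)
		}
		out[i] = dict[idx]
	}
	return out, nil
}

func main() {
	ree, err := expandRunEnds([]int32{2, 5, 6}, []string{"a", "b", "c"})
	fmt.Println(ree, err)

	dict, err := expandDictionary([]int32{1, 0, 1}, []string{"x", "y"})
	fmt.Println(dict, err)
}
```

After expansion, both inputs are ordinary sequences of the value type, so they can go through the same Arrow-to-BigQuery path as a plain column.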

cocoa-xu commented 5 months ago

Thanks for the suggestions! ISO duration string sounds great for avoiding weird mappings! I'll test and see what is returned from BigQuery when the field is an interval type.

And I agree with you on RUN_END_ENCODED and DICTIONARY. I'll try to implement these ones and create a PR!

ukclivecox commented 4 months ago

Great work. It would be useful to make the BigQuery client more customizable so that, for example, gcloud testcontainers could be used. Would that be possible?