Open cocoa-xu opened 5 months ago
For DURATION, I think in SQLite we store an ISO duration string. Same with INTERVAL. That avoids weird mappings to/from DATETIME.
RUN_END_ENCODED would probably get expanded to the underlying type. Same with DICTIONARY. They're both more like "layouts" and not "types", but Arrow doesn't really differentiate on this axis (unlike, say, Parquet).
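For reference, a rough sketch of what rendering an Arrow duration as an ISO 8601 duration string could look like. The `TimeUnit` type and the `isoDuration` helper are hypothetical names made up for this example (they are not part of the Arrow or ADBC packages), and the leading `-` for negative values is an assumption, since plain ISO 8601 durations are unsigned. Only hours, minutes and seconds are emitted, so no calendar rules are involved.

```go
package main

import (
	"fmt"
	"strings"
)

// TimeUnit mirrors the four Arrow duration units. Hypothetical local type,
// not the one from the arrow package.
type TimeUnit int

const (
	Second TimeUnit = iota
	Millisecond
	Microsecond
	Nanosecond
)

// subsecondDigits is the number of fractional-second digits each unit carries.
var subsecondDigits = map[TimeUnit]int{Second: 0, Millisecond: 3, Microsecond: 6, Nanosecond: 9}

// isoDuration renders value (expressed in unit) as an ISO 8601 duration
// built from hours, minutes and seconds only.
func isoDuration(value int64, unit TimeUnit) string {
	digits := subsecondDigits[unit]
	scale := uint64(1)
	for i := 0; i < digits; i++ {
		scale *= 10
	}

	// Work on the magnitude as uint64 so the minimum int64 does not overflow.
	sign := ""
	mag := uint64(value)
	if value < 0 {
		sign = "-" // assumption: negative durations get a leading minus
		mag = -mag
	}

	secs, frac := mag/scale, mag%scale
	h, m, s := secs/3600, secs%3600/60, secs%60

	var b strings.Builder
	b.WriteString(sign + "PT")
	if h > 0 {
		fmt.Fprintf(&b, "%dH", h)
	}
	if m > 0 {
		fmt.Fprintf(&b, "%dM", m)
	}
	// Always emit seconds when nothing else was written, so zero is "PT0S".
	if s > 0 || frac > 0 || (h == 0 && m == 0) {
		if frac > 0 {
			fracStr := strings.TrimRight(fmt.Sprintf("%0*d", digits, frac), "0")
			fmt.Fprintf(&b, "%d.%sS", s, fracStr)
		} else {
			fmt.Fprintf(&b, "%dS", s)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(isoDuration(5400, Second))
	fmt.Println(isoDuration(1500, Millisecond))
}
```

Sticking to the time part (`PT...`) keeps the string exact, because hours, minutes and seconds have fixed lengths while months and years do not.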
Thanks for the suggestions! ISO duration string sounds great for avoiding weird mappings! I'll test and see what is returned from BigQuery when the field is an interval type.
And I agree with you on RUN_END_ENCODED and DICTIONARY. I'll try to implement these ones and create a PR!
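As a starting point, expanding both layouts to the underlying values could look roughly like the sketch below. It works on plain Go slices rather than Arrow arrays, `expandRunEnds` and `expandDictionary` are hypothetical helper names, and null handling is left out entirely, so treat it as an illustration of the idea and not as driver code.

```go
package main

import "fmt"

// expandRunEnds decodes a run-end encoded column into a flat slice.
// runEnds[i] is the exclusive logical end index of run i and values[i]
// is the value repeated for that run.
func expandRunEnds[T any](runEnds []int32, values []T) ([]T, error) {
	if len(runEnds) != len(values) {
		return nil, fmt.Errorf("got %d run ends but %d values", len(runEnds), len(values))
	}
	out := []T{}
	start := int32(0)
	for i, end := range runEnds {
		// Run ends must be positive and strictly increasing.
		if end <= start {
			return nil, fmt.Errorf("run end %d at position %d is not strictly increasing", end, i)
		}
		for ; start < end; start++ {
			out = append(out, values[i])
		}
	}
	return out, nil
}

// expandDictionary replaces each index with the dictionary entry it refers to.
func expandDictionary[T any](indices []int32, dict []T) ([]T, error) {
	out := make([]T, len(indices))
	for i, idx := range indices {
		if idx < 0 || int(idx) >= len(dict) {
			return nil, fmt.Errorf("index %d at position %d is outside the dictionary", idx, i)
		}
		out[i] = dict[idx]
	}
	return out, nil
}

func main() {
	flat, err := expandRunEnds([]int32{2, 5, 6}, []string{"a", "b", "c"})
	if err != nil {
		panic(err)
	}
	fmt.Println(flat)

	decoded, err := expandDictionary([]int32{1, 0, 1}, []string{"x", "y"})
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded)
}
```

After expansion, the column can be handled by whatever code path already exists for the underlying value type.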
Great work. It would be useful to make the BigQuery client more customizable so that, for example, gcloud testcontainers could be used, if possible.
What feature or improvement would you like to see?
This todo list contains a number of potential enhancements, missing features and missing docs that I collected from the preliminary implementation of the BigQuery Go driver https://github.com/apache/arrow-adbc/pull/1722. Some of these may be implemented together in a single PR, while some others may need a bit more work to achieve.
Also, some of these features could be hard or impossible to implement with the current BigQuery API, or due to other limitations in BigQuery.
bigquery.QueryConfig
When doing a query, a *bigquery.Query is needed, and it contains the query and the query options/configurations. Some of these configurable items (in bigquery.QueryConfig) are not implemented yet:
Parameters is implemented, but I forgot to remove it from the mini todo list in statement.go.
ADBC callbacks/features
ExecuteSchema
ReadPartition
ExecutePartitions
GetInfo, GetTableSchema and other functions for BigQuery's AdbcConnection and AdbcStatement
driverbase.DbObjectsEnumerator
Missing Types
From BigQuery to Arrow
bigquery.IntervalFieldType
bigquery.RangeFieldType
bigquery.GeographyFieldType, this one is returned as strings for now, but we could consider using GeoArrow for it.
From Arrow to BigQuery
arrow.DURATION, I'm not sure which SQL DataType would be a good representation for it. DATETIME could be a potential one for it if we count from 0001-01-01T00:00:00.000000Z
arrow.INTERVAL_MONTHS
arrow.INTERVAL_DAY_TIME
arrow.INTERVAL_MONTH_DAY_NANO, DATETIME could be a potential fit for all interval types, but the issue is that there are no rules on how many days should be in a month.
arrow.RUN_END_ENCODED
arrow.SPARSE_UNION
arrow.DENSE_UNION
arrow.DICTIONARY
arrow.MAP
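To make the DATETIME idea for arrow.DURATION and the month-length problem for the interval types a bit more concrete, here is a small stdlib-only Go sketch. The helper names (`durationMicrosAsDatetime`, `daysInOneMonth`) and the layout string are made up for illustration. It assumes a microsecond-unit duration that is non-negative and small enough to fit in Go's `time.Duration` (roughly 292 years); anything outside that range would need different handling.

```go
package main

import (
	"fmt"
	"time"
)

// datetimeLayout is a DATETIME-style text form with microsecond precision
// (an assumed layout for this example).
const datetimeLayout = "2006-01-02T15:04:05.000000"

// durationMicrosAsDatetime encodes a duration, given in microseconds, as an
// offset from 0001-01-01T00:00:00. Negative inputs are not meaningful here
// because they would fall before the chosen origin.
func durationMicrosAsDatetime(micros int64) string {
	origin := time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC)
	return origin.Add(time.Duration(micros) * time.Microsecond).Format(datetimeLayout)
}

// daysInOneMonth reports how many days "1 month" covers when it is added to
// the first day of the given month, which is why a month-based interval has
// no fixed length.
func daysInOneMonth(year, month int) int {
	start := time.Date(year, time.Month(month), 1, 0, 0, 0, 0, time.UTC)
	end := start.AddDate(0, 1, 0)
	return int(end.Sub(start).Hours() / 24)
}

func main() {
	// 1 hour 30 minutes expressed in microseconds.
	fmt.Println(durationMicrosAsDatetime(5_400_000_000))

	// The same "1 month" interval spans a different number of days.
	fmt.Println(daysInOneMonth(2023, 1), daysInOneMonth(2023, 2), daysInOneMonth(2024, 2))
}
```

The second helper shows why mapping a month-based interval onto a fixed point in time loses information: the result depends on which month the interval is applied to.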
Docs