Open CurtHagenlocher opened 7 hours ago
For the record, Arrow can currently store timezone offsets per array as strings (see here). To store per-value offsets, an extension type sounds like a good idea. What temporal resolution would you propose? Minutes would fit into two bytes, I suppose.
The standard seems to be support for minute-level resolution with a range of something like -14:00 to +14:00. Storing the number of minutes as an int16 seems right.
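To make the sizing argument concrete, here is a minimal sketch in plain Python (not an Arrow API) that parses an offset string into signed minutes and packs it into two bytes. The helper names, the `+HH:MM` input format, and the little-endian byte order are assumptions for illustration only; the ±14:00 bound is taken from the range mentioned above.

```python
import struct

# Assumed bound, taken from the -14:00 to +14:00 range discussed above.
MAX_OFFSET_MINUTES = 14 * 60


def offset_to_minutes(text: str) -> int:
    """Parse an offset such as '+05:30' or '-14:00' into signed minutes."""
    if not text or text[0] not in "+-":
        raise ValueError(f"offset must start with '+' or '-': {text!r}")
    sign = -1 if text[0] == "-" else 1
    hours, minutes = text[1:].split(":")
    total = sign * (int(hours) * 60 + int(minutes))
    if abs(total) > MAX_OFFSET_MINUTES:
        raise ValueError(f"offset out of range: {text!r}")
    return total


def pack_offset(minutes: int) -> bytes:
    """Pack signed minutes as a little-endian 16-bit integer (2 bytes)."""
    return struct.pack("<h", minutes)
```

The extremes of ±840 minutes are far inside the int16 range of -32768 to 32767, so two bytes leave plenty of headroom.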
Adding an extension type would start by opening a PR against CanonicalExtensions.rst describing the proposed type and calling for discussion and a vote on the mailing list (e.g. 8-bit boolean). It might make sense to wait for more people to chime in before doing so, though.
Describe the enhancement requested
Relational databases Snowflake, MSSQL, Oracle, Teradata, and SAP SQL Anywhere all support a data type which stores both a timestamp and a time zone offset. This differs from the existing Arrow timestamp type by letting each individual value in the column have a different offset and by not being tied to a geopolitical time zone. This type also appears in Java as `OffsetDateTime` and in .NET as `DateTimeOffset`. It would be nice, given how commonly it appears, if there were a standard way to represent this in Arrow.
This could be done as an extension type for a structure consisting of separate 8-byte timestamp and 2-byte offset values, or as a new first-class type. Intervals are a structure with some similarity to this type and were done as a first-class type, but they also predate the extension type mechanism.
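As a rough illustration of the 8-byte timestamp plus 2-byte offset structure, the sketch below round-trips an offset-aware value using only the Python standard library. It is not a proposed Arrow layout: the microsecond unit, storing the timestamp as a UTC instant, the little-endian packing, and the `encode`/`decode` names are all assumptions made for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode(value: datetime) -> bytes:
    """Pack as int64 UTC microseconds followed by int16 offset minutes."""
    offset = value.utcoffset()
    if offset is None:
        raise ValueError("value must carry a UTC offset")
    micros = (value - EPOCH) // timedelta(microseconds=1)
    minutes = offset // timedelta(minutes=1)
    # '<qh' = little-endian int64 + int16, no padding: 10 bytes total.
    return struct.pack("<qh", micros, minutes)


def decode(raw: bytes) -> datetime:
    """Rebuild the offset-aware datetime from the 10-byte pair."""
    micros, minutes = struct.unpack("<qh", raw)
    tz = timezone(timedelta(minutes=minutes))
    return (EPOCH + timedelta(microseconds=micros)).astimezone(tz)
```

Because each value carries its own offset, two values can denote the same instant while differing only in the trailing two bytes, which is the property that separates this type from a column-wide time zone.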
Component(s)
Format