rouault opened 1 week ago
@theroggy @jorisvandenbossche I'm thinking that in this DATETIME_AS_STRING=YES mode, in the ArrowSchema of datetime fields exposed as string (format='u'), we should probably also set the metadata field with a hint for the DateTime semantics. Any suggestion of an appropriate value for it?
Thanks a lot for looking into this!
> we should probably also set the metadata field with a hint for the DateTime semantics. Any suggestion of an appropriate value for it?
Would you just want to indicate that the original GDAL/OGR type was a DateTime? Or is there more information about the column that GDAL can know at that point?
For the type, maybe something like "gdal:type": "DateTime"? (Is there any precedent yet for storing information like this in any file format?)
> Would you just want to indicate that the original GDAL/OGR type was a DateTime?
Actually, I just remembered that we already have something for this. https://gdal.org/en/latest/doxygen/classOGRLayer.html#a3ffa8511632cbb7cff06a908e6668f55 mentions:
Starting with GDAL 3.8, the ArrowSchema::metadata field filled by the get_schema() callback may be set with the potential following items:
- "GDAL:OGR:alternative_name": value of OGRFieldDefn::GetAlternativeNameRef()
- "GDAL:OGR:comment": value of OGRFieldDefn::GetComment()
- "GDAL:OGR:default": value of OGRFieldDefn::GetDefault()
- "GDAL:OGR:subtype": value of OGRFieldDefn::GetSubType()
- "GDAL:OGR:width": value of OGRFieldDefn::GetWidth() (serialized as a string)
- "GDAL:OGR:unique": value of OGRFieldDefn::IsUnique() (serialized as "true" or "false")
- "GDAL:OGR:domain_name": value of OGRFieldDefn::GetDomainName()
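At the C level, ArrowSchema::metadata is not a dictionary but a binary blob. A minimal sketch of the layout described by the Arrow C data interface (an int32 pair count, then for each pair an int32 byte length plus UTF-8 bytes for the key, and the same for the value, with integers in native byte order), using only the Python standard library. The helper names encode_arrow_metadata() and decode_arrow_metadata() are hypothetical and are not part of GDAL or pyarrow:

```python
import struct


def encode_arrow_metadata(items: dict) -> bytes:
    """Serialize str->str pairs using the Arrow C data interface metadata layout."""
    out = bytearray(struct.pack("=i", len(items)))  # number of key/value pairs
    for key, value in items.items():
        for text in (key, value):
            raw = text.encode("utf-8")
            out += struct.pack("=i", len(raw))  # byte length, native byte order
            out += raw
    return bytes(out)


def decode_arrow_metadata(buf: bytes) -> dict:
    """Parse a metadata blob produced by encode_arrow_metadata()."""
    (count,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    result = {}
    for _ in range(count):
        key_and_value = []
        for _ in range(2):
            (length,) = struct.unpack_from("=i", buf, offset)
            offset += 4
            key_and_value.append(buf[offset:offset + length].decode("utf-8"))
            offset += length
        result[key_and_value[0]] = key_and_value[1]
    return result


blob = encode_arrow_metadata({"GDAL:OGR:width": "10", "GDAL:OGR:unique": "true"})
print(len(blob))                    # 55
print(decode_arrow_metadata(blob))  # {'GDAL:OGR:width': '10', 'GDAL:OGR:unique': 'true'}
```

Note that the values are always strings, which is why items such as the width or the unique flag are listed above as "serialized as a string".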
These items are only set when the information cannot be expressed with an Arrow concept. So, logically, this should be extended with "GDAL:OGR:type": "DateTime" in that situation.
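The rule above (emit a GDAL:OGR:* item only when Arrow cannot express the information) can be sketched as follows. FieldInfo and arrow_format_and_metadata() are hypothetical; the format strings ("u" for UTF-8 string, "i"/"l" for int32/int64, "g" for float64, "tsm:" for a millisecond timestamp) come from the Arrow C data interface, but the millisecond precision and the empty timezone are simplifying assumptions, not a description of GDAL's actual code:

```python
from dataclasses import dataclass


@dataclass
class FieldInfo:
    """Hypothetical, simplified stand-in for an OGRFieldDefn."""
    name: str
    ogr_type: str  # "String", "Integer", "Integer64", "Real" or "DateTime"
    width: int = 0


# Arrow C data interface format strings for the non-DateTime types covered here.
_ARROW_FORMATS = {"String": "u", "Integer": "i", "Integer64": "l", "Real": "g"}


def arrow_format_and_metadata(field: FieldInfo, datetime_as_string: bool):
    """Return (format, metadata) for a field, adding GDAL:OGR:* items only when
    Arrow has no native concept for the information."""
    metadata = {}
    if field.ogr_type == "DateTime":
        if datetime_as_string:
            # Plain UTF-8 string: without a hint, the DateTime semantics are lost.
            fmt = "u"
            metadata["GDAL:OGR:type"] = "DateTime"
        else:
            # Millisecond timestamp, timezone left empty in this sketch: the
            # Arrow type itself already means "date/time", so no hint is needed.
            fmt = "tsm:"
    else:
        fmt = _ARROW_FORMATS[field.ogr_type]
    if field.width > 0:
        # Arrow has no notion of a declared field width.
        metadata["GDAL:OGR:width"] = str(field.width)
    return fmt, metadata


print(arrow_format_and_metadata(FieldInfo("ts", "DateTime"), True))
# ('u', {'GDAL:OGR:type': 'DateTime'})
print(arrow_format_and_metadata(FieldInfo("ts", "DateTime"), False))
# ('tsm:', {})
```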
Fixes https://github.com/geopandas/pyogrio/issues/487
OGRLayer::GetArrowStream(): when DATETIME_AS_STRING=YES, expose "GDAL:OGR:type":"DateTime" metadata in the ArrowSchema of DateTime fields
CreateFieldFromArrowSchema(): take into account GDAL:OGR:type=DateTime when ArrowSchema.format='u' (string)
ogr2ogr: GPKG/FlatGeoBuf -> other format: in the Arrow code path, use DATETIME_AS_STRING to preserve the original timezone
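On the reading side, the CreateFieldFromArrowSchema() change amounts to checking the hint before falling back to a plain string field. A minimal sketch with a hypothetical ogr_type_from_arrow() function that returns OGR type names as strings; the real function works on an ArrowSchema and creates an OGRFieldDefn, which this sketch does not model:

```python
# Hypothetical mapping from Arrow C data interface format strings to OGR type names.
_OGR_TYPES = {"u": "String", "i": "Integer", "l": "Integer64", "g": "Real"}


def ogr_type_from_arrow(fmt: str, metadata: dict) -> str:
    """Pick the OGR field type to create for an Arrow child schema."""
    if fmt.startswith("ts"):
        # Native Arrow timestamp ("tss:", "tsm:", "tsu:", "tsn:", optional timezone).
        return "DateTime"
    if fmt == "u" and metadata.get("GDAL:OGR:type") == "DateTime":
        # String column that the producer flagged as carrying DateTime values.
        return "DateTime"
    try:
        return _OGR_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unsupported Arrow format: {fmt!r}") from None


print(ogr_type_from_arrow("u", {"GDAL:OGR:type": "DateTime"}))  # DateTime
print(ogr_type_from_arrow("u", {}))                             # String
```

Restricting the check to format 'u' means a stray GDAL:OGR:type item on, say, an integer column does not silently change that column's type.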
Fixes #11212
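Why the string path helps preserve the original timezone in ogr2ogr: an Arrow timestamp type carries at most one timezone, as part of the column type, so values with differing UTC offsets cannot all keep their own offset once converted. A small standard-library illustration, with made-up sample values and hypothetical helper names: keeping the ISO 8601 strings retains each value's offset, whereas normalizing to a single timezone (UTC here) does not.

```python
from datetime import datetime, timezone

# Made-up sample values with different UTC offsets, as ISO 8601 strings.
values = ["2024-03-01T10:00:00+01:00", "2024-07-01T10:00:00+02:00"]


def offsets_from_strings(strings):
    """UTC offset of each value when the strings are kept as-is."""
    return [datetime.fromisoformat(s).utcoffset() for s in strings]


def normalize_to_utc(strings):
    """What a single-timezone (here UTC) timestamp column would keep."""
    return [datetime.fromisoformat(s).astimezone(timezone.utc) for s in strings]


print(offsets_from_strings(values))
# [datetime.timedelta(seconds=3600), datetime.timedelta(seconds=7200)]
print([d.isoformat() for d in normalize_to_utc(values)])
# ['2024-03-01T09:00:00+00:00', '2024-07-01T08:00:00+00:00']
```

Both representations denote the same instants; only the per-value offset differs, and that offset is what the DATETIME_AS_STRING path keeps.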