Closed m-richards closed 8 months ago
I checked with use_arrow=True
, and this is actually also not supported for reading: we always get naive wall-time timestamps as results (i.e. the same as we have now before this PR). Opened an issue on the GDAL side about this: https://github.com/OSGeo/gdal/issues/8460
Apologies for the very slow review here; I am slowly trying to get up to speed and looking for other alternatives to adding dtype info to the meta
.
Thanks @m-richards for all your work on this!
Thanks for the detailed review, I hope I have addressed most of the suggestions now.
It might be a good idea to add some notes about this to
introduction.md
to guide users about when to usedatetime_as_string
and otherwise how the returned values will be dependent on the version of Pandas used (if I follow correctly).
Happy to do this, but just want to double check the structuring. I am not super familiar with the pyogrio documentation, but introduction.md
seems to all describe using read_dataframe
. At present this pull request doesn't expose datetime_as_string
to read_dataframe
, only in raw.py::read
so I don't know if it makes sense to document / whether it should be documented elsewhere?
On return types, happy to add this, but I saw there is already somewhat of a discussion of this at known_issues.md
- which I now realise I need to update as part of this pull request too. Happy to have an example in the introduction as well though if you think that makes sense.
At present this pull request doesn't expose
datetime_as_string
toread_dataframe
, only inraw.py::read
so I don't know if it makes sense to document / whether it should be documented elsewhere?
I would personally say it's good to keep this raw
only, and then having the keyword explained in the docstrings seems good enough to me.
And the section in the "known_issues" about the datetime gotchas is already a nice and quite complete section I think. Maybe we could just move it "introduction", as now it's actually no longer a known "issue" with pyogrio, but general limitations of reading/writing datetimes because of GDAL's data model.
Apologies for the poor direction. Leaving datetime_as_string
at raw
level only is fine. I'd suggest moving the datimetime section from known issues to the introduction.
Thanks for the updates @m-richards ! Getting really close. Looks like there are a few uncommitted suggestions, but after those are in this looks good to me.
I think they're all done now, don't know how I missed them before.
Thanks @m-richards ! 🚀
This is an initial proof of concept into reading and writing datetimes with timezones by reading and writing strings from gdal, and parsing the strings on the pandas side, very directly following discussion and suggestions in #123 123.
I'm trying not to affect the numpy
read
api, as a this functionality doesn't make sense from the numpy layer, as it doesn't support timezones.There is something missing from this, the numpy api should still be able to read datetimes with timezones and convert them correctly to UTC (currently timezones are ignored complete), but I think that can be a separate PR (perhaps it should even come before this).
There's also probably a fair bit of work to handle driver support for this, I noticed issues locally trying to write geopackages (using WSL with an old version of gdal). I had a quick look at the tests in fiona, I was hoping for a simple list of supported formats, but it's quite complex there). I figure though that's a secondary issue after figuring out the implementation.