Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0

Dates as datetime64[ms] - remove `driving_institution` #222

Closed aulemahal closed 1 year ago

aulemahal commented 1 year ago

Pull Request Checklist:

What kind of change does this PR introduce?

Pandas 2 now supports datetime columns with s, ms and us resolutions, in addition to the default ns. This allows storing dates from before 1677 and after 2262, which are the limits of the ns resolution. However, this support is still partial, as many of the datetime manipulation methods still fail on "out of bounds" dates. This includes `pd.read_csv` and `pd.to_datetime`... Because of this limitation, I had to implement the parsing directly in the DataCatalog's init, using a solution proposed on Stack Overflow.
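For context, the 1677 and 2262 limits come from storing nanoseconds since 1970 in a signed 64-bit integer. The sketch below reproduces those bounds with the standard library only; it is an illustration of the arithmetic, not code from this PR, and the variable names are mine.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest tick count a signed 64-bit integer can hold

# With nanosecond ticks, convert the maximum count to microseconds
# (the finest resolution Python's datetime supports) and offset the epoch.
max_offset = timedelta(microseconds=INT64_MAX // 1000)
latest = EPOCH + max_offset
earliest = EPOCH - max_offset

print(earliest.date(), latest.date())

# With millisecond ticks the same integer spans roughly this many years
# on each side of the epoch, far more than any climate dataset needs.
ms_span_years = INT64_MAX / 1000 / 86400 / 365.25
print(f"{ms_span_years:.2e} years")
```

Coarser resolutions trade sub-millisecond precision, which catalog dates do not need, for a much wider range.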

Even with this strange workaround, opening simulation.json went from 3 s to 800 ms on my machine!

The change had repercussions in other parts of xscen, especially date_parser and subset_file_coverage. I adapted the former to output pd.Timestamp objects by default and the latter to use more of the Interval magic pandas can already do with datetime bounds.
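As a rough idea of what the Interval approach looks like, here is a minimal sketch that selects the files overlapping a requested period. The table contents, file names and period are made up for the example, and this is not the actual subset_file_coverage implementation.

```python
import pandas as pd

# Hypothetical catalog rows: one (date_start, date_end) pair per file.
files = pd.DataFrame(
    {
        "path": ["a.nc", "b.nc", "c.nc"],
        "date_start": pd.to_datetime(["1950-01-01", "1981-01-01", "2071-01-01"]),
        "date_end": pd.to_datetime(["1980-12-31", "2010-12-31", "2100-12-31"]),
    }
)

# One closed interval per file, built from the two datetime columns.
intervals = pd.IntervalIndex.from_arrays(
    files["date_start"], files["date_end"], closed="both"
)

# Requested period, also expressed as an Interval with datetime bounds.
period = pd.Interval(
    pd.Timestamp("1975-01-01"), pd.Timestamp("1990-12-31"), closed="both"
)

# Boolean mask of the files whose interval intersects the period.
mask = intervals.overlaps(period)
selected = files.loc[mask, "path"].tolist()
print(selected)
```

The example uses in-bounds dates so that it does not depend on how a given pandas version handles out-of-bounds parsing.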

I also used this PR to remove driving_institution from the official columns, as discussed.

Does this PR introduce a breaking change?

The default output of date_parser has changed.

The default dtype of date_start and date_end has changed.

The driving_institution column has been removed.

Other information:

This required pinning pandas >= 2 and clisops >= 0.10. The latter pin allowed unpinning Python.

aulemahal commented 1 year ago

Feel free to push changes and merge this whenever you may want to. I'll be on vacation for the next two weeks.