Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0

build_path doesn't work when date>2263 #217

Closed juliettelavoie closed 1 year ago

juliettelavoie commented 1 year ago

Description

I am trying to build a path for pr_day_KACE-1-0-G_piControl_r1i1p1f1_gr_24000101-24491230.nc, and it fails because the file's dates (2400 to 2449) are past the upper bound of pandas' nanosecond timestamps.

Steps To Reproduce

import xarray as xr
import xscen as xs

ds = xr.open_dataset('pr_day_KACE-1-0-G_piControl_r1i1p1f1_gr_24000101-24491230.nc')
ds.attrs['cat:type'] = 'simulation'
ds.attrs['cat:processing_level'] = 'raw'
xs.build_path(ds, variable='pr')  # raises OutOfBoundsDatetime

I get:

Traceback (most recent call last):
  File "/exec/jlavoie/.conda/xscen-dev/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-19-3aa62a471bfc>", line 4, in <cell line: 4>
    xs.build_path(ds, variable='pr')
  File "/home/jlavoie/Projets/xscen/xscen/config.py", line 215, in _wrapper
    return func(*args, **kwargs)
  File "/home/jlavoie/Projets/xscen/xscen/catutils.py", line 1099, in build_path
    return _build_path(data, schemas=schemas, root=root, **extra_facets)
  File "/home/jlavoie/Projets/xscen/xscen/catutils.py", line 1021, in _build_path
    out = out / _schema_filename(schema["filename"], facets)
  File "/home/jlavoie/Projets/xscen/xscen/catutils.py", line 923, in _schema_filename
    [
  File "/home/jlavoie/Projets/xscen/xscen/catutils.py", line 924, in <listcomp>
    facets.get(element) if element != "DATES" else _schema_dates(facets)
  File "/home/jlavoie/Projets/xscen/xscen/catutils.py", line 893, in _schema_dates
    start = date_parser(facets["date_start"], out_dtype="datetime")
  File "/home/jlavoie/Projets/xscen/xscen/utils.py", line 133, in date_parser
    return date.to_timestamp()
  File "pandas/_libs/tslibs/period.pyx", line 1872, in pandas._libs.tslibs.period._Period.to_timestamp
  File "pandas/_libs/tslibs/period.pyx", line 1150, in pandas._libs.tslibs.period.period_ordinal_to_dt64
  File "pandas/_libs/tslibs/np_datetime.pyx", line 218, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2400-01-01 12:00:00
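
For context, the bound in that error comes from storing time as a signed 64-bit count of nanoseconds since 1970-01-01. The sketch below redoes that arithmetic with the standard library only; it is an illustration, not pandas' own code, and the names MAX_NANOSECONDS, upper_bound and file_start are made up for the example.

```python
from datetime import datetime, timedelta

# A nanosecond-resolution timestamp is a signed 64-bit count of
# nanoseconds since 1970-01-01, so the largest count is 2**63 - 1.
MAX_NANOSECONDS = 2**63 - 1

# The standard library stops at microseconds, so drop the last three digits.
epoch = datetime(1970, 1, 1)
upper_bound = epoch + timedelta(microseconds=MAX_NANOSECONDS // 1000)

# The date reported in the traceback above.
file_start = datetime(2400, 1, 1, 12)

print(upper_bound)               # falls in April 2262
print(file_start > upper_bound)  # True: the file starts past the bound
```

Any start date after that bound cannot be converted to a nanosecond timestamp, which is what date_parser runs into here.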

Additional context

I thought we had fixed this with periods?

Also, I am using pandas 2.0.2, which I think supports timestamp units coarser than nanoseconds (e.g. seconds). That might be a way to solve this.
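
A rough sketch of what that could look like, assuming pandas >= 2.0 keeps the resolution of a non-nanosecond numpy datetime64 passed to Timestamp. This is not what xscen does; the variable name start is only for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: pandas >= 2.0, where Timestamp retains the second resolution
# of the numpy input instead of casting it to nanoseconds.
start = pd.Timestamp(np.datetime64("2400-01-01T12:00:00", "s"))

print(start.unit)                # resolution carried over from the numpy input
print(start.strftime("%Y%m%d"))  # date string of the kind used in file names
```

With second resolution the representable range is far wider than the nanosecond one, so year 2400 fits.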

aulemahal commented 1 year ago

Dang, I didn't think of that!

I think we'll be able to pin pandas >= 2 and get rid of all this superfluous Period machinery. I'll do both in an upcoming PR.

juliettelavoie commented 1 year ago

A note on moving to pandas 2.0 for non-nanosecond units of time: it doesn't work in xarray yet. See https://github.com/pydata/xarray/blob/799f12d756f9edf1a40188461cf3833a0788af82/xarray/core/variable.py#L72

RondeauG commented 1 year ago

Since we only use non-nanosecond units of time for catalog-related functions, I think this is fine?

juliettelavoie commented 1 year ago

I don't know enough about that part of the code, but I just ran into that issue in another project, so I thought I'd mention it.